读取旧版 Parquet 文件
由于 Apache Sedona 1.4.0 对 GeometryUDT 的 SQL 类型
(SEDONA-205)以及几何值的序列化格式
(SEDONA-207)引入了破坏性变更,因此由 Apache Sedona 1.3.1 或更早版本写出的、包含几何列的 Parquet 文件无法被 Apache Sedona 1.4.0 或更高版本直接读取。
对于在 Apache Sedona 1.3.1-incubating 或更早版本下、使用 "parquet" 格式写出的 parquet 文件:
df.write.format("parquet").save("path/to/parquet/files")
如果用 Apache Sedona 1.4.0 或更高版本通过 spark.read.format("parquet").load("path/to/parquet/files") 来读取这些文件,将会抛出异常:
24/01/08 12:52:56 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 11)
org.apache.spark.sql.AnalysisException: Invalid Spark read type: expected required group geom (LIST) {
repeated group list {
required int32 element (INTEGER(8,true));
}
} to be list type but found Some(BinaryType)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkConversionRequirement(ParquetSchemaConverter.scala:745)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertGroupField$3(ParquetSchemaConverter.scala:343)
at scala.Option.fold(Option.scala:251)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertGroupField(ParquetSchemaConverter.scala:324)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:188)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertInternal$3(ParquetSchemaConverter.scala:147)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertInternal$3$adapted(ParquetSchemaConverter.scala:117)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.immutable.Range.foreach(Range.scala:158)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
...
自 v1.5.1 起,GeoParquet 支持读取旧版 Parquet 文件。你可以使用 "geoparquet" 格式,并加上 .option("legacyMode", "true") 选项。示例如下:
val df = sedona.read.format("geoparquet").option("legacyMode", "true").load("path/to/legacy-parquet-files")
Dataset<Row> df = sedona.read.format("geoparquet").option("legacyMode", "true").load("path/to/legacy-parquet-files")
df = sedona.read.format("geoparquet").option("legacyMode", "true").load("path/to/legacy-parquet-files")