I use Apache Spark 2.1.1 (used 2.1.0 and it was the same, switched today). I have a dataset:
root
|-- muons: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- reco::Candidate: struct (nullable = true)
| | |-- qx3_: integer (nullable = true)
| | |-- pt_: float (nullable = true)
| | |-- eta_: float (nullable = true)
| | |-- phi_: float (nullable = true)
| | |-- mass_: float (nullable = true)
| | |-- vertex_: struct (nullable = true)
| | | |-- fCoordinates: struct (nullable = true)
| | | | |-- fX: float (nullable = true)
| | | | |-- fY: float (nullable = true)
| | | | |-- fZ: float (nullable = true)
| | |-- pdgId_: integer (nullable = true)
| | |-- status_: integer (nullable = true)
| | |-- cachePolarFixed_: struct (nullable = true)
| | |-- cacheCartesianFixed_: struct (nullable = true)
As you can see, there are 3 empty structs in this schema. I know 100% that I can read/manipulate/do whatever. However, when I try writing to disk in parquet, I get the following Exception:
dsReduced.write.format("parquet").save(outputPathName):
java.lang.IllegalStateException: Cannot build an empty group
at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:622)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:535)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:534)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:533)
So, basically I would like to understand if it's a bug or an intended behavior??? I also assume that it's related to the empty structs. Any help would be really appreciated!
Update: I've quickly created stripped version and that one works without any issues! Any insight would be really helpful!
VK
Parquet does not write empty structs:
for more info - see here https://issues.apache.org/jira/browse/SPARK-20593
VK
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With