Case:
part-00000-deb4a3d4-d8c3-4983-8756-ad7e0b29e780.c000.snappy.parquet
I can't find the rules for naming a Parquet file in the code. Could someone explain?
code: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala
In this case:
part-00000 is the split (task partition) number.
-deb4a3d4-d8c3-4983-8756-ad7e0b29e780 is a random UUID that lets concurrent write processes in Spark actions run without their output files conflicting.
"c000" is a counter of the files written for this partition. Here it is zero, and it counts up from there. To be honest, I am not sure what happens once 999 is exceeded.
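
To make the breakdown above concrete, here is a minimal Python sketch that splits such a file name into the three pieces described. The regular expression and the function name parse_part_file_name are my own illustration based only on the example file name in this question; they are not taken from the Spark source, and other layouts (for example bucketed output) may not match.

```python
import re

# Hypothetical pattern, inferred from the single example file name above.
PART_FILE = re.compile(
    r"^part-(?P<split>\d+)"                      # split / task partition number
    r"-(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"  # random UUID
    r"\.c(?P<counter>\d+)"                       # per-partition file counter
    r"(?P<ext>\..+)$"                            # codec and format extension
)


def parse_part_file_name(name):
    """Return the name's components as a dict, or None if it does not match."""
    match = PART_FILE.match(name)
    if match is None:
        return None
    return {
        "split": int(match.group("split")),
        "uuid": match.group("uuid"),
        "counter": int(match.group("counter")),
        "ext": match.group("ext"),
    }


parts = parse_part_file_name(
    "part-00000-deb4a3d4-d8c3-4983-8756-ad7e0b29e780.c000.snappy.parquet"
)
print(parts)
```

The counter is matched with \d+ rather than exactly three digits, so the sketch does not assume anything about what the name looks like past 999.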