Why do I get "partition values: [empty row]" log messages when reading a file?

Question

I am using Spark SQL to read in a CSV file, and I get a lot of log messages like this:

...some.csv, range: 20971520-24311915, partition values: [empty row]

Why does it say "empty row"? Is the partition really empty?

Piotr Góralczyk · Accepted Answer

Neither the file nor the Spark partition with data read from the file is empty.

The log message may be a bit confusing because of two things:

1. The word partition in the message refers to a Hive-style partition, i.e. a named partition column that can have multiple values. Such partitions can be inferred from your directory structure, e.g. for /path/to/partition/a=1/b=hello/c=3.14 they would be a, b and c, and their values: 1, hello and 3.14. They can also come from the Hive Metastore in case of partitioned external tables.
2. The partition values logged are wrapped in an InternalRow, not in a collection.

In your case, the directory structure is flat or it does not contain partition names (e.g. /path/to/partition/1/hello/3.14), so there are no Hive-style partitions and you see [empty row] in the message as a result.
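To make the difference between the two directory layouts concrete, here is a minimal Python sketch of the idea. It is not Spark's implementation: the helper infer_partition_values is a hypothetical name made up for illustration, and it keeps every value as a plain string, whereas Spark's own partition discovery does more (for example, it can infer column types). It only shows that name=value directory segments yield partition columns, while a flat path yields none.

```python
from pathlib import PurePosixPath


def infer_partition_values(path: str) -> dict:
    """Collect name=value directory segments from a path.

    Hypothetical helper for illustration only; not part of Spark.
    """
    values = {}
    for segment in PurePosixPath(path).parts:
        name, sep, value = segment.partition("=")
        # Only segments of the form name=value count as Hive-style partitions.
        if sep and name:
            values[name] = value
    return values


hive_style = infer_partition_values("/path/to/partition/a=1/b=hello/c=3.14")
flat = infer_partition_values("/path/to/partition/1/hello/3.14")

print(hive_style)  # {'a': '1', 'b': 'hello', 'c': '3.14'}
print(flat)        # {}
```

The empty dictionary for the flat path plays the same role as [empty row] in the log message: there are no partition columns to report, which says nothing about whether the file itself contains data.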

Tags: apache-spark, apache-spark-sql

zyxue
