Why doesn't Iceberg rewriteDataFiles rewrite the files into one file?

Question

I have an Iceberg table in S3 with 2 Parquet files that store 4 rows. I tried the following command:

import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.actions.SparkActions

val tables = new HadoopTables(conf)
val table = tables.load("s3://iceberg-tests-storage/data/db/test5")
SparkActions.get(spark).rewriteDataFiles(table).option("target-file-size-bytes", "52428800").execute()

but nothing changed. What am I doing wrong?

Kyle Bendickson · Accepted Answer

A few notes:

By default, Iceberg won't compact files unless a minimum number of small files is available to compact per file group and per partition. The default minimum is 5.
- This can be configured by passing min-input-files as an option.

Iceberg won't compact files across partitions, because each file must map 1:1 to a tuple of partition values.
- As an example: for a table partitioned by col1 and col2, files with col1=A and col2=1 cannot be compacted with files with col1=A and col2=4.

In your case, if you set min-input-files to 2, provided the files are part of the same partition or the table is not partitioned, the files should be compacted together.
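
As a rough sketch of that suggestion, the call from the question could be extended with the min-input-files option. The table path and target size are taken from the question. Building spark and conf from the active session is an assumption about the environment, and the final println is only there to show whether anything was rewritten.

```scala
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.actions.SparkActions
import org.apache.spark.sql.SparkSession

// Assumption: a SparkSession with the Iceberg runtime and S3 access is already configured.
val spark = SparkSession.builder().getOrCreate()
val conf = spark.sparkContext.hadoopConfiguration

val tables = new HadoopTables(conf)
val table = tables.load("s3://iceberg-tests-storage/data/db/test5")

val result = SparkActions.get(spark)
  .rewriteDataFiles(table)
  .option("min-input-files", "2")                // lower the default of 5 so 2 files qualify
  .option("target-file-size-bytes", "52428800")  // 50 MiB, as in the question
  .execute()

// Report how many files were replaced and how many were written.
println(s"rewritten: ${result.rewrittenDataFilesCount()}, added: ${result.addedDataFilesCount()}")
```

If both counts come back as 0, the two files most likely belong to different partitions, so they still cannot be combined.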

Tags:

apache-spark

apache-iceberg

eweiss