
Why doesn't Iceberg rewriteDataFiles rewrite the files into one file?

I have an Iceberg table in S3 with 2 Parquet files that store 4 rows in total. I tried the following command:

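import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.actions.SparkActions

val tables = new HadoopTables(conf) // conf: the Hadoop Configuration in scope
val table = tables.load("s3://iceberg-tests-storage/data/db/test5")
SparkActions.get(spark).rewriteDataFiles(table).option("target-file-size-bytes", "52428800").execute()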

But nothing changed. What am I doing wrong?

eweiss asked Dec 06 '25 17:12



1 Answer

A few notes:

  1. By default, Iceberg won't compact files unless a file group, within a single partition, contains at least a minimum number of small files to compact. The default minimum is 5.
    • This can be changed by passing min-input-files as an option.
  2. Iceberg won't compact files across partitions, because each data file must map to exactly one tuple of partition values.
    • For example, in a table partitioned by col1 and col2, files with col1=A and col2=1 cannot be compacted with files with col1=A and col2=4.
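
To make the two rules concrete, here is a deliberately simplified model in plain Scala. It is not Iceberg's actual planner: it ignores file sizes and every other threshold and only checks the file count per partition tuple. The names DataFile, CompactionSketch, and groupsToRewrite are hypothetical and exist only for this illustration.

```scala
// Simplified model, NOT Iceberg's real planning code: it looks only at how
// many files share the same partition tuple and ignores file sizes.
final case class DataFile(path: String, partition: Map[String, String])

object CompactionSketch {
  // Returns the per-partition groups that a purely count-based rule would
  // select for rewriting.
  def groupsToRewrite(
      files: Seq[DataFile],
      minInputFiles: Int
  ): Map[Map[String, String], Seq[DataFile]] =
    files
      .groupBy(_.partition) // files are never mixed across partition tuples
      .filter { case (_, group) => group.size >= minInputFiles }
}
```

Under this model, two files in an unpartitioned table are skipped with a minimum of 5 and selected as one group with a minimum of 2. Two files with col2=1 and col2=4 are never grouped, whatever the minimum.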

In your case, if you set min-input-files to 2, the files should be compacted together, provided they are in the same partition or the table is not partitioned.
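
As a rough sketch, the call from the question with that option added could look like the following. It assumes Spark and the Iceberg Spark runtime are on the classpath, and that a SparkSession and a Hadoop Configuration already exist. The wrapper names CompactSmallFiles and compact are hypothetical. The option keys and the result accessors are the ones I believe the Iceberg API exposes, so check them against the version you are running.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.actions.SparkActions
import org.apache.spark.sql.SparkSession

object CompactSmallFiles {
  // Hypothetical helper: `location` is the table path, e.g. the S3 path
  // used in the question.
  def compact(spark: SparkSession, conf: Configuration, location: String): Unit = {
    val table = new HadoopTables(conf).load(location)

    val result = SparkActions
      .get(spark)
      .rewriteDataFiles(table)
      .option("target-file-size-bytes", "52428800") // 50 MiB, as in the question
      .option("min-input-files", "2")               // lower the default of 5
      .execute()

    // Report how many files were replaced and how many were written.
    println(
      s"rewritten: ${result.rewrittenDataFilesCount()}, " +
        s"added: ${result.addedDataFilesCount()}"
    )
  }
}
```

If the rewritten count is still 0, it is worth checking whether the two files ended up in different partitions.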

Kyle Bendickson answered Dec 08 '25 19:12



