I would like to ask a question about compression performance as it relates to the chunk size of HDF5 files.
I have two HDF5 files on hand; each contains a single dataset called "data".
File A's size: HDF5: 19 MB, CSV: 165 MB
File B's size: HDF5: 60 MB, CSV: 165 MB
Both of them show good compression compared to the CSV files. However, file A compresses to about 10% of the original CSV, while file B only compresses to about 30%.
I have tried different chunk sizes to make file B as small as possible, but 30% seems to be the best it can do. I would like to ask why file A can achieve greater compression while file B cannot.
If file B can also reach that level, what should the chunk size be?
Is there any rule for determining the optimum chunk size of an HDF5 file for compression purposes?
Thanks!
One of the most powerful features of HDF5 is its ability to store and modify compressed data. The HDF5 library comes with two pre-defined compression methods, GNUzip (gzip) and Szip, and has the capability of using third-party compression methods as well.
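As a minimal sketch (the file and dataset names below are made up), this is how a gzip-compressed dataset can be written with h5py; Szip can be requested the same way (compression="szip") if the underlying HDF5 library was built with Szip support:

```python
# Write one dataset with the built-in gzip filter at its highest level (9).
# "compressed.h5" and the random data are placeholders for illustration.
import numpy as np
import h5py

data = np.random.randint(0, 100, size=(100_000, 10), dtype=np.int32)

with h5py.File("compressed.h5", "w") as f:
    f.create_dataset("data", data=data,
                     compression="gzip", compression_opts=9)
```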
Although this approach is fine for small datasets, the HDF5 file size increases rapidly with the number of images. I have experienced situations where the HDF5 file takes 100x more space than the original dataset. This happens because the NumPy arrays take more storage space than the original image files.
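For illustration, a small sketch along these lines (the image path is hypothetical) compares the on-disk size of an encoded image with the size of the same image decoded into a NumPy array:

```python
# Compare the size of an encoded image file with the in-memory size of the
# decoded pixel array. "example.jpg" is a placeholder path.
import os

import numpy as np
from PIL import Image

image_path = "example.jpg"                     # hypothetical input image
decoded = np.asarray(Image.open(image_path))   # e.g. H x W x 3 uint8 array

print("encoded file size  :", os.path.getsize(image_path), "bytes")
print("decoded array size :", decoded.nbytes, "bytes")
```

For already-compressed formats such as JPEG, the decoded array is usually far larger than the file it came from, which is where the blow-up comes from.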
Chunked Storage
Chunking lets you specify the N-dimensional "shape" that best fits your access pattern. When the time comes to write data to disk, HDF5 splits the data into "chunks" of the specified shape, flattens them, and writes them to disk.
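As a sketch (shapes and file names chosen arbitrarily), with h5py the chunk shape is set through the chunks keyword when the dataset is created:

```python
# Explicit chunking: every 1000 x 10 block of this dataset is written to
# disk as one independent chunk.
import numpy as np
import h5py

data = np.random.rand(100_000, 10)

with h5py.File("chunked.h5", "w") as f:
    dset = f.create_dataset("data", data=data, chunks=(1000, 10))
    print(dset.chunks)   # -> (1000, 10)
```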
HDF5 also supports lossless compression of datasets.
Chunking doesn't really affect the compression ratio per se, except in the manner @Ümit describes. What chunking does do is affect the I/O performance. When compression is applied to an HDF5 dataset, it is applied to whole chunks, individually. This means that when reading data from a single chunk in a dataset, the entire chunk must be decompressed - possibly involving a whole lot more I/O, depending on the size of the cache, shape of the chunk, etc.
What you should do is make sure that the chunk shape matches how you read/write your data. If you generally read a column at a time, make your chunks columns, for example. This is a good tutorial on chunking.
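For example, here is a minimal sketch (all file and dataset names are hypothetical) of chunking a 2-D dataset so that each chunk holds a slice of a single column; reading one column then only decompresses the chunks that actually contain it:

```python
# Chunk shape chosen to match a column-wise read pattern: each chunk covers
# 10000 rows of exactly one column, so reading a column touches only the
# chunks belonging to that column.
import numpy as np
import h5py

data = np.random.rand(100_000, 50)

with h5py.File("column_chunks.h5", "w") as f:
    f.create_dataset("data", data=data,
                     chunks=(10_000, 1), compression="gzip")

with h5py.File("column_chunks.h5", "r") as f:
    col = f["data"][:, 3]   # decompresses only the 10 chunks of column 3
    print(col.shape)        # -> (100000,)
```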