
What compression types does Parquet support?

I am writing data to Hadoop and Hive in Parquet format using Spark. I want to enable compression, but I can only find two compression types, Snappy and Gzip, being used most of the time. Does Parquet support other compression codecs, such as Deflate and LZO?

User_qwerty asked Jul 06 '18 05:07



2 Answers

The supported compression types for Apache Parquet are specified in the parquet-format repository:

/**
 * Supported compression algorithms.
 *
 * Codecs added in 2.4 can be read by readers based on 2.4 and later.
 * Codec support may vary between readers based on the format version and
 * libraries available at runtime. Gzip, Snappy, and LZ4 codecs are
 * widely available, while Zstd and Brotli require additional libraries.
 */
enum CompressionCodec {
  UNCOMPRESSED = 0;
  SNAPPY = 1;
  GZIP = 2;
  LZO = 3;
  BROTLI = 4; // Added in 2.4
  LZ4 = 5;    // Added in 2.4
  ZSTD = 6;   // Added in 2.4
}

https://github.com/apache/parquet-format/blob/54e6133e887a6ea90501ddd72fff5312b7038a7c/src/main/thrift/parquet.thrift#L461

Snappy and Gzip are the most commonly used codecs and are supported by all implementations. LZ4 and ZSTD yield better results than the former two, but they are relatively new additions to the format, so only the newer versions of some implementations support them.
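If you need to work with these codec ids in your own tooling, the Thrift enum quoted above can be mirrored in a few lines of Python. This is only a sketch: the numeric values are copied from the enum in this answer, while the helper names `codec_name` and `needs_2_4_reader` are made up for illustration and are not part of any Parquet library.

```python
from enum import IntEnum


class CompressionCodec(IntEnum):
    """Mirror of the CompressionCodec enum quoted from parquet.thrift."""
    UNCOMPRESSED = 0
    SNAPPY = 1
    GZIP = 2
    LZO = 3
    BROTLI = 4  # marked "Added in 2.4" in the Thrift file
    LZ4 = 5     # marked "Added in 2.4" in the Thrift file
    ZSTD = 6    # marked "Added in 2.4" in the Thrift file


# Codecs that the quoted Thrift comments mark as added in format version 2.4.
ADDED_IN_2_4 = frozenset(
    {CompressionCodec.BROTLI, CompressionCodec.LZ4, CompressionCodec.ZSTD}
)


def codec_name(codec_id: int) -> str:
    """Hypothetical helper: map a numeric codec id to its enum name.

    Raises ValueError for ids that are not in the quoted enum.
    """
    return CompressionCodec(codec_id).name


def needs_2_4_reader(codec_id: int) -> bool:
    """Hypothetical helper: True if the codec was added in format 2.4."""
    return CompressionCodec(codec_id) in ADDED_IN_2_4
```

A check like `needs_2_4_reader` can flag files that older readers may not be able to open, which is the compatibility caveat the Thrift comment describes.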

Uwe L. Korn answered Oct 20 '22 22:10



In Spark 2.1

From the Spark source code, branch 2.1:

You can set the following Parquet-specific option(s) for writing Parquet files:

compression (default is the value specified in spark.sql.parquet.compression.codec): compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, snappy, gzip, and lzo).
This will override spark.sql.parquet.compression.codec
...

In Spark 2.4 / 3.0

Overall, the supported compression codecs are: none, uncompressed, snappy, gzip, lzo, brotli, lz4, and zstd.
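To show how the `compression` option is typically passed from PySpark, here is a minimal sketch. The codec lists are copied from the two lists in this answer. The names `SPARK_PARQUET_CODECS`, `normalize_codec`, and `write_parquet` are hypothetical helpers, not Spark APIs; the only Spark calls used are `DataFrame.write`, `.option("compression", ...)`, and `.parquet(path)`.

```python
# Codec names per Spark version, copied from the lists in this answer.
_CODECS_2_1 = frozenset({"none", "snappy", "gzip", "lzo"})
_CODECS_2_4 = frozenset(
    {"none", "uncompressed", "snappy", "gzip", "lzo", "brotli", "lz4", "zstd"}
)

SPARK_PARQUET_CODECS = {
    "2.1": _CODECS_2_1,
    "2.4": _CODECS_2_4,
    "3.0": _CODECS_2_4,
}


def normalize_codec(name: str, spark_version: str) -> str:
    """Hypothetical helper: lower-case a codec name and check it against
    the list for the given Spark version (names are case-insensitive)."""
    try:
        supported = SPARK_PARQUET_CODECS[spark_version]
    except KeyError:
        raise ValueError(f"no codec list recorded for Spark {spark_version}")
    key = name.strip().lower()
    if key not in supported:
        raise ValueError(
            f"{name!r} is not listed for Spark {spark_version}; "
            f"expected one of {sorted(supported)}"
        )
    return key


def write_parquet(df, path: str, codec: str, spark_version: str) -> None:
    """Hypothetical helper: write a DataFrame as Parquet with the given codec.

    `df` is expected to be a pyspark.sql.DataFrame. The per-write option
    overrides spark.sql.parquet.compression.codec, as quoted above.
    """
    checked = normalize_codec(codec, spark_version)
    df.write.option("compression", checked).parquet(path)
```

For example, `write_parquet(df, "/tmp/out", "GZIP", "2.4")` would write gzip-compressed files. A name passing this check does not guarantee that the codec's native library is available at runtime; as the Thrift comment quoted in the other answer notes, Zstd and Brotli require additional libraries.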

Samson Scharfrichter answered Oct 20 '22 22:10
