 

Is the Feather file format still relevant, or is the community leaning towards other file formats for large file storage?

I'm exploring file storage format options for Python and stumbled on Feather. I noticed its last release was back in 2017 and am concerned about its long-term future.

Web searches turn up posts that all seem to stop around 2017.

cauthon asked Nov 05 '19 21:11



People also ask

What is a feather file format?

Feather is a portable file format for storing Arrow tables or data frames (from languages like Python or R) that utilizes the Arrow IPC format internally. Feather was created early in the Arrow project as a proof of concept for fast, language-agnostic data frame storage for Python (pandas) and R.

Is Feather better than CSV?

Feather is a lightweight, fast binary format for storing data frames. A Feather file typically takes less than half the space of a CSV file holding the same data, and reading and writing it can be up to about 100 times faster than with CSV, although the actual gains depend on the data and the tools used.

What type of file format is more efficient in terms of storage space?

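Feather is optimized for low storage space and high performance, which makes it a little less accessible than CSV. While a CSV file can be read on any machine that understands text, Feather requires an Apache Arrow library (for example pyarrow in Python or the arrow package in R), which is not installed by default. Because Feather V2 is the Arrow IPC file format, it is not limited to Python and R: any Arrow implementation, including those for C++, Java, and Rust, can read it.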

Is Parquet better than CSV?

In a nutshell, Parquet is a more efficient format than CSV for larger files: its columnar layout, encoding, and compression produce smaller files and faster reads, which can save both time and storage costs.


1 Answer

The Feather format is still relevant, and support for more data types has improved a lot recently, especially on the R side. A notable change is that Feather is no longer released as a separate package; it now ships as part of Apache Arrow (https://arrow.apache.org/), where it is actively developed.

The other format the community is leaning towards is Apache Parquet. The two differ in ways that may make you choose one over the other: Feather writes the Arrow data essentially as-is (Feather V2 can optionally apply LZ4 or ZSTD compression), while Parquet encodes and compresses it to achieve much smaller files. Parquet is also widely available in the Java world, which might come in handy. Both Feather and Parquet are available in R through the arrow package and in Python as part of pyarrow.

Uwe L. Korn answered Nov 10 '22 14:11
