 

A comparison between fastparquet and pyarrow?

After some searching I failed to find a thorough comparison of fastparquet and pyarrow.

I found this blog post (a basic comparison of speeds), and a GitHub discussion claiming that files created with fastparquet are not supported by AWS Athena (by the way, is that still the case?).

When/why would I use one over the other? What are the major advantages and disadvantages?


My specific use case is processing data with Dask, writing it to S3, and then reading/analyzing it with AWS Athena.
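Roughly what that looks like (a minimal sketch; the bucket paths and columns are placeholders, and s3fs is assumed to be installed so Dask can talk to S3):

import dask.dataframe as dd

# Read raw data, then write partitioned Parquet to S3.
df = dd.read_csv('s3://my-bucket/raw/*.csv')
df.to_parquet(
    's3://my-bucket/curated/events/',
    engine='pyarrow',        # or 'fastparquet'
    compression='snappy',
    write_index=False,
)
# The resulting directory is then registered as an external table in Athena.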

asked Jul 16 '18 by moshevi


People also ask

Which is better, pyarrow or fastparquet?

According to it, pyarrow is faster than fastparquet; little wonder it is the default engine used in Dask.

What is Fastparquet?

fastparquet is a Python implementation of the Parquet format, aiming to integrate into Python-based big data workflows. It is used implicitly by projects such as Dask, pandas and intake-parquet.

What is Pyarrow?

pyarrow is the Python API of Apache Arrow. Apache Arrow is a development platform for in-memory analytics; it contains a set of technologies that enable big data systems to store, process and move data fast.
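As a rough illustration (the file and column names here are made up), pyarrow exposes Parquet reading and writing through its pyarrow.parquet module:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Convert a pandas DataFrame to an Arrow table, write it as Parquet, and read it back.
df = pd.DataFrame({'id': [1, 2, 3], 'value': ['a', 'b', 'c']})
table = pa.Table.from_pandas(df)
pq.write_table(table, 'example.parquet')

round_tripped = pq.read_table('example.parquet').to_pandas()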

Why should you use Parquet files with pandas?

With its column-oriented design, Parquet brings many efficient storage characteristics (e.g., blocks, row groups, column chunks) into the fold. Additionally, it is built to support very efficient compression and encoding schemes, making space-saving data pipelines practical.
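A small sketch of what that means in practice (file and column names are illustrative): you can compress on write and read back only the columns you actually need.

import pandas as pd

df = pd.DataFrame({'user': ['a', 'b'], 'amount': [10.0, 20.0], 'note': ['x', 'y']})

# Columnar storage with compression on write...
df.to_parquet('sales.parquet', compression='snappy')

# ...and selective column reads, so unused columns are never loaded.
subset = pd.read_parquet('sales.parquet', columns=['user', 'amount'])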


2 Answers

I used both fastparquet and pyarrow for converting protobuf data to Parquet and for querying it in S3 with Athena. Both worked; however, my use case is a Lambda function, where the package zip file has to be lightweight, so I went with fastparquet. (The fastparquet library was only about 1.1 MB, while the pyarrow library was 176 MB, and the Lambda package limit is 250 MB.)

I used the following to store a DataFrame as a Parquet file:

from os import path
from fastparquet import write

# df_data is the pandas DataFrame to be stored; filename is its base name.
parquet_file = path.join(filename + '.parq')
write(parquet_file, df_data)
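To sanity-check the output before pointing Athena at it (purely illustrative), the same library can read the file back:

from fastparquet import ParquetFile

# Re-open the file just written and materialise it as a pandas DataFrame.
pf = ParquetFile(parquet_file)
df_check = pf.to_pandas()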
answered Sep 16 '22 by Daenerys


However, since the question lacks concrete criteria, and since I came here looking for a good "default choice", I want to point out that pandas' default Parquet engine for DataFrame objects is pyarrow (see the pandas docs).
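For example (the file names are just illustrative), pandas picks pyarrow automatically when it is installed, and either engine can still be requested explicitly:

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})

# engine='auto' (the default) prefers pyarrow and falls back to fastparquet.
df.to_parquet('data.parquet')

# Forcing a specific engine:
df.to_parquet('data_pyarrow.parquet', engine='pyarrow')
df.to_parquet('data_fastparquet.parquet', engine='fastparquet')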

answered Sep 19 '22 by d4tm4x