I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction with it.
Thus far the only method I have found is using Spark with the pyspark.sql.DataFrame Parquet support.
I have some scripts that need to write Parquet files that are not Spark jobs. Is there any approach to writing Parquet files in Python that doesn't involve pyspark.sql?
Pandas DataFrame: to_parquet() function
The to_parquet() function writes a DataFrame to the binary Parquet format. Its path argument is a file path or a root directory path; it is treated as the root directory when writing a partitioned dataset.
Note that to_parquet() itself does not append: with Spark's append save mode (or fastparquet's append option) you can add a dataframe to an existing Parquet dataset.
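For illustration, here is a minimal sketch of to_parquet() with Snappy compression. It assumes a reasonably recent pandas with either pyarrow or fastparquet installed (partition_cols needs pandas 0.24+); the file names and columns are placeholders.

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Write a single file; Snappy is the default codec in recent pandas versions
df.to_parquet("data.parquet", compression="snappy")

# Use the path as a root directory and partition the dataset by a column
df.to_parquet("dataset_root", partition_cols=["name"])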
Update (March 2017): There are currently 2 libraries capable of writing Parquet files: fastparquet and pyarrow.
Both of them still seem to be under heavy development and come with a number of disclaimers (no support for nested data, for example), so you will have to check whether they support everything you need.
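As one illustration, here is a minimal sketch of writing a Snappy-compressed file with pyarrow; the file and column names are made up for the example.

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

# Convert the pandas DataFrame to an Arrow Table and write it as Parquet
table = pa.Table.from_pandas(df)
pq.write_table(table, "outfile_pyarrow.parquet", compression="snappy")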
OLD ANSWER:
As of February 2016 there seems to be NO python-only library capable of writing Parquet files.
If you only need to read Parquet files, there is python-parquet.
As a workaround you will have to rely on some other process, e.g. pyspark.sql (which uses Py4J and runs on the JVM, and thus cannot be used directly from your average CPython program).
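If you go that route, a sketch of the pyspark.sql workaround might look like the following. It assumes a local Spark installation and uses the SparkSession API; the output path and column names are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-writer").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])

# DataFrameWriter supports Snappy compression and an append save mode
df.write.mode("append").parquet("out_dir", compression="snappy")
spark.stop()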
fastparquet does have write support; here is a snippet to write a pandas dataframe to a file:
from fastparquet import write
write('outfile.parq', df)  # df is an existing pandas DataFrame
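Since the question also asks about Snappy: fastparquet's write() takes a compression argument, which needs the python-snappy package installed for the SNAPPY codec. A minimal sketch, with a placeholder file name:

import pandas as pd
from fastparquet import write

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Requires python-snappy to be installed for the SNAPPY codec
write("outfile_snappy.parq", df, compression="SNAPPY")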