
How to write a large CSV file to HDF5 in Python?

I have a dataset that is too large to read directly into memory, and I don't want to upgrade the machine. From my reading, HDF5 may be a suitable solution for my problem. But I am not sure how to iteratively write a DataFrame into the HDF5 file, since I cannot load the CSV file as a DataFrame object.

So my question is: how do I write a large CSV file into an HDF5 file with Python pandas?

asked Oct 07 '17 by Yan Song


People also ask

How do I convert a CSV file to HDF5?

If you have a very large single CSV file, you may want to stream the conversion to HDF, e.g.:

import numpy as np
import pandas as pd
from IPython.display import clear_output

CHUNK_SIZE = 5000000
filename = 'data.csv'
dtypes = {'latitude': float, 'longitude': float}
iter_csv = pd.read_csv(filename, iterator=True, chunksize=CHUNK_SIZE, dtype=dtypes)
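A minimal way to finish that stream, assuming we are free to pick the output file name ('data.h5') and table key ('data'), is to append each chunk to an HDFStore:

with pd.HDFStore('data.h5') as store:
    for i, chunk in enumerate(iter_csv):
        # append each chunk to one growing, appendable table
        store.append('data', chunk, format='table', data_columns=True)
        clear_output(wait=True)
        print('processed chunk', i)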

How do I import a large CSV file into Python?

read_csv(chunksize): One way to process a large file is to read its entries in chunks of reasonable size, each of which is read into memory and processed before the next chunk is read. The chunksize parameter specifies the size of each chunk as a number of rows.
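As a quick sketch (the file name and the per-chunk work are made up for illustration), each chunk arrives as an ordinary DataFrame:

import pandas as pd

total_rows = 0
for chunk in pd.read_csv('data.csv', chunksize=100000):
    # each chunk is a regular DataFrame; process it before the next one is read
    total_rows += len(chunk)
print(total_rows)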

Is HDF5 faster than csv?

With categorical features stored as strings, an interesting observation is that HDF shows an even slower loading speed than CSV, while other binary formats perform noticeably better.

Why is HDF5 file so large?

This is probably due to your chunk layout: the smaller the chunks, the more your HDF5 file will be bloated. Try to find an optimal balance between chunk sizes (to serve your use case properly) and the overhead (size-wise) that they introduce in the HDF5 file.
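For example, with h5py (the file and dataset names here are hypothetical), the chunks argument sets the chunk layout explicitly, and very small chunks inflate the file with per-chunk overhead:

import h5py
import numpy as np

data = np.random.rand(1_000_000)
with h5py.File('example.h5', 'w') as f:
    # tiny chunks: lots of per-chunk metadata and padding, bloated file
    f.create_dataset('small_chunks', data=data, chunks=(16,))
    # larger chunks: the same data with far less overhead
    f.create_dataset('large_chunks', data=data, chunks=(65536,))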


1 Answer

You can read the CSV file in chunks using the chunksize parameter and append each chunk to the HDF file:

import pandas as pd

# hdf_filename / csv_filename: paths to the output HDF5 and input CSV files
hdf_key = 'hdf_key'
df_cols_to_index = [...]  # list of columns (labels) that should be indexed
store = pd.HDFStore(hdf_filename)

for chunk in pd.read_csv(csv_filename, chunksize=500000):
    # don't index data columns in each iteration - we'll do it later ...
    store.append(hdf_key, chunk, data_columns=df_cols_to_index, index=False)

# index data columns in HDFStore
store.create_table_index(hdf_key, columns=df_cols_to_index, optlevel=9, kind='full')
store.close()
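One payoff of indexing the data columns is that the store can then be queried without reading everything back into memory; 'some_col' below is a hypothetical member of df_cols_to_index:

# read back only the rows matching a condition on an indexed data column
df_subset = pd.read_hdf(hdf_filename, hdf_key, where='some_col > 0')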
answered Nov 14 '22 by MaxU - stop WAR against UA