pandas - How to save only selected columns of a DataFrame to HDF5

I'm reading a sample CSV file and storing it in an .h5 database. The .csv is structured as follows:

User_ID;Longitude;Latitude;Year;Month;String
267261661;-3.86580025;40.32170825;2013;12;hello world
171255468;-3.83879575;40.05035005;2013;12;hello world
343588169;-3.70759531;40.4055946;2014;2;hello world
908779052;-3.8356385;40.1249459;2013;8;hello world
289540518;-3.6723114;40.3801642;2013;11;hello world
635876313;-3.8323166;40.3379393;2012;10;hello world
175160914;-3.53687933;40.35101274;2013;12;hello world 
155029860;-3.68555076;40.47688417;2013;11;hello world

I'm writing it to an .h5 store with pandas to_hdf, trying to pass only a couple of columns to the .h5 file:

import pandas as pd

df = pd.read_csv(filename + '.csv', sep=';')

df.to_hdf('test.h5','key1',format='table',data_columns=['User_ID','Year'])

HDFStore and read_hdf report different things about which columns are stored in the .h5 file. In particular:

>>> store = pd.HDFStore('test.h5')
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: /test.h5
/key1            frame_table  (typ->appendable,nrows->8,ncols->6,indexers->[index],dc->[User_ID,Year])

This is what I expect (only the 'User_ID' and 'Year' columns stored in the database), although ncols->6 means that all the columns have actually been stored in the .h5 file.

If I try reading the file with pd.read_hdf:

hdf = pd.read_hdf('test.h5','key1')

and asking for the keys:

>>> hdf.keys()
Index([u'User_ID', u'Longitude', u'Latitude', u'Year', u'Month', u'String'], dtype='object')

This is not what I expected, since all the columns of the original .csv file are still in the .h5 database. How can I store only a selection of columns in the .h5 file in order to reduce the size of the database?

Thanks for your help.

asked Jan 10 '15 17:01

Fabio Lamanna

1 Answer

Just select the columns you want to keep as you write to the file:

cols_to_keep = ['User_ID', 'Year']
df.loc[:, cols_to_keep].to_hdf(...)
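
For a fuller picture, here is a minimal sketch of that approach. As far as I understand it, data_columns only marks columns as queryable on disk and does not restrict which columns get written, which is why the subset has to be taken before calling to_hdf. The helper name save_selected_columns and the inlined CSV_TEXT sample are my own assumptions for illustration, not part of pandas or of the original question. Writing the file requires PyTables to be installed.

```python
import io

import pandas as pd

# First rows of the sample data from the question, inlined so the
# sketch does not depend on a file on disk.
CSV_TEXT = """User_ID;Longitude;Latitude;Year;Month;String
267261661;-3.86580025;40.32170825;2013;12;hello world
171255468;-3.83879575;40.05035005;2013;12;hello world
343588169;-3.70759531;40.4055946;2014;2;hello world
"""


def save_selected_columns(df, path, key, columns):
    """Write only `columns` of `df` to an HDF5 file (hypothetical helper)."""
    # Drop the unwanted columns first, so they never reach the file.
    subset = df.loc[:, columns]
    # data_columns makes the kept columns queryable on disk; it does not
    # by itself limit which columns are stored.
    subset.to_hdf(path, key=key, format='table', data_columns=columns)
    return subset


df = pd.read_csv(io.StringIO(CSV_TEXT), sep=';')
cols_to_keep = ['User_ID', 'Year']

# The selection step on its own, without touching the disk.
subset = df.loc[:, cols_to_keep]
print(list(subset.columns))
```

Calling save_selected_columns(df, 'test.h5', 'key1', cols_to_keep) and then pd.read_hdf('test.h5', key='key1') should give back a frame with only User_ID and Year.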

answered Nov 14 '22 21:11

Paul H

Tags: python, pandas, hdf5, hdfstore