Read specific columns from an HDF5 file and pass conditions

Tags: python, pandas, hdf5

I want to read only specific columns from an HDF5 file and apply conditions on those columns. My concern is that I don't want to load the whole HDF5 file into memory as a DataFrame; I want to fetch only the columns I need, filtered by my conditions.

import pandas as pd

columns = ['col1', 'col2']
condition = "col2 == 1"
groupname = '/path/to/group'    # HDF5 keys use forward slashes
Hdf5File = r'path\to\hdf5.h5'   # raw string, so '\t' is not read as a tab
with pd.HDFStore(Hdf5File, mode='r') as store:
    if groupname in store:
        df = pd.read_hdf(store, key=groupname, columns=columns, where=condition)

I get this error:

TypeError: cannot pass a column specification when reading a Fixed format store. this store must be selected in its entirety

Then I used the line below, which returns only the specified columns:

df=store[groupname][columns]

But I don't know how to apply a condition to it.

asked Jul 03 '17 09:07

Safariba

1 Answer

To read an HDF5 file conditionally, it must be saved in table format, and the columns used in the condition must be indexed as data columns (the data_columns argument of to_hdf).
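The TypeError in the question indicates that the key was written in fixed format. As a minimal sketch, one way to check how each key in a file is stored is to look at the `is_table` attribute of the object returned by `HDFStore.get_storer`. The file name and keys below are made up for illustration, and the example assumes PyTables is installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical throwaway file holding one fixed-format and one table-format key.
path = os.path.join(tempfile.mkdtemp(), "formats.h5")
df = pd.DataFrame(np.random.rand(10, 2), columns=["col1", "col2"])
df.to_hdf(path, key="fixed_key", format="fixed")
df.to_hdf(path, key="table_key", format="table", data_columns=True)

with pd.HDFStore(path, mode="r") as store:
    # is_table is True only for keys that support `where` and `columns`.
    formats = {key: store.get_storer(key).is_table for key in store.keys()}

print(formats)
```

Keys reported as not being tables have to be read in their entirety, which is what the error message says.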

Demo:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(100, 5), columns=list('abcde'))
df.to_hdf('c:/temp/file.h5', key='df_key', format='table', data_columns=True)

In [10]: pd.read_hdf('c:/temp/file.h5', 'df_key', where="a > 0.5 and a < 0.75")
Out[10]:
           a         b         c         d         e
3   0.744123  0.515697  0.005335  0.017147  0.176254
5   0.555202  0.074128  0.874943  0.660555  0.776340
6   0.667145  0.278355  0.661728  0.705750  0.623682
8   0.701163  0.429860  0.223079  0.735633  0.476182
14  0.645130  0.302878  0.428298  0.969632  0.983690
15  0.633334  0.898632  0.881866  0.228983  0.216519
16  0.535633  0.906661  0.221823  0.608291  0.330101
17  0.715708  0.478515  0.002676  0.231314  0.075967
18  0.587762  0.262281  0.458854  0.811845  0.921100
21  0.551251  0.537855  0.906546  0.169346  0.063612
..       ...       ...       ...       ...       ...
68  0.610958  0.874373  0.785681  0.147954  0.966443
72  0.619666  0.818202  0.378740  0.416452  0.903129
73  0.500782  0.536064  0.697678  0.654602  0.054445
74  0.638659  0.518900  0.210444  0.308874  0.604929
76  0.696883  0.601130  0.402640  0.150834  0.264218
77  0.692149  0.963457  0.364050  0.152215  0.622544
85  0.737854  0.055863  0.346940  0.003907  0.678405
91  0.644924  0.840488  0.151190  0.566749  0.181861
93  0.710590  0.900474  0.061603  0.144200  0.946062
95  0.601144  0.288909  0.074561  0.615098  0.737097

[33 rows x 5 columns]
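The question also asks for a column subset together with the condition. Once the data is in table format, `columns` and `where` can be combined in a single `read_hdf` call. The following is only a sketch under assumptions: the temporary file, the key name and the generated data are invented, with `col1`/`col2` borrowed from the question, and only `col2` is declared as a data column because it is the only one used in the condition.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data; col2 holds small integers so "col2 == 1" matches some rows.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "col1": rng.random(100),
    "col2": rng.integers(0, 3, size=100),
    "col3": rng.random(100),
})

path = os.path.join(tempfile.mkdtemp(), "table.h5")
# Table format, with col2 stored as a queryable data column.
df.to_hdf(path, key="df_key", format="table", data_columns=["col2"])

# Only rows matching the condition are selected, and only the two
# requested columns are returned.
subset = pd.read_hdf(path, key="df_key", columns=["col1", "col2"], where="col2 == 1")
print(subset.head())
```

Passing a list to `data_columns` instead of `True` limits the indexing to the columns that actually appear in conditions.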

UPDATE:

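If you can't change the HDF5 file, consider reading it in chunks and filtering each chunk, so that only one chunk plus the matching rows is held in memory at a time. Note that chunked reads require the data to be stored in table format: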

In [13]: df = pd.concat([x.query("0.5 < a < 0.75")
                         for x in pd.read_hdf('c:/temp/file.h5', 'df_key', chunksize=10)],
                        ignore_index=True)

In [14]: df
Out[14]:
           a         b         c         d         e
0   0.744123  0.515697  0.005335  0.017147  0.176254
1   0.555202  0.074128  0.874943  0.660555  0.776340
2   0.667145  0.278355  0.661728  0.705750  0.623682
3   0.701163  0.429860  0.223079  0.735633  0.476182
4   0.645130  0.302878  0.428298  0.969632  0.983690
5   0.633334  0.898632  0.881866  0.228983  0.216519
6   0.535633  0.906661  0.221823  0.608291  0.330101
7   0.715708  0.478515  0.002676  0.231314  0.075967
8   0.587762  0.262281  0.458854  0.811845  0.921100
9   0.551251  0.537855  0.906546  0.169346  0.063612
..       ...       ...       ...       ...       ...
23  0.610958  0.874373  0.785681  0.147954  0.966443
24  0.619666  0.818202  0.378740  0.416452  0.903129
25  0.500782  0.536064  0.697678  0.654602  0.054445
26  0.638659  0.518900  0.210444  0.308874  0.604929
27  0.696883  0.601130  0.402640  0.150834  0.264218
28  0.692149  0.963457  0.364050  0.152215  0.622544
29  0.737854  0.055863  0.346940  0.003907  0.678405
30  0.644924  0.840488  0.151190  0.566749  0.181861
31  0.710590  0.900474  0.061603  0.144200  0.946062
32  0.601144  0.288909  0.074561  0.615098  0.737097

[33 rows x 5 columns]
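If the original file is in fixed format and must stay untouched, but writing a second file is acceptable, a one-time conversion is another option. The sketch below assumes exactly that: it loads the fixed-format data fully once, writes a table-format copy, and runs all later queries against the copy. The file names and the key are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "fixed.h5")  # stands in for the existing file
dst = os.path.join(tmp_dir, "table.h5")  # new table-format copy

# Create a fixed-format file to play the role of the file that cannot be changed.
df = pd.DataFrame(np.random.rand(100, 5), columns=list("abcde"))
df.to_hdf(src, key="df_key", format="fixed")

# One-time conversion: this step does need the whole frame in memory once.
full = pd.read_hdf(src, key="df_key")
full.to_hdf(dst, key="df_key", format="table", data_columns=["a"])

# Later reads can select columns and rows without loading everything.
result = pd.read_hdf(dst, key="df_key", columns=["a", "b"],
                     where="(a > 0.5) & (a < 0.75)")
print(result.shape)
```

The conversion costs one full read, so it is mainly worthwhile when the same file is queried repeatedly.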

answered Nov 11 '22 05:11

MaxU - stop WAR against UA
