How to limit the size of pandas queries on HDF5 so they don't exceed the RAM limit?

Question

Let's say I have a pandas DataFrame

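import numpy as np
import pandas as pd

# example data; the row count here is arbitrary
df = pd.DataFrame(np.random.randn(10, 2), columns=['Column1', 'Column2'])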

df

   Column1    Column2
0  0.189086 -0.093137
1  0.621479  1.551653
2  1.631438 -1.635403
3  0.473935  1.941249
4  1.904851 -0.195161
5  0.236945 -0.288274
6 -0.473348  0.403882
7  0.953940  1.718043
8 -0.289416  0.790983
9 -0.884789 -1.584088
........

An example of a query is df.query('Column1 > Column2')

Suppose you want to cap the size of this query's result so that the returned object doesn't grow too large. Is there a "pandas" way to accomplish this?

My question is primarily about querying an HDF5 object with pandas. An HDF5 file can be far larger than RAM, so the result of a query can be larger than RAM as well.

# file1.h5 contains only one field_table/key/HDF5 group called 'df'
store = pd.HDFStore('file1.h5')

# the result of the following query could be too large
df = store.select('df', columns=['column1', 'column2'], where=['column1==5'])

Is there a pandas/Pythonic way to stop users from executing queries whose results exceed a certain size?

MaxU - stop WAR against UA · Accepted Answer

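Here is a small demonstration of how to use the chunksize parameter when calling HDFStore.select(). With chunksize set, select() returns an iterator that yields DataFrames of at most chunksize rows each, so the full result never has to be loaded at once: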

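for chunk in store.select('df', columns=['column1', 'column2'],
                          where='column1==5', chunksize=10**6):
    # process the `chunk` DataFrame here (at most 10**6 rows per iteration)
    pass  # placeholder so the loop body is valid Python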
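
To turn the chunked iteration into an actual size limit, one option is to accumulate chunks while tracking their memory footprint and abort once a budget is exceeded. The sketch below is a minimal illustration, not a pandas feature: `collect_within_budget` and `max_bytes` are hypothetical names, and an in-memory list of DataFrames stands in for the iterator returned by `store.select(..., chunksize=...)` so the example runs without an HDF5 file.

```python
import pandas as pd


def collect_within_budget(chunks, max_bytes):
    """Concatenate DataFrame chunks, raising MemoryError past max_bytes.

    `chunks` is any iterable of DataFrames, for example the iterator
    returned by HDFStore.select(..., chunksize=...).
    Hypothetical helper for illustration; not part of pandas.
    """
    kept = []
    used = 0
    for chunk in chunks:
        # deep=True also counts the contents of object (e.g. string) columns
        used += int(chunk.memory_usage(deep=True).sum())
        if used > max_bytes:
            raise MemoryError(
                f"query result exceeds {max_bytes} bytes "
                f"(at least {used} bytes read so far)"
            )
        kept.append(chunk)
    if not kept:
        return pd.DataFrame()
    return pd.concat(kept)


# In-memory stand-in for the chunk iterator (assumption: 4 chunks of 250 rows).
frame = pd.DataFrame({"column1": [5] * 1000, "column2": range(1000)})
fake_chunks = [frame.iloc[i:i + 250] for i in range(0, len(frame), 250)]

# A generous budget lets the whole result through.
small = collect_within_budget(fake_chunks, max_bytes=10**6)

# A tiny budget aborts the query on the first chunk.
try:
    collect_within_budget(fake_chunks, max_bytes=1_000)
    too_large = False
except MemoryError:
    too_large = True
```

With a real store, the same helper would presumably be called as collect_within_budget(store.select('df', where='column1==5', chunksize=10**6), max_bytes=...). Note that pd.concat briefly needs room for both the collected chunks and the combined result, so the budget should sit well below the available RAM.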

Tags: python, pandas, dataframe, hdf5, pytables

ShanZhengYang
