I'd like to know if there's a method or a Python package that would let me work with a large dataset without loading it entirely into RAM.
I'm also using pandas for statistical functions.
I need access to the entire dataset because many statistical functions need the full dataset to return credible results.
I'm using PyDev (with the Python 3.4 interpreter) on LiClipse with Windows 10.
You could use SFrame or Dask for out-of-core dataset support, or stick with pandas and read/iterate in chunks to minimise RAM usage. The blaze library is also worth a look.
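With Dask you keep a pandas-like API while computation happens out of core. A minimal sketch (the file name "data.csv" and the column name "value" are assumptions for illustration):

import dask.dataframe as dd

# Lazily reference the CSV; nothing is read into RAM yet
df = dd.read_csv("data.csv")

# The statistic is evaluated chunk by chunk when .compute() is called
mean_value = df["value"].mean().compute()
print(mean_value)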
Or read in chunks with pandas:

import pandas as pd

chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)  # replace with your own per-chunk logic
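Here process(chunk) is a placeholder for your own code. Some whole-dataset statistics can still be computed this way by accumulating partial results across chunks. A minimal sketch for an overall mean (again assuming a hypothetical "data.csv" with a numeric column "value"):

import pandas as pd

total = 0.0
count = 0
for chunk in pd.read_csv("data.csv", chunksize=10 ** 6):
    total += chunk["value"].sum()  # partial sum for this chunk
    count += len(chunk)            # running row count
print(total / count)  # overall mean without loading the full file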