How to load large data into pandas efficiently? [duplicate]

I am working with a very wide dataset (1005 rows * 590,718 columns, 1.2 GB). Loading such a large dataset into a pandas DataFrame fails entirely due to insufficient memory.

I am aware that Spark is probably a good alternative to pandas for dealing with large datasets, but is there any workable way in pandas to reduce memory usage while loading large data?

asked Oct 16 '25 by RJF

1 Answer

You could use

pandas.read_csv(filename, chunksize=chunksize)

which returns an iterator of DataFrame chunks instead of loading the entire file into memory at once.
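A minimal sketch of how the chunked reader might be used; the filename, chunk size, and per-chunk aggregation below are placeholders for illustration, not part of the original answer:

import pandas as pd

# Placeholder values; tune chunksize to your memory budget.
filename = "data.csv"
chunksize = 100  # rows read per chunk

# With chunksize set, read_csv yields one DataFrame per chunk,
# so only one chunk is held in memory at a time.
pieces = []
for chunk in pd.read_csv(filename, chunksize=chunksize):
    # Reduce each chunk (filter, select columns, aggregate) before keeping it,
    # so the full table never has to fit in memory at once.
    pieces.append(chunk.mean(numeric_only=True))

summary = pd.concat(pieces, axis=1)

Here each element of pieces is a per-chunk column mean, and concatenating them produces a compact summary rather than materializing the full 1.2 GB table.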
answered Oct 17 '25 by grshankar


