I am exploring switching to Python and pandas as a long-time SAS user.
However, when running some tests today, I was surprised that Python ran out of memory when trying to pandas.read_csv()
a 128 MB CSV file. It had about 200,000 rows and 200 columns of mostly numeric data.
With SAS, I can import a CSV file into a SAS dataset, and it can be as large as my hard drive.
Is there something analogous in pandas?
I regularly work with large files and do not have access to a distributed computing network.
When the data is too large to fit into memory, you can use the chunksize option of pandas' read_csv() to process the file in chunks instead of loading it as one big block.
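As a rough sketch of that approach (the file name and the 'value' column below are placeholders), you can iterate over the chunks and keep only a reduced result in memory:

import pandas as pd

# Read the file in chunks of 100,000 rows instead of all at once.
# 'large_dataset.csv' and the 'value' column are placeholder names.
running_total = 0
for chunk in pd.read_csv('large_dataset.csv', chunksize=100_000):
    # Each chunk is an ordinary DataFrame, so you can filter or aggregate it
    # here and discard it before the next chunk is read.
    running_total += chunk['value'].sum()

print(running_total)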
The longer answer is that the size limit for a pandas DataFrame is the memory available to the process, not a fixed number of cells.
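To see how much of that memory a DataFrame actually uses, you can inspect its footprint directly (a quick illustration with a made-up frame):

import numpy as np
import pandas as pd

# Illustration only: build a toy frame and measure its actual size in RAM.
df = pd.DataFrame({
    'price': np.random.rand(200_000),   # numeric column
    'label': ['item'] * 200_000,        # object (string) column
})

# deep=True also counts the Python string objects held in object columns.
print(df.memory_usage(deep=True).sum() / 1024 ** 2, 'MB')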
The default pandas data types are not the most memory efficient. This is especially true for text data columns with relatively few unique values (commonly referred to as “low-cardinality” data). By using more efficient data types, you can store larger datasets in memory.
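One way to apply this, sketched below with hypothetical column names, is to load low-cardinality text columns as 'category' and downcast numeric columns to smaller types:

import pandas as pd

# 'state' stands in for any low-cardinality text column; the 'category' dtype
# stores each unique string once plus small integer codes per row.
df = pd.read_csv('large_dataset.csv', dtype={'state': 'category'})

# Downcast numeric columns to the smallest type that still holds their values
# (e.g. float64 -> float32, int64 -> int8/int16).
for col in df.select_dtypes(include='float').columns:
    df[col] = pd.to_numeric(df[col], downcast='float')
for col in df.select_dtypes(include='integer').columns:
    df[col] = pd.to_numeric(df[col], downcast='integer')

print(df.memory_usage(deep=True).sum() / 1024 ** 2, 'MB')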
Wes is of course right! I'm just chiming in to provide a little more complete example code. I had the same issue with a 129 MB file, which was solved by:
import pandas as pd

# iterator=True with chunksize=1000 gives a TextFileReader, which is iterable
# in chunks of 1,000 rows.
tp = pd.read_csv('large_dataset.csv', iterator=True, chunksize=1000)

# Concatenate the chunks into a single DataFrame.
# If this errors, pass list(tp) instead of tp.
df = pd.concat(tp, ignore_index=True)