How can I load two large CSV files (about 5 GB each) into a Jupyter Notebook on a local system using Python pandas? Please suggest any configuration to handle big CSV files for data analysis.
Local System Configuration:
OS: Windows 10
RAM: 16 GB
Processor: Intel-Core-i7
Code:
import pandas as pd

dpath = 'p_flg_tmp1.csv'
pdf = pd.read_csv(dpath, sep="|")
Error:
MemoryError: Unable to allocate array
or
pd.read_csv(po_cust_data, sep="|", low_memory=False)
Error:
ParserError: Error tokenizing data. C error: out of memory
How can I handle two such large CSV files on a local system for data analysis? Please suggest a better configuration, if possible, for a local system using Python pandas.
If you do not need to process everything at once, you can read the files in chunks:
reader = pd.read_csv('tmp.sv', sep='|', chunksize=4000)
for chunk in reader:
    print(chunk)
See the pandas documentation on iterating through files chunk by chunk for further information.
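As a rough sketch of how chunking helps in practice, the loop below filters each chunk and keeps only the rows needed for the analysis, so the full 5 GB file never has to fit in RAM at once. The file name, separator, chunk size, and the column and value used in the filter are assumptions for illustration; adjust them to your own data.

import pandas as pd

# Hypothetical file name, separator, and filter condition -- adapt to your data.
filtered_parts = []
reader = pd.read_csv('p_flg_tmp1.csv', sep='|', chunksize=100_000)
for chunk in reader:
    # Keep only the rows you actually need so memory usage stays bounded.
    filtered_parts.append(chunk[chunk['status'] == 'ACTIVE'])

# The concatenated result holds only the filtered rows, not the whole file.
pdf = pd.concat(filtered_parts, ignore_index=True)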
If you need to process everything at once and chunking really isn't an option, you have only two options left.
A CSV file takes an enormous amount of memory once loaded into RAM. See this article for more information; even though it is about another piece of software, it gives a good idea of the problem:
Memory Usage
You can estimate the memory usage of your CSV file with this simple formula:
memory = 25 * R * C + F
where R is the number of rows, C the number of columns and F the file size in bytes.
One of my test files is 524 MB in size and contains 10 columns across 4.4 million rows. Using the formula above, the RAM usage will be about 1.6 GB:
memory = 25 * 4,400,000 * 10 + 524,000,000 = 1,624,000,000 bytes
With this file opened in Tablecruncher, the Activity Monitor reports 1.4 GB of RAM used, so the formula gives a reasonably accurate estimate.
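For a quick sanity check before loading, the formula can be wrapped in a small helper; the values below simply reproduce the 524 MB example above, and you can plug in estimates for your own 5 GB files (the function name is hypothetical).

def estimate_csv_ram(rows, cols, file_size_bytes):
    # memory = 25 * R * C + F, the estimation formula described above.
    return 25 * rows * cols + file_size_bytes

# The 524 MB test file: 4.4 million rows, 10 columns.
print(estimate_csv_ram(4_400_000, 10, 524_000_000))  # 1624000000 bytes, about 1.6 GB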