
How to read two large (5 GB) CSV files in a local Jupyter Notebook using Python pandas, and how to join the two DataFrames for data analysis?

How can I load two large CSV files (5 GB each) into a Jupyter Notebook on my local system using Python pandas? Please suggest any configuration that helps with handling big CSV files for data analysis.

Local System Configuration:
OS: Windows 10
RAM: 16 GB
Processor: Intel-Core-i7

Code:

import pandas as pd

dpath = 'p_flg_tmp1.csv'
pdf = pd.read_csv(dpath, sep="|")

Error:
MemoryError: Unable to allocate array

or

pd.read_csv(po_cust_data, sep="|", low_memory=False)

Error:
ParserError: Error tokenizing data. C error: out of memory

How can I handle two big CSV files on a local system for data analysis? Please suggest a better configuration, if one is possible, on a local system using Python pandas.

Srinivas K asked Mar 03 '23 21:03


1 Answer

If you do not need to process everything at once, you can read the file in chunks:

import pandas as pd

# Read the file lazily, 4000 rows at a time, instead of loading it all at once
reader = pd.read_csv('tmp.sv', sep='|', chunksize=4000)
for chunk in reader:
    print(chunk)

See the pandas documentation for further information.
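
Printing each chunk is only a demonstration. In practice you would usually reduce every chunk to something small and combine the partial results. Here is a minimal sketch of that idea; the helper name `sum_column_in_chunks`, the column names, and the tiny in-memory sample are all made up for illustration, and a real file path would replace the `io.StringIO` object:

```python
import io

import pandas as pd


def sum_column_in_chunks(source, column, chunksize):
    """Return (row_count, column_sum) without holding the whole file in memory."""
    row_count = 0
    total = 0
    for chunk in pd.read_csv(source, sep="|", chunksize=chunksize):
        # Only one chunk is in memory at a time; keep just the running totals.
        row_count += len(chunk)
        total += int(chunk[column].sum())
    return row_count, total


# Hypothetical stand-in for a large pipe-separated file.
sample = io.StringIO("id|amount\n1|10\n2|20\n3|30\n4|40\n5|50\n")

# chunksize=2 is only there to force several chunks on this tiny sample.
rows, amount_sum = sum_column_in_chunks(sample, "amount", chunksize=2)
print(rows, amount_sum)
```

The same pattern works for filtering: keep only the rows you need from each chunk and concatenate the (much smaller) pieces at the end.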

If you need to process everything at once and chunking really isn't an option, you have only two options left:

  1. Increase RAM of your system
  2. Switch to another data storage type
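
As one possible reading of option 2, the CSV files could be loaded chunk by chunk into an SQLite database, so that the join runs on disk instead of in RAM. This is only a sketch under assumptions: SQLite is my choice of storage here, not something the question requires, and the table names, column names, and sample data are invented. For real data you would pass file paths and a database file instead of `io.StringIO` and `":memory:"`:

```python
import io
import sqlite3

import pandas as pd


def load_csv_into_sqlite(conn, source, table, chunksize=100_000):
    """Append a pipe-separated CSV to an SQLite table, one chunk at a time."""
    for chunk in pd.read_csv(source, sep="|", chunksize=chunksize):
        chunk.to_sql(table, conn, if_exists="append", index=False)


# Hypothetical stand-ins for the two large files.
customers = io.StringIO("cust_id|name\n1|Ann\n2|Bob\n")
orders = io.StringIO("order_id|cust_id|amount\n10|1|5.0\n11|1|7.5\n12|2|3.0\n")

conn = sqlite3.connect(":memory:")  # use a file path such as 'data.db' for real data
load_csv_into_sqlite(conn, customers, "customers", chunksize=1)
load_csv_into_sqlite(conn, orders, "orders", chunksize=2)

# The join and aggregation run inside SQLite; only the small result reaches pandas.
joined = pd.read_sql_query(
    "SELECT c.name, SUM(o.amount) AS total "
    "FROM orders o JOIN customers c ON o.cust_id = c.cust_id "
    "GROUP BY c.name ORDER BY c.name",
    conn,
)
print(joined)
conn.close()
```

For large tables it is probably worth creating an index on the join column before running the query.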

A CSV file takes up an enormous amount of RAM once it is loaded. The following article was written for different software (Tablecruncher), but it gives a good idea of the problem:

Memory Usage

You can estimate the memory usage of your CSV file with this simple formula:

memory = 25 * R * C + F 

where R is the number of rows, C is the number of columns, and F is the file size in bytes.

One of my test files is 524 MB and contains 10 columns and 4.4 million rows. Using the formula above, the RAM usage will be about 1.6 GB:

memory = 25 * 4,400,000 * 10 + 524,000,000 = 1,624,000,000 bytes

While this file is open in Tablecruncher, the Activity Monitor reports 1.4 GB of RAM used, so the formula gives a reasonably accurate estimate.
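
The quoted formula is easy to turn into a small helper for a quick sanity check before loading a file. Keep in mind that it was derived for Tablecruncher, so for pandas it should be treated as a rough guess only; the function name `estimate_csv_memory` is made up for this sketch:

```python
def estimate_csv_memory(rows, columns, file_size_bytes):
    """Rough RAM estimate in bytes, using memory = 25 * R * C + F from the article."""
    return 25 * rows * columns + file_size_bytes


# The article's example: 4.4 million rows, 10 columns, 524 MB file.
estimate = estimate_csv_memory(4_400_000, 10, 524_000_000)
print(f"{estimate:,} bytes")  # 1,624,000,000 bytes
```

Once a DataFrame (or a single chunk of one) is loaded, `df.memory_usage(deep=True).sum()` reports what pandas actually uses, which is a better basis for extrapolating to the full file.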

Mailerdaimon answered Mar 05 '23 15:03