 

Reading a huge .csv file in Jupyter Notebook

I'm trying to read data from a .csv file in Jupyter Notebook (Python).

The .csv file is 8.5 GB, with 70 million rows and 30 columns.

When I try to read the .csv file, I get errors.

Below is my code:

import pandas as pd

log = pd.read_csv('log_20100424.csv', engine='python')

I also tried using pyarrow, but that doesn't work either.

import pandas as pd
from pyarrow import csv

log = csv.read_csv('log_20100424.csv').to_pandas()

My questions are:

How can I read a huge (8.5 GB) .csv file in Jupyter Notebook?

Is there any other way to read a huge .csv file?

My laptop has 8 GB RAM, 64-bit Windows 10, and an i5-8265U @ 1.6 GHz.

asked Sep 03 '25 by jwowowo

1 Answer

Even if pandas can handle huge datasets in principle, loading an 8.5 GB file at once on a machine with 8 GB of RAM will exhaust memory, and the Jupyter Notebook kernel will die. To read a huge CSV file, you need to work in chunks. I faced a similar situation where the kernel would die and I had to start over. Try this - Pandas Error Jupyter Notebook
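A minimal sketch of chunked reading with pandas' `chunksize` parameter. The sample file name and chunk size here are illustrative; in the question's case you would pass `'log_20100424.csv'` and a chunk size of, say, a few hundred thousand rows:

```python
import pandas as pd

# Create a small sample CSV to stand in for the real 8.5 GB file.
sample = pd.DataFrame({'a': range(100), 'b': range(100)})
sample.to_csv('log_sample.csv', index=False)

# With chunksize, read_csv returns an iterator of DataFrames instead of
# loading the whole file, so memory use stays bounded by the chunk size.
total_rows = 0
for chunk in pd.read_csv('log_sample.csv', chunksize=25):
    # Process each chunk here (filter, aggregate, write out), then let it
    # be garbage-collected before the next chunk is read.
    total_rows += len(chunk)

print(total_rows)
```

The key point is that you aggregate or filter inside the loop and keep only the reduced result, never the full 70-million-row DataFrame. Specifying `dtype` for the 30 columns (e.g. smaller integer types) can further cut per-chunk memory.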

answered Sep 04 '25 by Varun Nagrare