 

Lazy loading csv with pandas

Tags: python, pandas, csv

I have a huge 22 GB CSV file that contains a 10000 x 10000 matrix. I actually need only a small portion of the file for my purposes, one that fits comfortably within my 4 GB of RAM. Is there any way to lazily load the CSV so that I can pick out only some non-contiguous portion of the file, say 25 specific rows? I have heard of the iterator in pandas that loads data piece by piece, but I am still not sure of its memory requirements.
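For reference, the iterator in question is pd.read_csv with a chunksize: it streams the file one chunk at a time, so peak memory is bounded by the chunk size rather than the file size. A minimal sketch (the filename and chunk size here are placeholders):

import pandas as pd

# Read the file 1,000 rows at a time; only one chunk is held in
# memory at once, regardless of the total file size.
reader = pd.read_csv("large.csv", header=None, chunksize=1000)
for chunk in reader:
    pass  # each `chunk` is a DataFrame of up to 1,000 rows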

asked Dec 31 '14 by Amrith Krishna




1 Answer

For a small number of lines, try using linecache and building a pandas DataFrame manually.

For example, the following code puts lines 12, 24, and 36 (1-indexed) into a DataFrame.

import linecache
from pandas import DataFrame

filename = "large.csv"
indices = [12, 24, 36]  # 1-indexed line numbers to extract

rows = []
for i in indices:
    # getline returns the i-th line (1-indexed); strip the trailing
    # newline and split on commas (quoted fields are not handled)
    rows.append(linecache.getline(filename, i).rstrip().split(','))

dataframe = DataFrame(rows)
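Note, though, that linecache reads and caches the entire file in memory, so on a 22 GB file it can blow past 4 GB of RAM. An alternative sketch, not from the original answer, uses pd.read_csv with a callable skiprows (supported since pandas 0.19): it streams the file and parses only the selected rows. The filename and row numbers are placeholders matching the example above:

import pandas as pd

filename = "large.csv"
wanted = {11, 23, 35}  # 0-indexed rows (the same lines 12, 24, 36 as above)

# skiprows is called once per row index; returning True skips the row,
# so only the rows in `wanted` are ever parsed into memory.
df = pd.read_csv(filename, header=None, skiprows=lambda i: i not in wanted)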
answered Oct 18 '22 by Karmen