I have a huge 22 GB CSV file that contains a 10000 x 10000 matrix, but I actually need only a small portion of it, one that fits easily within my 4 GB of RAM. Is there any way to lazily load the CSV so that I can pick out only some non-contiguous portion of the file, say 25 specific rows? I have heard of the iterator in pandas that loads data piece by piece, but I am still not sure of its memory requirements.
For a small number of lines, try using linecache and building a pandas DataFrame manually.
For example, the following code puts lines 12, 24, and 36 (1-indexed) into a DataFrame.
import linecache
from pandas import DataFrame

filename = "large.csv"
indices = [12, 24, 36]  # 1-indexed line numbers to extract

li = []
for i in indices:
    # linecache.getline is 1-indexed; split the raw CSV line into column values
    li.append(linecache.getline(filename, i).rstrip().split(','))

dataframe = DataFrame(li)
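One caveat with this approach: linecache caches the entire file's contents in memory the first time it is queried, so on a 22 GB file it can blow past a 4 GB RAM budget. A streaming alternative is pandas.read_csv with a callable skiprows, which drops unwanted rows during parsing so only the rows you keep are ever materialized. A minimal sketch, assuming the file has no header row:

import pandas as pd

filename = "large.csv"
wanted = {11, 23, 35}  # 0-indexed row numbers (lines 12, 24, 36)

# skiprows is called with each 0-indexed row number; return True to skip it
dataframe = pd.read_csv(filename, header=None,
                        skiprows=lambda row: row not in wanted)

The chunksize iterator mentioned in the question bounds memory the same way: each iteration holds only chunksize rows at a time, so you can filter chunk by chunk and concatenate the survivors:

pieces = []
for chunk in pd.read_csv(filename, header=None, chunksize=10000):
    # with the default index, row numbering continues across chunks,
    # so the index matches the file's 0-indexed row numbers
    pieces.append(chunk[chunk.index.isin(wanted)])
dataframe = pd.concat(pieces)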