I have been trying to read a few large text files (around 1.4 GB - 2 GB each) with Pandas, using the read_csv
function, to no avail. Below are the versions I am using:
I tried the following:
df = pd.read_csv('data.txt')
and it crashed IPython with the message: Kernel died, restarting.
Then I tried using an iterator:
tp = pd.read_csv('data.txt', iterator=True, chunksize=1000)
again, I got the Kernel died, restarting error.
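For context, this is how I intended to consume the iterator once the call succeeded (a sketch only; I never got this far, since the kernel died on the read_csv call itself):
tp = pd.read_csv('data.txt', iterator=True, chunksize=1000)
for chunk in tp:  # each chunk is a DataFrame of up to 1000 rows
    pass          # placeholder for whatever per-chunk processing I need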
Any ideas? Or any other way to read big text files?
Thank you!
A solution for a similar question was given here some time after this question was posted. Basically, it suggests reading the file in chunks, like this:
chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)
You should set the chunksize parameter according to your machine's capabilities (that is, make sure each chunk fits comfortably in memory and can actually be processed).
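For instance, here is a minimal, self-contained sketch of that pattern. It assumes a hypothetical file data.txt with a column named category; the column name and the value_counts aggregation are placeholders for whatever process(chunk) would do in your case:
import pandas as pd

chunksize = 10 ** 6  # rows per chunk; tune this to the memory you have available

# Accumulate per-chunk results instead of holding the whole file in memory.
counts = pd.Series(dtype='int64')
for chunk in pd.read_csv('data.txt', chunksize=chunksize):
    counts = counts.add(chunk['category'].value_counts(), fill_value=0)

print(counts.sort_values(ascending=False))
This keeps only one chunk (plus the running aggregate) in memory at a time, which is what lets the read succeed on files larger than available RAM.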