In the following script, is there a way to find out how many "chunks" there are in total?
import pandas as pd
import numpy as np
data = pd.read_csv('data.txt', delimiter = ',', chunksize = 50000)
for chunk in data:
    print(chunk)
Using len(chunk) will only give me the number of rows in each chunk.
Is there a way to do it without adding the iteration manually?
CSV, being row-based, does not allow a process to know how many lines there are in it until after it has all been scanned.
Very minimal scanning is necessary, though, assuming the CSV file is well formed:
sum(1 for row in open('data.txt', 'r'))
This can be useful if you need to calculate in advance how many chunks there are: divide the number of data rows (the line count, minus one if the file has a header line) by the chunk size and round up. A full CSV reader is overkill for this. The line above needs very little memory and does minimal parsing.
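As a rough sketch of how the line count could be turned into a chunk count: the helper below is hypothetical (count_chunks and its has_header parameter are not part of pandas). It assumes that every record sits on exactly one line, with no quoted fields containing newlines and no blank lines, and that the file has a single header line unless told otherwise.

```python
import math


def count_chunks(path, chunksize, has_header=True):
    """Estimate how many chunks a chunked CSV read would yield.

    Hypothetical helper: assumes one record per line (no embedded
    newlines in quoted fields, no blank lines).
    """
    # Count lines lazily so the whole file is never held in memory.
    with open(path, 'r') as f:
        n_lines = sum(1 for _ in f)

    # The header line is not a data row, so leave it out of the count.
    n_rows = max(n_lines - 1, 0) if has_header else n_lines

    # The last chunk may be only partly filled, hence the round-up.
    return math.ceil(n_rows / chunksize)
```

With the script from the question, this would be called as count_chunks('data.txt', 50000) before the loop. The file is still read twice in total, once for counting and once for parsing, but the counting pass is cheap compared with parsing.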