I'm iterating over a large CSV file and I'd like to print some kind of progress indicator. As I understand it, counting the number of lines would require scanning the whole file for newline characters, so I cannot easily estimate progress by line number.
Is there anything else I can do to estimate the progress while reading in lines? Maybe I can go by size?
You can use tqdm with large files in the following way:
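    import os
    import tqdm

    # The total is the file size in bytes, so progress is measured in bytes read
    with tqdm.tqdm(total=os.path.getsize(filename)) as pbar:
        with open(filename, "rb") as f:
            for l in f:
                # In binary mode each line is a bytes object, so len(l) is a byte count
                pbar.update(len(l))
                ...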
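Because the file is opened in binary mode ("rb"), len(l) is the exact number of bytes in each line. If you open a UTF-8 file in text mode instead, len(l) counts characters rather than bytes, so the bar will undercount slightly, but it should be good enough.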
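To see why a character count and a byte count can differ for UTF-8 text, here is a small illustration. The sample string is made up for the example; any line containing non-ASCII characters behaves the same way.

```python
# Hypothetical sample line with two non-ASCII characters (ï and é)
line = "na\u00efve caf\u00e9\n"

chars = len(line)                    # number of characters in the str
nbytes = len(line.encode("utf-8"))   # number of bytes once encoded as UTF-8

print(chars)   # 11
print(nbytes)  # 13: ï and é each take two bytes in UTF-8
```

For mostly ASCII data the two numbers are nearly equal, which is why the character count is usually a good enough estimate.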
This is based on @Piotr's answer, adapted for Python 3:
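    import os
    from tqdm import tqdm  # import the class itself; the bare module is not callable

    with tqdm(total=os.path.getsize(filepath)) as pbar:
        # The with block closes the file, so no explicit close() is needed
        with open(filepath, encoding='utf-8') as file:
            for line in file:
                # Text mode yields str, so encode to count bytes rather than characters
                pbar.update(len(line.encode('utf-8')))
                ...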
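If you would rather not depend on tqdm, the same byte-counting idea can be sketched with the standard library alone. The helper name `lines_with_progress` and its `encoding` parameter are hypothetical, chosen only for this illustration; it assumes the file can be decoded line by line with the given encoding.

```python
import os


def lines_with_progress(path, encoding="utf-8"):
    """Yield (line, fraction_done) pairs, where the fraction is bytes read / file size."""
    total = os.path.getsize(path)
    done = 0
    # Read in binary mode so len(raw) is an exact byte count
    with open(path, "rb") as f:
        for raw in f:
            done += len(raw)
            # Guard against division by zero for an empty file
            fraction = done / total if total else 1.0
            yield raw.decode(encoding), fraction
```

A caller could loop with `for line, frac in lines_with_progress(filepath):` and print `frac` every few thousand lines instead of on every iteration, to keep the output readable.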