I'm iterating over a large CSV file and I'd like to print some kind of progress indicator. As I understand it, counting the number of lines would require scanning the whole file for newline characters, so I can't easily estimate progress by line number.
Is there anything else I can do to estimate the progress while reading in lines? Maybe I can go by size?
You can use tqdm with large files in the following way:
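import os
import tqdm

# total is the file size in bytes; the bar advances by bytes read
with tqdm.tqdm(total=os.path.getsize(filename)) as pbar:
    with open(filename, "rb") as f:
        for line in f:
            pbar.update(len(line))  # exact byte count in binary mode
            ...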
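Because the file is opened in binary mode ("rb"), len(line) is the exact number of bytes in each line, so the bar finishes at exactly 100%. If you open the file in text mode instead, len(line) counts characters rather than bytes, so for a UTF-8 file with non-ASCII characters the total will be slightly off, but it is usually good enough.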
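If you'd rather not add a dependency, the same "go by size" idea works with the standard library alone: compare the bytes consumed so far with os.path.getsize. This is a minimal sketch, assuming you want a message every 10%; the function name `read_with_progress` and its `step` parameter are hypothetical, not part of any library.

```python
import os
import sys


def read_with_progress(path, step=10):
    """Hypothetical helper: read `path` line by line, printing progress
    to stderr every `step` percent. Returns the total bytes read."""
    total = os.path.getsize(path)
    done = 0
    last_reported = 0
    with open(path, "rb") as f:  # binary mode: len(line) is an exact byte count
        for line in f:
            done += len(line)
            # ... process `line` here ...
            pct = done * 100 // total  # integer percent avoids float drift
            if pct >= last_reported + step:
                last_reported = pct - pct % step
                print(f"{pct}% read", file=sys.stderr)
    return done
```

Usage would be something like read_with_progress("data.csv") (the path is just an example). Printing to stderr keeps the progress messages out of any data you write to stdout.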
This is based on @Piotr's answer, adapted for Python 3:
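import os
from tqdm import tqdm  # import the class, not just the module

with tqdm(total=os.path.getsize(filepath)) as pbar:
    with open(filepath, encoding="utf-8") as file:
        for line in file:
            # text mode yields str, so re-encode to count bytes;
            # "\r\n" is read as "\n", so Windows line endings can leave the bar slightly short
            pbar.update(len(line.encode("utf-8")))
            ...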
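Since the question is about a CSV file, here is one way to keep exact byte counts while still parsing with the csv module: read the file in binary mode, count each raw line, and feed the decoded lines to csv.reader, which accepts any iterable of strings and pulls extra lines itself when a quoted field spans several lines. This is a sketch under the assumption that the file is UTF-8 (decoding line by line is safe there because the byte b"\n" never occurs inside a multi-byte character; it would not be for UTF-16). The name `csv_rows_with_progress` is hypothetical.

```python
import csv
import os


def csv_rows_with_progress(path, encoding="utf-8"):
    """Hypothetical helper: yield (row, fraction_of_file_read) per CSV record."""
    total = os.path.getsize(path)
    done = 0

    with open(path, "rb") as f:
        def decoded_lines():
            nonlocal done
            for raw in f:
                done += len(raw)  # exact bytes consumed so far
                yield raw.decode(encoding)

        # csv.reader pulls lines lazily, so `done` reflects the bytes
        # needed to complete the row it has just produced.
        for row in csv.reader(decoded_lines()):
            yield row, done / total
```

You could then write for row, frac in csv_rows_with_progress("data.csv"): ... and print frac, or pass the byte deltas to a tqdm bar as in the answers above.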