
Iterate over large file with progress indicator in Python?

Tags:

python

I'm iterating over a large CSV file and I'd like to print a progress indicator. As I understand it, counting the number of lines would require parsing the whole file for newline characters, so I cannot easily estimate progress by line number.

Is there anything else I can do to estimate the progress while reading in lines? Maybe I can go by size?

Gerenuk asked Jul 22 '14 14:07


2 Answers

You can use tqdm with large files in the following way:

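import os
import tqdm

# Use the file size in bytes as the total, then advance by the size of each line.
with tqdm.tqdm(total=os.path.getsize(filename)) as pbar:
    with open(filename, "rb") as f:  # binary mode, so len(l) is a byte count
        for l in f:
            pbar.update(len(l))
            ...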

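Because the file is opened in binary mode ("rb"), len(l) is the exact number of bytes in each line. If you open a UTF-8 file in text mode instead, len(l) counts characters rather than bytes, so the bar will fall short of the total when the file contains multi-byte characters, but it should still be good enough for a progress estimate.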
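
If tqdm is not available, the same size-based estimate can be computed by hand with the standard library. The following is only a minimal sketch of that idea: the function name iter_lines_with_progress and the report_every parameter (how many bytes to read between progress updates, 1 MiB by default) are made up for illustration and are not part of any library.

```python
import os
import sys


def iter_lines_with_progress(path, report_every=1 << 20):
    """Yield raw lines from `path`, printing the percentage read to stderr.

    Hypothetical helper: `report_every` is an assumed tuning knob giving the
    number of bytes to read between progress updates.
    """
    total = os.path.getsize(path)  # total size in bytes
    done = 0                       # bytes consumed so far
    next_report = report_every
    with open(path, "rb") as f:    # binary mode: len(line) is an exact byte count
        for line in f:
            done += len(line)
            if done >= next_report or done == total:
                percent = 100.0 * done / total
                print(f"\r{percent:5.1f}% read", end="", file=sys.stderr)
                next_report = done + report_every
            yield line
    print(file=sys.stderr)  # finish the progress line
```

Looping over iter_lines_with_progress(filename) then behaves like looping over the file itself, except that the lines are bytes and need to be decoded before further processing.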

Piotr Czapla answered Oct 12 '22 00:10


This is based on @Piotr's answer, adapted for Python 3 with the file opened in text mode:

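import os
from tqdm import tqdm  # import the class; the tqdm module itself is not callable

with tqdm(total=os.path.getsize(filepath)) as pbar:
    with open(filepath) as file:
        for line in file:
            # Text mode yields str, so encode to count bytes rather than characters.
            pbar.update(len(line.encode('utf-8')))
            ...
    # No file.close() is needed: the with block closes the file.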
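
Since the question is about a CSV file, the byte-counting idea can also be combined with csv.reader. This is a rough sketch rather than a definitive implementation: csv_rows_with_progress is a hypothetical name, and it assumes the file is UTF-8 (or another encoding in which a newline byte never occurs inside a multi-byte character), so that splitting the raw bytes into lines before decoding is safe.

```python
import csv
import os


def csv_rows_with_progress(path, encoding="utf-8"):
    """Yield (row, fraction_read) for each CSV row in the file at `path`.

    Hypothetical helper: assumes an encoding such as UTF-8 where the raw
    bytes can be split on newlines before decoding.
    """
    total = os.path.getsize(path)  # total size in bytes
    bytes_read = 0

    def decoded_lines(binary_file):
        # Count bytes on the raw line, then hand decoded text to csv.reader.
        nonlocal bytes_read
        for raw in binary_file:
            bytes_read += len(raw)
            yield raw.decode(encoding)

    with open(path, "rb") as f:
        # csv.reader pulls extra lines itself when a quoted field spans lines.
        for row in csv.reader(decoded_lines(f)):
            yield row, (bytes_read / total if total else 1.0)
```

Each yielded fraction is the share of the file's bytes consumed so far, so it can be printed directly or passed to a progress bar.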

YohanK answered Oct 12 '22 02:10