 

How do I create a progress bar when a DataFrame is initializing?

I want to get the number of rows loaded so far, each time a new row is added, while a .csv file is being read into a DataFrame:

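from pandas import read_csv

def file_len(fname):
    i = -1  # stays -1 for an empty file, so the function then returns 0
    with open(fname, encoding='utf-8') as f:
        for i, _ in enumerate(f):
            pass
    return i + 1  # number of lines, header line included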
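Note that a line count includes the header line, while the DataFrame returned by read_csv does not, so the total used for a 0-100% bar should be one less. A minimal sketch, assuming exactly one header line and no quoted fields containing embedded newlines (either would make lines and rows differ); `count_data_rows` is a hypothetical helper name:

```python
def count_data_rows(fname):
    """Count CSV data rows by counting lines and subtracting the header.

    Assumes one header line and no newlines inside quoted fields.
    """
    with open(fname, encoding="utf-8") as f:
        n_lines = sum(1 for _ in f)  # iterate lazily, no full read into memory
    # An empty file has no header either, so never return a negative count.
    return max(n_lines - 1, 0)
```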

csv_path = "C:/...."
max_length = file_len(csv_path)

data = read_csv(csv_path, sep=';', encoding='utf-8')

With that code I get the total number of rows, but I don't know how to get the number of rows already in the DataFrame each time a new one is added. I want to use these two numbers to drive a 0-100% progress bar.

asked Jul 14 '14 at 13:07 by Jean


1 Answer

You can't do this directly: read_csv has no progress callback, so you would have to modify the read_csv function and maybe other functions in pandas.


EDIT:

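It seems it can be done now by passing chunksize=rows_number to read_csv: it then returns an iterator that yields DataFrames of rows_number rows each (the last one may be shorter).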

Using only iterator=True didn't work for me, or maybe it needed more rows.
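A plausible explanation is that, without a chunksize, iterating over the reader returns the remaining rows in one go. With iterator=True you can instead pull rows explicitly with the reader's get_chunk(n) method. A minimal sketch, assuming Python 3 and a recent pandas; the small inline data set is illustrative only:

```python
from io import StringIO

import pandas as pd

data = "A,B,C\n1,2,3\n4,5,6\n7,8,9\n"

# iterator=True makes read_csv return a TextFileReader instead of a DataFrame.
reader = pd.read_csv(StringIO(data), iterator=True)

# get_chunk(n) reads the next n rows as a DataFrame.
first = reader.get_chunk(1)  # row 1
rest = reader.get_chunk(2)   # rows 2 and 3
reader.close()

print(first)
print(rest)
```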

Thanks to Jeff

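Try this (Python 3):

import pandas as pd

from io import StringIO

# The header has one field fewer than the data rows, so pandas
# uses the first column (foo, bar, baz) as the index.
data = """A,B,C
foo,1,2,3
bar,4,5,6
baz,7,8,9
"""

# chunksize=1 makes read_csv return an iterator of one-row DataFrames.
reader = pd.read_csv(StringIO(data), chunksize=1)

for x in reader:
    print(x)
    print('--- next data ---')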

result:

     A  B  C
foo  1  2  3
--- next data ---
     A  B  C
bar  4  5  6
--- next data ---
     A  B  C
baz  7  8  9
--- next data ---
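Combining chunksize with the total row count from the question gives the 0-100% progress. A minimal sketch, assuming total_rows is the number of data rows (header excluded) and that a text percentage on stderr is enough; `read_csv_with_progress` is a hypothetical helper name, and the percentage could just as well be passed to any progress bar widget:

```python
import sys
from io import StringIO

import pandas as pd


def read_csv_with_progress(source, total_rows, chunksize=1000, **kwargs):
    """Read a CSV chunk by chunk, printing a 0-100% progress line to stderr.

    total_rows is assumed to be the number of data rows (header excluded).
    Extra keyword arguments (e.g. sep, encoding) go straight to read_csv.
    """
    chunks = []
    rows_read = 0
    for chunk in pd.read_csv(source, chunksize=chunksize, **kwargs):
        chunks.append(chunk)
        rows_read += len(chunk)
        # Clamp at 100 in case total_rows was an underestimate.
        percent = min(100.0, 100 * rows_read / total_rows) if total_rows else 100.0
        # '\r' rewrites the same terminal line on each update.
        sys.stderr.write(f"\rLoading: {percent:5.1f}%")
        sys.stderr.flush()
    sys.stderr.write("\n")
    # Reassemble the chunks into a single DataFrame with a fresh 0..n-1 index.
    return pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()


data = "A;B\n1;2\n3;4\n5;6\n7;8\n"
df = read_csv_with_progress(StringIO(data), total_rows=4, chunksize=2, sep=";")
print(df)
```

With a real file, source would be the path (e.g. csv_path) and total_rows the line count minus the header line.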
answered Oct 07 '22 at 02:10 by furas