
Total number of chunks in pandas

Tags: python, pandas

In the following script, is there a way to find out how many "chunks" there are in total?

import pandas as pd
import numpy as np

data = pd.read_csv('data.txt', delimiter=',', chunksize=50000)

for chunk in data:
    print(chunk)

Using len(chunk) only gives me the number of rows in each chunk.

Is there a way to do it without adding the iteration manually?

Leb asked Jul 11 '15 23:07


1 Answer

CSV, being row-based, does not allow a process to know how many lines there are in it until after it has all been scanned.

Very minimal scanning is necessary, though, assuming the CSV file is well formed:

with open('data.txt', 'r') as f:
    n_lines = sum(1 for row in f)  # counts every line, including any header

This is useful if you need to know in advance how many chunks there will be: subtract one for the header row, if there is one, then divide the row count by the chunksize and round up. A full CSV reader is overkill for this. Counting lines this way has very low memory requirements and does minimal parsing.
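As a rough sketch, the line count can be turned into a chunk count like this. The helper name count_chunks and its has_header parameter are made up for illustration. It assumes the file has no blank lines and no quoted fields containing newlines, so that one physical line equals one data row:

```python
import math

def count_chunks(path, chunksize, has_header=True):
    """Estimate how many chunks of `chunksize` rows the file at `path` holds.

    Hypothetical helper: assumes one data row per physical line
    (no blank lines, no newlines inside quoted fields).
    """
    with open(path, 'r') as f:
        n_lines = sum(1 for _ in f)  # streams the file, one line at a time
    # Drop the header line from the count, if the file has one.
    n_rows = n_lines - 1 if has_header else n_lines
    # Round up: a final partial chunk still counts as a chunk.
    return math.ceil(max(n_rows, 0) / chunksize)
```

For the script in the question, this would be called as count_chunks('data.txt', 50000) before the loop over the chunks.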

Ami Tavory answered Oct 08 '22 22:10