Total number of chunks in pandas

Question

In the following script, is there a way to find out how many "chunks" there are in total?

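import pandas as pd

# With chunksize set, read_csv returns an iterator (a TextFileReader)
# that yields one DataFrame of up to 50000 rows at a time.
data = pd.read_csv('data.txt', delimiter=',', chunksize=50000)

for chunk in data:
    print(chunk)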

Using len(chunk) only gives me the number of rows in each chunk.

Is there a way to do this without counting the iterations manually?

Ami Tavory · Accepted Answer

Because CSV is a row-based text format with no stored row count, a process cannot know how many lines a file contains until it has scanned the whole file.

Only minimal scanning is needed, though, assuming the CSV file is well formed (for example, no quoted field spans multiple lines):

with open('data.txt') as f:
    n_lines = sum(1 for _ in f)  # counts every line, header included

This is useful if you need to know in advance how many chunks there will be. A full CSV reader is overkill for this: the scan above reads the file one line at a time, so it needs very little memory, and it does no parsing beyond splitting on newlines.
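To turn the line count into a chunk count, subtract the header line and divide by chunksize, rounding up because the last chunk may be partial. A minimal sketch, assuming the file has a single header line (pandas' default is to treat the first line as the header), no blank lines (read_csv skips them by default, which would make the count too high) and no quoted fields containing newlines. The function name count_chunks is hypothetical, not part of pandas:

```python
def count_chunks(path, chunksize, header=True):
    """Estimate the number of chunks pd.read_csv(path, chunksize=chunksize) yields.

    Hypothetical helper; assumes one data row per physical line
    (no blank lines, no quoted fields containing newlines).
    """
    # Count physical lines lazily; no CSV fields are parsed.
    with open(path) as f:
        n_lines = sum(1 for _ in f)
    # The header line, if present, is not a data row.
    n_rows = max(n_lines - (1 if header else 0), 0)
    # Ceiling division: a final partial chunk still counts as one chunk.
    return (n_rows + chunksize - 1) // chunksize
```

For the question's file this would be count_chunks('data.txt', 50000), and the result can be paired with enumerate over the reader to report progress such as "chunk i of n".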

Tags: python, pandas

Leb