How to know the byte position of a row of a CSV file in Python?

Question

I'm working with huge CSV files (20-25 million rows) and, for a number of reasons, don't want to split them into smaller pieces.

My script reads a file row by row using the csv module. I need to know the position (byte offset) of the line that will be read on the next iteration (or of the line that was just read).

I tried

>>> import csv
>>> f = open("uscompany.csv","rU")
>>> reader = csv.reader(f)
>>> reader.next()
....
>>> f.tell()
8230

But it seems the csv module reads the file in blocks, because when I keep iterating I get the same position:

>>> reader.next()
....
>>> f.tell()
8230

Any suggestions? Please advise.

John Y · Accepted Answer

If by "byte position" you mean the byte position as if you had read the file in as a normal text file, then my suggestion is to do just that. Read the file line by line as text and get the position of each line that way. You can still parse the CSV data row by row yourself using the csv module:

import csv
for line in myfile:
  # Parse this one line as a single CSV row
  row = next(csv.reader([line]))

I think it is perfectly good design for the CSV reader to not provide a byte position of this kind, because it really doesn't make much sense in a CSV context. After all, "data" and data are the exact same four bytes of data as far as CSV is concerned, but the d might be the 2nd byte or the 1st byte depending on whether the optional surrounding quotes were used.
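The line-by-line approach above can be sketched a little more fully for Python 3. This is only a minimal sketch under a few assumptions: the file is opened in binary mode so that line lengths are true byte counts, the encoding is UTF-8, and no quoted field contains an embedded newline (a row spanning several physical lines would be split incorrectly). The helper name rows_with_offsets and the file name in the usage comment are illustrative, not part of the csv module.

```python
import csv


def rows_with_offsets(binary_file, encoding="utf-8"):
    """Yield (byte_offset, row) pairs from a file opened in binary mode.

    Hypothetical helper: assumes one CSV row per physical line, i.e. no
    quoted field contains a newline.
    """
    offset = 0
    for raw_line in binary_file:
        # Decode the raw bytes of this line, then parse it as one CSV row.
        text = raw_line.decode(encoding)
        row = next(csv.reader([text]), [])
        yield offset, row
        # The next line starts right after the bytes just consumed.
        offset += len(raw_line)


# Possible usage (file name taken from the question):
# with open("uscompany.csv", "rb") as f:
#     for offset, row in rows_with_offsets(f):
#         ...  # f.seek(offset) later returns to the start of this row
```

Counting the bytes manually avoids calling f.tell() during iteration, and a stored offset can later be passed to seek() on a binary-mode handle to jump straight back to that row.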

Andreas Jung · Answer

Short answer: not possible. The byte position is not available through the csv.reader API.

Tags: python, file, csv

Maksym Polshcha
