
How to find the byte position of a specific line in a file

What's the fastest way to find the byte position of a specific line in a file, from the command line?

e.g.

$ linepos myfile.txt 13
5283

I'm writing a parser for a CSV file that is several GB in size, and if the parser is halted, I'd like to be able to resume from the last position. The parser is in Python, but even iterating over file.readlines() takes a long time, since there are millions of rows in the file. I'd like to simply do file.seek(int(commands.getoutput("linepos myfile.txt %i" % lastrow))), but I can't find a shell command that does this efficiently.

Edit: Sorry for the confusion, but I'm looking for a non-Python solution. I already know how to do this from Python.

Cerin asked Feb 04 '14 17:02





1 Answer

From @chepner's comment on my other answer:

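position = 0  # or wherever you left off last time
try:
    # Binary mode, so tell() reports true byte offsets and works while iterating
    # (in text mode, Python 3 disables tell() inside a "for line in file" loop).
    with open('myfile.txt', 'rb') as file:
        file.seek(position)  # zero in base case
        for line in file:
            # process the line (a bytes object; decode it if you need text)
            position = file.tell()  # start of the next unprocessed line
except BaseException:
    # position still points at the start of the line that failed
    print('exception occurred at position {}'.format(position))
    raise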
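
For the one-off lookup the question asks about (the byte offset at which line N starts), a minimal sketch could count newline bytes in large binary chunks instead of iterating line by line. The function name linepos is borrowed from the question's hypothetical command, and the chunk_size parameter is an assumption added for illustration; the sketch also assumes lines end in "\n".

```python
def linepos(path, lineno, chunk_size=1 << 20):
    """Return the byte offset at which 1-based line `lineno` starts.

    Hypothetical helper named after the command in the question.
    """
    if lineno < 1:
        raise ValueError("line numbers start at 1")
    remaining = lineno - 1  # newlines that must be skipped before the target line
    offset = 0              # bytes consumed by fully skipped chunks
    with open(path, "rb") as f:
        while remaining:
            chunk = f.read(chunk_size)
            if not chunk:
                raise ValueError("file has fewer than {} lines".format(lineno))
            count = chunk.count(b"\n")
            if count < remaining:
                # The target line starts in a later chunk; skip this one whole.
                remaining -= count
                offset += len(chunk)
                continue
            # The newline that precedes the target line is inside this chunk.
            idx = -1
            for _ in range(remaining):
                idx = chunk.index(b"\n", idx + 1)
            return offset + idx + 1
    return offset  # only reached for lineno == 1, which starts at byte 0
```

Called as linepos('myfile.txt', 13), this would return a value suitable for file.seek() on a file opened in binary mode. If the file ends exactly after line N-1, the result is the end-of-file offset. From a shell, head -n 12 myfile.txt | wc -c reports the same number for line 13, since it counts the bytes in the 12 preceding lines.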
mhlester answered Oct 13 '22 20:10
