
Python - how to read a file with NUL-delimited lines?

Tags:

python

nul

I usually use the following Python code to read lines from a file:

with open('./my.csv', 'r') as f:
    for line in f:
        print(line)

But what if the lines in the file are delimited by "\0" instead of "\n"? Is there a Python module that can handle this?

Thanks for any advice.

user1129812 asked Feb 11 '12 02:02


1 Answer

If your file is small enough to read into memory all at once, you can use split:

# A trailing '\0' leaves an empty final string, which prints as a blank line.
for line in f.read().split('\0'):
    print(line)

Otherwise you might want to try this recipe from the discussion about this feature request:

def fileLineIter(inputFile,
                 inputNewline="\n",
                 outputNewline=None,
                 readSize=8192):
   r"""Like the normal file iter but you can set what string indicates newline.
   
   The newline string can be arbitrarily long; it need not be restricted to a
   single character. You can also set the read size and control whether or not
   the newline string is left on the end of the iterated lines.  Setting
   newline to '\0' is particularly good for use with an input file created with
   something like "os.popen('find -print0')".
   """
   if outputNewline is None: outputNewline = inputNewline
   partialLine = ''
   while True:
       charsJustRead = inputFile.read(readSize)
       if not charsJustRead: break
       partialLine += charsJustRead
       lines = partialLine.split(inputNewline)
       partialLine = lines.pop()
       for line in lines: yield line + outputNewline
   if partialLine: yield partialLine
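
For Python 3, the same chunked approach can be sketched on a binary stream. This is only an illustration under stated assumptions, not part of the original recipe: the name `iter_nul_delimited` and its parameters are hypothetical, the stream is assumed to be opened in binary mode, and the records are assumed to be UTF-8 encoded. Unlike the recipe above, it yields records without the delimiter.

```python
import io


def iter_nul_delimited(stream, delimiter=b"\0", read_size=8192):
    """Yield delimiter-separated records from a binary stream (hypothetical helper).

    The delimiter is stripped from each yielded record.
    """
    pending = b""
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break
        pending += chunk
        records = pending.split(delimiter)
        # The last piece may be an incomplete record; keep it for the next chunk.
        pending = records.pop()
        for record in records:
            yield record
    # Emit whatever is left if the data did not end with the delimiter.
    if pending:
        yield pending


# Stand-in for a real file or the output of something like "find -print0".
sample = io.BytesIO(b"./a.txt\0./dir/b c.txt\0./last")
names = [record.decode("utf-8") for record in iter_nul_delimited(sample, read_size=4)]
print(names)
```

Only one chunk plus one partial record is held in memory at a time, so this should also cope with inputs too large to read in a single call.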

I also noticed that your file has a "csv" extension. Python has a built-in csv module (import csv). Its dialects have an attribute called Dialect.lineterminator, but the reader currently ignores it:

Dialect.lineterminator

The string used to terminate lines produced by the writer. It defaults to '\r\n'.

Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.
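
Because the reader ignores lineterminator, one possible workaround is to split the records yourself and pass them to csv.reader, which accepts any iterable of strings. The following is a minimal sketch, assuming the file fits in memory and that no field contains a NUL character; the helper name `read_nul_delimited_csv` is hypothetical.

```python
import csv
import io


def read_nul_delimited_csv(stream):
    """Parse CSV rows from a text stream whose records end in a NUL character."""
    # Split on NUL and drop empty pieces, such as the one after a trailing NUL.
    records = [record for record in stream.read().split("\0") if record]
    # csv.reader only needs an iterable that yields one string per record.
    return list(csv.reader(records))


# Stand-in for a real file opened in text mode.
sample = io.StringIO("name,size\0a.txt,12\0b.txt,34\0")
rows = read_nul_delimited_csv(sample)
print(rows)
```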

Mark Byers answered Sep 21 '22 17:09