
Reading formatted text using python

Tags: python, csv

I would like to use Python to read and write files of the following format:

#h -F, field1 field2 field3
a,b,c
d,e,f
# some comments
g,h,i

This file closely resembles a typical CSV, except for the following:

  1. The header line starts with #h
  2. The second element of the header line is a tag to denote the delimiter
  3. The remaining elements of the header are field names (always separated by a single space)
  4. Comment lines always start with # and can be scattered throughout the file

Is there any way I can use csv.DictReader() and csv.DictWriter() to read and write these files?

Dave asked Feb 07 '12 14:02


1 Answer

You can parse the first line separately to find the delimiter and fieldnames:

    firstline = next(f).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
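
As a rough sketch, the same header parsing could be wrapped in a small function that also rejects lines that are not headers. The name parse_header is hypothetical, and the sketch assumes the delimiter is a single non-whitespace character, since the header is split on whitespace:

```python
def parse_header(line):
    """Split a '#h -F<delim> name1 name2 ...' line into (delimiter, fields).

    Hypothetical helper: assumes a one-character, non-whitespace delimiter.
    """
    parts = line.split()
    if len(parts) < 2 or parts[0] != '#h' or not parts[1].startswith('-F'):
        raise ValueError('not a header line: {!r}'.format(line))
    delimiter = parts[1][2:]  # everything after the '-F' tag
    if len(delimiter) != 1:
        raise ValueError('expected a one-character delimiter: {!r}'.format(line))
    return delimiter, parts[2:]


delimiter, fields = parse_header('#h -F, field1 field2 field3\n')
print(delimiter, fields)
```

A tab- or space-delimited file would need a different header convention, because split() would swallow the delimiter.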

Note that csv.DictReader can take any iterable as its first argument. So to skip the comments, you can wrap f in an iterator (skip_comments) which yields only non-comment lines:

import csv

def skip_comments(iterable):
    # Yield only the lines that are not comments.
    for line in iterable:
        if not line.startswith('#'):
            yield line

# newline='' lets the csv module handle line endings itself (Python 3).
with open('data.csv', newline='') as f:
    # The header looks like: #h -F, field1 field2 field3
    firstline = next(f).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
    for row in csv.DictReader(skip_comments(f),
                              delimiter=delimiter, fieldnames=fields):
        print(row)

On the data you posted this yields (on Python 3.8 or later, where DictReader returns plain dicts)

{'field1': 'a', 'field2': 'b', 'field3': 'c'}
{'field1': 'd', 'field2': 'e', 'field3': 'f'}
{'field1': 'g', 'field2': 'h', 'field3': 'i'}
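
To try the reading logic without creating a file, here is a minimal self-contained sketch that feeds the sample data through io.StringIO. The names SAMPLE and read_rows are made up for illustration; the parsing steps are the ones described above:

```python
import csv
import io

# The sample file from the question, held in memory.
SAMPLE = (
    "#h -F, field1 field2 field3\n"
    "a,b,c\n"
    "d,e,f\n"
    "# some comments\n"
    "g,h,i\n"
)


def read_rows(lines):
    """Yield one dict per data row from an iterable of lines (hypothetical helper)."""
    lines = iter(lines)
    header = next(lines).split()      # ['#h', '-F,', 'field1', ...]
    delimiter = header[1][-1]
    fields = header[2:]
    # Drop comment lines before the csv module sees them.
    data = (line for line in lines if not line.startswith('#'))
    for row in csv.DictReader(data, delimiter=delimiter, fieldnames=fields):
        yield dict(row)


rows = list(read_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```

One caveat of filtering lines before parsing: a data row whose first field begins with # is also treated as a comment and skipped.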

To write a file in this format, you could use a header helper function:

def header(delimiter, fields):
    # Build the '#h -F<delimiter> field1 field2 ...' header line.
    return '#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields))

with open('data.csv', newline='') as f:
    with open('output.csv', 'w', newline='') as g:
        firstline = next(f).split()
        delimiter = firstline[1][-1]
        fields = firstline[2:]
        # lineterminator='\n' matches the line endings written by g.write below.
        writer = csv.DictWriter(g, delimiter=delimiter, fieldnames=fields,
                                lineterminator='\n')
        g.write(header(delimiter, fields))
        for row in csv.DictReader(skip_comments(f),
                                  delimiter=delimiter, fieldnames=fields):
            writer.writerow(row)
            g.write('# comment\n')  # comment lines can go anywhere

Note that you can write to output.csv using g.write (for header or comment lines) or writer.writerow (for CSV rows).
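
As an illustration of mixing the two kinds of writes, here is a small sketch that writes the header, some comment lines, and then the data rows to any file-like object. The function write_rows and its comments parameter are hypothetical, and lineterminator='\n' is an assumption made so that all lines end the same way:

```python
import csv
import io


def write_rows(out, delimiter, fields, rows, comments=()):
    """Write header, optional comment lines, then data rows (hypothetical helper)."""
    # Header and comments are plain text, so they bypass the csv writer.
    out.write('#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields)))
    for text in comments:
        out.write('# {}\n'.format(text))
    writer = csv.DictWriter(out, fieldnames=fields, delimiter=delimiter,
                            lineterminator='\n')
    writer.writerows(rows)


buf = io.StringIO()
write_rows(
    buf,
    ',',
    ['field1', 'field2', 'field3'],
    [{'field1': 'a', 'field2': 'b', 'field3': 'c'}],
    comments=['some comments'],
)
print(buf.getvalue(), end='')
```

Passing a real file opened with open(path, 'w', newline='') instead of the StringIO buffer should work the same way.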

unutbu answered Oct 31 '22 18:10