Reading formatted text using python

Tags: python, csv

I would like to use Python to read and write files of the following format:


#h -F, field1 field2 field3
a,b,c
d,e,f
# some comments
g,h,i

This file closely resembles a typical CSV, except for the following:

1. The header line starts with #h
2. The second element of the header line is a tag to denote the delimiter
3. The remaining elements of the header are field names (always separated by a single space)
4. Comment lines always start with # and can be scattered throughout the file

Is there any way I can use csv.DictReader() and csv.DictWriter() to read and write these files?


asked Feb 07 '12 14:02

Dave

1 Answer

You can parse the first line separately to find the delimiter and fieldnames:


    firstline = next(f).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
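
The slicing above takes the last character of the second token as the delimiter, which works for a tag such as -F,. A slightly more defensive version might look like the sketch below. The name parse_header and its error handling are illustrative assumptions, not part of the original answer, and the sketch assumes the delimiter is a single non-whitespace character.

```python
def parse_header(line):
    """Return (delimiter, fieldnames) from a header such as '#h -F, a b c'."""
    tokens = line.split()
    # Expect at least '#h', a '-F<delimiter>' tag, and one field name.
    if len(tokens) < 3 or tokens[0] != '#h' or not tokens[1].startswith('-F'):
        raise ValueError('not a valid header line: %r' % line)
    # Everything after '-F' is the delimiter (assumed to be one character).
    delimiter = tokens[1][2:]
    if len(delimiter) != 1:
        raise ValueError('expected a single-character delimiter: %r' % tokens[1])
    return delimiter, tokens[2:]


delimiter, fields = parse_header('#h -F, field1 field2 field3\n')
print(delimiter, fields)
```

Raising ValueError early makes a malformed header fail at the first line instead of producing rows with the wrong keys later on.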

Note that csv.DictReader can take any iterable as its first argument. So to skip the comments, you can wrap f in an iterator (skip_comments) which yields only non-comment lines:


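import csv

def skip_comments(iterable):
    # Yield only the lines that are not comments.
    for line in iterable:
        if not line.startswith('#'):
            yield line

# Python 2 code: the csv module expects files opened in binary mode there.
with open('data.csv', 'rb') as f:
    # The header line gives the delimiter tag and the field names.
    firstline = next(f).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
    for line in csv.DictReader(skip_comments(f),
                               delimiter=delimiter, fieldnames=fields):
        print line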

On the data you posted, this yields:


{'field2': 'b', 'field3': 'c', 'field1': 'a'}
{'field2': 'e', 'field3': 'f', 'field1': 'd'}
{'field2': 'h', 'field3': 'i', 'field1': 'g'}
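
The reading example above is written for Python 2 (print statement, file opened in binary mode). A minimal sketch of the same approach for Python 3 follows. The helper name read_rows is an illustrative assumption, and an in-memory io.StringIO stands in for the real file so the example is self-contained; with a file on disk you would pass open('data.csv', newline='') instead.

```python
import csv
import io


def skip_comments(lines):
    """Yield only the lines that do not start with '#'."""
    for line in lines:
        if not line.startswith('#'):
            yield line


def read_rows(stream):
    """Parse the '#h' header, then return the data rows as a list of dicts."""
    firstline = next(stream).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
    reader = csv.DictReader(skip_comments(stream),
                            delimiter=delimiter, fieldnames=fields)
    return [dict(row) for row in reader]


# In-memory stand-in for open('data.csv', newline='').
sample = io.StringIO(
    '#h -F, field1 field2 field3\n'
    'a,b,c\n'
    'd,e,f\n'
    '# some comments\n'
    'g,h,i\n'
)
for row in read_rows(sample):
    print(row)
```

Each row is printed as a dict keyed by the field names taken from the header line, and the comment line is skipped.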

To write a file in this format, you could use a header helper function:


def header(delimiter, fields):
    # Rebuild the '#h -F<delimiter> field1 field2 ...' header line.
    return '#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields))

# Python 2 code; reuses csv and skip_comments from the reading example.
with open('data.csv', 'rb') as f:
    with open('output.csv', 'wb') as g:
        firstline = next(f).split()
        delimiter = firstline[1][-1]
        fields = firstline[2:]
        writer = csv.DictWriter(g, delimiter=delimiter, fieldnames=fields)
        g.write(header(delimiter, fields))
        for row in csv.DictReader(skip_comments(f),
                                  delimiter=delimiter, fieldnames=fields):
            writer.writerow(row)
            # Comment lines go straight to the file object.
            g.write('# comment\n')

Note that you can write to output.csv using g.write (for header or comment lines) or writer.writerow (for CSV rows).
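
The writing example above is also Python 2 (files opened with 'rb' and 'wb'). A rough Python 3 sketch of the same idea follows. The helper name write_rows is an illustrative assumption, io.StringIO stands in for open('output.csv', 'w', newline=''), and lineterminator='\n' is my choice to keep the data rows consistent with the '\n' used for the header and comment lines (the csv module defaults to '\r\n').

```python
import csv
import io


def header(delimiter, fields):
    """Build the '#h -F<delimiter> <field names>' header line."""
    return '#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields))


def write_rows(stream, delimiter, fields, rows):
    """Write the header with stream.write, then the rows with DictWriter."""
    stream.write(header(delimiter, fields))
    writer = csv.DictWriter(stream, delimiter=delimiter, fieldnames=fields,
                            lineterminator='\n')
    for row in rows:
        writer.writerow(row)


# In-memory stand-in for open('output.csv', 'w', newline='').
out = io.StringIO()
fields = ['field1', 'field2', 'field3']
write_rows(out, ',', fields, [
    {'field1': 'a', 'field2': 'b', 'field3': 'c'},
    {'field1': 'd', 'field2': 'e', 'field3': 'f'},
])
# Comment lines bypass the csv writer and go straight to the stream.
out.write('# some comments\n')
print(out.getvalue(), end='')
```

Because the header and comments are written directly to the stream, the csv writer never sees them and cannot quote or escape the leading #.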

answered Oct 31 '22 18:10

unutbu

