Skipping lines, csv.DictReader

Tags: python, csv, python-2.7

I have a file that has an obnoxious preface to the header. So it looks like this:

Review performed by:    

Meeting:    

Person:     

Number:     

Code: 



Confirmation    

Tab Separated Header Names That I Want To Use

I want to skip past everything and use the tab sep header names for my code. This is what I have so far:

reader = csv.DictReader(CSVFile)
for i in range(14): #trying to skip the first 14 rows
    reader.next()
for row in reader:
    print(row)
    if args.nextCode:
        tab = (row["Tab"])
        sep = int((row["Separated"]))

This code gets this error:

File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 104, in next
    row = self.reader.next()
StopIteration

I tried to print the rows, to see where I was in the file, and I changed the "range(14)" to range 5, but when I print the row, I get this:

{'Review performed by:': 'Tab/tSeparated/tHeader/tNames/tThat/tI/tWant/tTo/tUse'}
Traceback (most recent call last):
  File "program.py", line 396, in <module>
    main()
  File "program.py", line 234, in main
    tab = (row["Tab"])
KeyError: 'Tab'

So I am not really sure the right way to skip those top lines. Any help would be appreciated.


asked Jun 24 '15 15:06

Stephopolis

2 Answers

A csv.DictReader takes its headers from the first line it reads from the file, unless you pass fieldnames yourself. It therefore uses Review performed by: as the header row, and your loop then skips the next 14 rows after it.
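To see the effect in isolation, here is a minimal sketch. It assumes Python 3 and uses an in-memory io.StringIO with made-up contents as a stand-in for the real file; only the first preface line matches the question.

```python
import csv
import io

# Hypothetical stand-in for the real file: two preface lines, then the real header.
sample = io.StringIO(
    "Review performed by:\n"
    "Meeting:\n"
    "Tab\tSeparated\tHeader\n"
    "a\t1\tx\n"
)

reader = csv.DictReader(sample, delimiter="\t")
first_row = next(reader)

# The field names come from the very first line, not from the real header,
# so looking up first_row["Tab"] would raise KeyError.
print(reader.fieldnames)
print(first_row)
```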

Instead, skip the lines before creating the DictReader:

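for i in range(14):
    next(CSVFile)  # skip the preface lines before the reader sees them
reader = csv.DictReader(CSVFile, delimiter='\t')  # the header line is tab separated
...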
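A self-contained version of the same idea, as a sketch only: it assumes Python 3, and the sample contents and the PREFACE_LINES name are invented for illustration (the file in the question would need 14).

```python
import csv
import io

PREFACE_LINES = 3  # assumption for this sample; the question's file needs 14

# Hypothetical stand-in for the real file.
sample = io.StringIO(
    "Review performed by:\n"
    "Meeting:\n"
    "\n"
    "Tab\tSeparated\tHeader\n"
    "a\t1\tx\n"
    "b\t2\ty\n"
)

# Consume the preface from the file object itself, before the reader exists.
for _ in range(PREFACE_LINES):
    next(sample)

# The reader now sees the real header as its first line.
reader = csv.DictReader(sample, delimiter="\t")
rows = list(reader)
print(rows[0]["Tab"], int(rows[0]["Separated"]))
```

Using the built-in next() rather than the .next() method keeps the loop working on both Python 2.6+ and Python 3.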


answered Sep 19 '22 14:09

jonrsharpe

You could wrap CSVFile in an itertools.islice iterator to slice off the preface lines when creating the DictReader, instead of passing the file directly to the constructor.

This works because the csv.reader constructor will accept "any object which supports the iterator protocol and returns a string each time its __next__() method is called" as its first argument according to the csv docs. This also applies to csv.DictReaders because they're implemented via an underlying csv.reader instance.

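Note how the next(iterator).split() expression supplies the csv.DictReader with a fieldnames argument, so the field names are not taken from the first line the reader sees. Because split() with no arguments splits on any whitespace, this assumes the header names themselves contain no spaces.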

iterator = itertools.islice(CSVFile, 14, None)  # Skip the preface lines.
for row in csv.DictReader(iterator, next(iterator).split(), delimiter='\t'):
    # process row ...
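As a runnable sketch of this approach, the example below feeds a plain list of strings through islice to show that the reader only needs an iterable of lines. The sample data and the PREFACE_LINES name are assumptions for illustration, and it splits the header on tabs explicitly (rather than with a bare split()) so that header names containing spaces would survive.

```python
import csv
import itertools

# Hypothetical sample: any iterable of lines works, not just a file object.
lines = [
    "Review performed by:\n",
    "Meeting:\n",
    "Tab\tSeparated\tHeader\n",
    "a\t1\tx\n",
    "b\t2\ty\n",
]
PREFACE_LINES = 2  # assumption for this sample; the question's file needs 14

# Lazily drop the preface, then pull the header line off the same iterator.
iterator = itertools.islice(lines, PREFACE_LINES, None)
fieldnames = next(iterator).rstrip("\n").split("\t")

# Passing fieldnames explicitly stops DictReader from consuming a header itself.
reader = csv.DictReader(iterator, fieldnames=fieldnames, delimiter="\t")
rows = list(reader)
print(rows)
```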

answered Sep 20 '22 14:09

martineau
