
Skipping lines, csv.DictReader

I have a file that has an obnoxious preface to the header. So it looks like this:

Review performed by:    

Meeting:    

Person:     

Number:     

Code: 



Confirmation    

Tab Separated Header Names That I Want To Use

I want to skip past everything and use the tab-separated header names in my code. This is what I have so far:

reader = csv.DictReader(CSVFile)
for i in range(14): #trying to skip the first 14 rows
    reader.next()
for row in reader:
    print(row)
    if args.nextCode:
        tab = (row["Tab"])
        sep = int((row["Separated"]))

This code gets this error:

File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 104, in next
    row = self.reader.next()
StopIteration

I tried printing the rows to see where I was in the file, and changed range(14) to range(5), but when I print the row I get this:

{'Review performed by:': 'Tab\tSeparated\tHeader\tNames\tThat\tI\tWant\tTo\tUse'}
Traceback (most recent call last):
  File "program.py", line 396, in <module>
    main()
  File "program.py", line 234, in main
    tab = (row["Tab"])
KeyError: 'Tab'

So I am not really sure the right way to skip those top lines. Any help would be appreciated.

Stephopolis asked Jun 24 '15 15:06



2 Answers

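A csv.DictReader takes its field names from the first line of the file when none are passed in, so here it uses Review performed by: as the header row, and your loop then skips the next 14 rows. Because DictReader also skips blank rows, those 14 calls likely run off the end of the file, which is what raises StopIteration.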
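To see that behaviour in isolation, here is a minimal sketch written for Python 3 (the question uses Python 2.7). The sample text is a made-up stand-in for the real file, not its actual contents.

```python
import csv
import io

# Hypothetical stand-in for the file: a short preface with blank lines,
# then a tab-separated header and one data row.
sample = (
    "Review performed by:\n"
    "\n"
    "Meeting:\n"
    "\n"
    "Tab\tSeparated\tHeader\n"
    "a\t1\tx\n"
)

reader = csv.DictReader(io.StringIO(sample))  # default delimiter is a comma
rows = list(reader)

# The first line of the file became the only field name.
print(reader.fieldnames)

# Blank lines were skipped, and every remaining line is one value stored
# under that single key, tabs included.
for row in rows:
    print(row)
```

This is the same shape as the dictionary in the question: the real header ends up as a value, so row["Tab"] raises KeyError.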

Instead, skip the lines before creating the DictReader:

for i in range(14):  # skip the 14 preface lines
    next(CSVFile)
reader = csv.DictReader(CSVFile, delimiter='\t')  # the header row is tab-separated
...
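A self-contained version of the same idea, as a sketch for Python 3. The sample data, the name csv_file, and the constant PREFACE_LINES are assumptions made for illustration; the question's file would need 14 instead of 3.

```python
import csv
import io

PREFACE_LINES = 3  # assumption for this sample; the question's file needs 14

# Hypothetical stand-in for the real file.
sample = (
    "Review performed by:\tsomeone\n"
    "Meeting:\tsomething\n"
    "\n"
    "Tab\tSeparated\tHeader\n"
    "a\t1\tx\n"
    "b\t2\ty\n"
)

csv_file = io.StringIO(sample)

# Consume raw lines from the file object itself, blank ones included,
# before the DictReader ever sees it.
for _ in range(PREFACE_LINES):
    next(csv_file)

# The next line read is the real header, so it becomes the field names.
reader = csv.DictReader(csv_file, delimiter="\t")
rows = list(reader)

for row in rows:
    print(row["Tab"], int(row["Separated"]))
```

With a real file, open(path, newline='') would take the place of the StringIO object. Note that the count here is in physical lines, so blank lines in the preface have to be included in it.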
jonrsharpe answered Sep 19 '22 14:09


You could wrap CSVFile in an itertools.islice iterator to slice off the preface lines when creating the DictReader, instead of providing the file directly to the constructor.

This works because the csv.reader constructor will accept "any object which supports the iterator protocol and returns a string each time its __next__() method is called" as its first argument according to the csv docs. This also applies to csv.DictReaders because they're implemented via an underlying csv.reader instance.

Note how the next(iterator).split() expression supplies the csv.DictReader with a fieldnames argument (so the field names are not taken from the first line it reads). Since split() with no arguments splits on any whitespace, this assumes no header name contains a space.

import csv
import itertools

iterator = itertools.islice(CSVFile, 14, None)  # Skip the preface lines.
for row in csv.DictReader(iterator, next(iterator).split(), delimiter='\t'):
    print(row)  # process row ...
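A runnable sketch of the islice approach for Python 3, using made-up sample data. One deliberate difference from the snippet above: the header is split on tabs only, with split("\t"), so a column name containing a space survives. The names csv_file, lines, and PREFACE_LINES are illustrative assumptions.

```python
import csv
import io
import itertools

PREFACE_LINES = 3  # assumption for this sample; the question's file needs 14

# Hypothetical stand-in for the real file; one header name contains a space.
sample = (
    "Review performed by:\tsomeone\n"
    "Meeting:\tsomething\n"
    "\n"
    "Tab\tSeparated\tHeader Name\n"
    "a\t1\tx\n"
    "b\t2\ty\n"
)

csv_file = io.StringIO(sample)

# islice lazily drops the first PREFACE_LINES lines and yields the rest.
lines = itertools.islice(csv_file, PREFACE_LINES, None)

# Pull the header line off the iterator and split it on tabs only.
fieldnames = next(lines).rstrip("\n").split("\t")

# The reader now starts at the first data row and uses our field names.
reader = csv.DictReader(lines, fieldnames=fieldnames, delimiter="\t")
rows = list(reader)

for row in rows:
    print(row)
```

Passing fieldnames explicitly means the DictReader treats every line it receives as data, which is why the header has to be taken off the iterator first.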
martineau answered Sep 20 '22 14:09