
Skip the last row of CSV file when iterating in Python

Tags: python, csv

I am working on a data analysis using a CSV file that I got from a data warehouse (Cognos). The last row of the CSV file sums up all the rows above, but I do not need this line for my analysis, so I would like to skip it.

I was thinking about adding an "if" statement inside my "for" loop that checks the first column of each row, like below.

import csv

with open('COGNOS.csv', "rb") as f, open('New_COGNOS.csv', "wb") as w:
    #Open 2 CSV files. One to read and the other to save.
    CSV_raw = csv.reader(f)
    CSV_new = csv.writer(w)
    for row in CSV_raw:
        item_num = row[3].split(" ")[0]
        row.append(item_num)
        if row[0] == "All Materials (By Collection)": break
        CSV_new.writerow(row)

However, this looks like a waste of resources. Is there a Pythonic way to skip the last row when iterating through a CSV file?

Yong Jun Kim asked May 30 '13 21:05


2 Answers

You can write a generator that'll return everything but the last entry in an input iterator:

def skip_last(iterator):
    it = iter(iterator)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for item in it:
        yield prev
        prev = item

then wrap your CSV_raw reader object in that:

for row in skip_last(CSV_raw):

The generator takes the first entry, then starts looping; on each iteration it yields the previous entry and holds on to the current one. When the input iterator is exhausted, one entry (the last) is still held back, and it is never yielded.
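As a rough illustration, here is how this could look end to end under Python 3, using in-memory text in place of the real files. The sample rows and column names are invented for the demo; only the last-row label and the `row[3]` transformation come from the question. The generator is written with a guard for empty input.

```python
import csv
import io


def skip_last(iterator):
    """Yield every item of the input except the final one."""
    it = iter(iterator)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for item in it:
        yield prev
        prev = item


# Made-up sample data; the last row mimics the summary line from the question.
raw = io.StringIO(
    "Collection,Qty,Price,Item\n"
    "Spring,2,10,A100 Shirt\n"
    "Fall,1,20,B200 Coat\n"
    "All Materials (By Collection),3,30,\n"
)
out = io.StringIO()

writer = csv.writer(out, lineterminator="\n")
for row in skip_last(csv.reader(raw)):
    row.append(row[3].split(" ")[0])  # same transformation as in the question
    writer.writerow(row)

result = out.getvalue()
print(result)
```

Because the summary row never reaches the loop body, no per-row comparison against its label is needed.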

A generic version, letting you skip the last n elements, would be:

from collections import deque
from itertools import islice

def skip_last_n(iterator, n=1):
    it = iter(iterator)
    prev = deque(islice(it, n), n)
    for item in it:
        yield prev.popleft()
        prev.append(item)
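
A quick check of the behaviour, assuming Python 3 and `n >= 1` (with `n=0` and a non-empty input, the empty deque makes `popleft()` raise `IndexError`). The function is repeated so that the snippet runs on its own; the sample inputs are arbitrary.

```python
from collections import deque
from itertools import islice


def skip_last_n(iterator, n=1):
    it = iter(iterator)
    # Buffer the first n items; the deque never grows beyond n entries.
    prev = deque(islice(it, n), n)
    for item in it:
        yield prev.popleft()  # oldest buffered item is safe to release
        prev.append(item)     # newest item might be among the last n


dropped_three = list(skip_last_n(range(1, 9), 3))
dropped_one = list(skip_last_n(range(1, 9)))
too_short = list(skip_last_n([1, 2], 3))

print(dropped_three)  # [1, 2, 3, 4, 5]
print(dropped_one)    # [1, 2, 3, 4, 5, 6, 7]
print(too_short)      # []
```

Only `n` items are held in memory at any time, so this should work on large files as well as small ones.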
Martijn Pieters answered Oct 20 '22 03:10


A generalized "skip-n" generator

from __future__ import print_function
from itertools import tee
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
s = '''\
1
2
3
4
5
6
7
8
'''
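def skip_last_n(iterator, n=1):
    # a runs n items ahead of b, so b stops n items short of the end
    a, b = tee(iterator)
    for _ in range(n):
        next(a, None)  # the default avoids StopIteration on short input
    for _ in a:
        yield next(b)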

i = StringIO(s)
for x in skip_last_n(i, 1):
    print(x, end='')
1
2
3
4
5
6
7

i = StringIO(s)
for x in skip_last_n(i, 3):
    print(x, end='')
1
2
3
4
5
iruvar answered Oct 20 '22 01:10