Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add file name as last column of CSV file

Tags:

python

linux

csv

I have a Python script which modifies a CSV file to add the filename as the last column:

import sys
import glob

for filename in glob.glob(sys.argv[1]):
    file = open(filename)
    data = [line.rstrip() + "," + filename for line in file]
    file.close()

    file = open(filename, "w")
    file.write("\n".join(data))
    file.close()

Unfortunately, it also adds the filename to the header (first) row of the file. I would like the string "ID" added to the header instead. Can anybody suggest how I could do this?

like image 295
Henry Levine Avatar asked Dec 07 '22 22:12

Henry Levine


2 Answers

Have a look at the official csv module.

like image 86
Reto Aebersold Avatar answered Dec 17 '22 00:12

Reto Aebersold


Here are a few minor notes on your current code:

  • It's a bad idea to use file as a variable name, since that shadows the built-in type.
  • You can close the file objects automatically by using the with syntax.
  • Don't you want to add an extra column in the header line, called something like Filename, rather than just omitting a column in the first row?
  • If your filenames have commas (or, less probably, newlines) in them, you'll need to make sure that the filename is quoted - just appending it won't do.

That last consideration would incline me to use the csv module instead, which will deal with the quoting and unquoting for you. For example, you could try something like the following code:

import glob
import csv
import sys

for filename in glob.glob(sys.argv[1]):
    data = []
    with open(filename) as finput:
        for i, row in enumerate(csv.reader(finput)):
            to_append = "Filename" if i == 0 else filename
            data.append(row+[to_append])
    with open(filename,'wb') as foutput:
        writer = csv.writer(foutput)
        for row in data:
            writer.writerow(row)

That may quote the data slightly differently from your input file, so you might want to play with the quoting options for csv.reader and csv.writer described in the documentation for the csv module.

As a further point, you might have good reasons for taking a glob as a parameter rather than just the files on the command line, but it's a bit surprising - you'll have to call your script as ./whatever.py '*.csv' rather than just ./whatever.py *.csv. Instead, you could just do:

for filename in sys.argv[1:]:

... and let the shell expand your glob before the script knows anything about it.

One last thing - the current approach you're taking is slightly dangerous, in that if anything fails when writing back to the same filename, you'll lose data. The standard way of avoiding this is to instead write to a temporary file, and, if that was successful, rename the temporary file over the original. So, you might rewrite the whole thing as:

import csv
import sys
import tempfile
import shutil

for filename in sys.argv[1:]:
    tmp = tempfile.NamedTemporaryFile(delete=False)
    with open(filename) as finput:
        with open(tmp.name,'wb') as ftmp:
            writer = csv.writer(ftmp)
            for i, row in enumerate(csv.reader(finput)):
                to_append = "Filename" if i == 0 else filename
                writer.writerow(row+[to_append])
    shutil.move(tmp.name,filename)
like image 39
Mark Longair Avatar answered Dec 17 '22 02:12

Mark Longair