Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge CSV Files in Python with Different file names [closed]

Tags:

python

csv

I'm really new to Python, so this question might be a bit basic. I have 44 csv files with the same headers and different file names. I want to combine them all into one file.

Each file is named "Votes-[member-name]-(2010-2014)-[download-time].csv"

The headers are do not include a column for the member name. I would like to add that as the first item.

This does part of what I want to do: how to merge 200 csv files in Python. I'm just not sure how to iterate through files with different names, and add those names to the csv.

Thanks!

like image 554
cassidycodes Avatar asked Dec 19 '13 14:12

cassidycodes


1 Answers

To iterate through the filenames you can use a similar method as answered here, using glob:

import glob
import os
os.chdir("/mydir")
for files in glob.glob("*.csv"):
    print files

Then, to add the member name to the header, you can print all the csv files line by line. If the line is a header, then print the member name on the same line as the header. (This isn't real code, but you can get the point)

for files in glob.glob("*.csv"):
    for lines in files:
         if line == header:
              print member,line
         else:
              print line

To split the CSV file and only use the member name (slightly modified so to not have a hyphen)

'Votes-[member name]-(2010-2014)-[download-time].csv'.split('-')[1]

UPDATE for bash solution: You can save this text and run it from the terminal (see instructions here for Mac)

Generate CSV files (not necessary)

cat <<"EOF" > 1.csv
1,2,3
4,5,6
EOF

cat <<"EOF" > 2.csv
a,b,c
d,e,f
EOF

Parse CSV files - this script takes all CSV files and writes their file name as the first column. It also puts them into one file (note that I tested on debian linux, not mac).

rm -f all.csv
for fyle in *.csv ; do 
    echo | awk -v f=$fyle '{ print f","$0 }' $fyle >> all.csv
done
exit 0

SECOND UPDATE: If you want to remove the duplicate headers, the simplest way from the shell is to use 'grep -v', which selects all lines that don't match. You can pick a generic string that only exists in the header.

head -1 all.csv > nohead.csv # add one header line
grep -v "header string" all.csv >> nohead.csv # add rest of rows, with no headers
like image 156
philshem Avatar answered Sep 30 '22 19:09

philshem