
Reorganize a CSV so Dates are not column Headers

Tags: python, csv, excel

I am trying to reorganize an Excel table (or CSV) so that dates are no longer column headers. I have only limited knowledge of Python and don't know where to start, so I could use some assistance.

Under each date is a record of what happened that day for a particular place. Null values can be skipped. Some cells contain a "--", which can be converted to a 0. I would like one column for the date and one column for the numeric reading for that day. Each place gets a new row for every day it was monitored.

Example (smh at the person that started it this way):

Name,7/1/2009,7/2/2009,7/3/2009,7/4/2009..... (and so on to the present)
Place A,,5,3,
Place B,0,,23,--
Place C,1,2,,35

What I would like is:

Name, Date, Reading
Place A, 7/2/2009, 5
Place A, 7/3/2009, 3
Place B, 7/1/2009, 0
Place B, 7/4/2009, 0   <--- Even though this is a dash originally it can be converted to a 0 to keep the number an int.  

There are hundreds of rows (places), and the columns (dates) have reached Excel column BPD (that's right, 1772 columns!).

John S asked Oct 31 '22 18:10


1 Answer

What you're trying to do is normalize a table.

The way you do this in general is: for each row in the denormalized table, you insert one row into the normalized table for each denormalized column.

The way you do this in particular depends on how you're processing the tables. For example, if you're using the csv module, in Python 3.x, with an Excel-default-dialect CSV file, it'll go something like this:

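import csv

# newline='' lets the csv module handle line endings itself
with open('old.csv', newline='') as oldcsv, open('new.csv', 'w', newline='') as newcsv:
    r, w = csv.reader(oldcsv), csv.writer(newcsv)
    header = next(r)  # first row: 'Name' followed by the date columns
    w.writerow(['Name', 'Date', 'Reading'])
    for row in r:
        # pair each date header with the value under it in this row
        for colname, colval in zip(header[1:], row[1:]):
            w.writerow([row[0], colname, colval])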

If you want to use, e.g., xlrd/xlwt, XlsxReader/XlsxWriter, win32com scripting of Excel, etc., the details will be different, but the basic idea will be the same: iterate over the rows, then iterate over the date columns, generating a new row for each one based on the name from the row, the date from the column header, and the value from the row.
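To illustrate that the basic idea doesn't depend on the library, here is a minimal sketch that works on any iterable of rows, wherever they came from. The function name unpivot_rows and the in-memory sample table are hypothetical, made up for this illustration.

```python
def unpivot_rows(rows):
    """Yield (name, date, value) for every date column of every data row.

    `rows` is any iterable of row lists; the first one is the header row.
    """
    rows = iter(rows)
    header = next(rows)
    for row in rows:
        # pair each date header with the value under it in this row
        for date, value in zip(header[1:], row[1:]):
            yield row[0], date, value


# Hypothetical in-memory data, shaped like the question's example
table = [
    ['Name', '7/1/2009', '7/2/2009'],
    ['Place A', '', '5'],
    ['Place B', '0', ''],
]
result = list(unpivot_rows(table))
```

Whatever reader you end up using only has to hand over rows as lists; blank cells still come through as empty strings at this stage.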

And you should be able to figure out how to skip null values, convert "--" to 0, etc. from here.
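If it helps, one possible way to handle those two cases is sketched below. The helper names clean_reading and reorganize are hypothetical, and the sketch assumes every cell that isn't blank or "--" holds an integer (anything else raises ValueError). It reads from an in-memory copy of the question's sample so it can be run as-is; swap in real file objects opened with newline='' for actual data.

```python
import csv
import io


def clean_reading(value):
    """Return None for a blank cell, 0 for '--', otherwise the cell as an int."""
    value = value.strip()
    if value == '':
        return None
    if value == '--':
        return 0
    return int(value)  # assumption: all other cells are integers


def reorganize(infile, outfile):
    r, w = csv.reader(infile), csv.writer(outfile)
    header = next(r)
    w.writerow(['Name', 'Date', 'Reading'])
    for row in r:
        for date, value in zip(header[1:], row[1:]):
            reading = clean_reading(value)
            if reading is not None:  # skip null values
                w.writerow([row[0], date, reading])


# The sample rows from the question, held in memory instead of old.csv
sample = io.StringIO(
    "Name,7/1/2009,7/2/2009,7/3/2009,7/4/2009\n"
    "Place A,,5,3,\n"
    "Place B,0,,23,--\n"
    "Place C,1,2,,35\n"
)
out = io.StringIO()
reorganize(sample, out)
output_lines = out.getvalue().splitlines()
```

Returning None for blanks keeps a real reading of 0 distinct from a missing one, so "Place B,7/1/2009,0" is still written while empty cells are dropped.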

abarnert answered Nov 15 '22 04:11