
Python CSV - Need to Group and Calculate values based on one key

Tags: python, csv

I have a simple three-column CSV file, and I need to use Python to group the rows by one key, then average the values of another column and return the results. The file is in standard CSV format, laid out like this:

ID, ZIPCODE, RATE
1, 19003, 27.50
2, 19003, 31.33
3, 19083, 41.4
4, 19083, 17.9
5, 19102, 21.40

So basically what I need to do is calculate the average rate (col[2]) for each unique zipcode (col[1]) in that file and return the results: the average rate for all records in 19003, for all records in 19083, and so on.

I've looked at using the csv module to read the file into a dictionary and then sorting the dict by the unique values in the zipcode column, but I can't seem to make any progress.

Any help/suggestions appreciated.

asked Dec 27 '22 23:12 by ply


1 Answer

I've documented some steps to help clarify things:

import csv
from collections import defaultdict

# a dictionary whose values default to an empty list
data = defaultdict(list)
# open the csv file and iterate over its rows; the enumerate()
# function gives us an incrementing row number, and
# skipinitialspace=True drops the space after each comma
with open('data.csv', newline='') as f:
    for i, row in enumerate(csv.reader(f, skipinitialspace=True)):
        # skip the header line and any empty rows
        # we take advantage of the first row being indexed at 0
        # i=0 evaluates as false, as does an empty row
        if not i or not row:
            continue
        # unpack the columns into local variables
        _, zipcode, rate = row
        # for each zipcode, add the rate to the list
        data[zipcode].append(float(rate))

# loop over each zipcode and its list of rates and calculate the average
# (rounded to 3 decimals to hide floating-point noise)
for zipcode, rates in data.items():
    print(zipcode, round(sum(rates) / len(rates), 3))

Output (Python 3.7+, where dicts keep insertion order):

19003 29.415
19083 29.65
19102 21.4
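
For what it's worth, the approach described in the question (read the rows, sort them by zipcode, then group) can also be made to work. Below is a minimal sketch, assuming Python 3 and the exact header names from the sample file. The function name average_rate_by_zipcode and the in-memory SAMPLE string are made up for illustration and are not part of the original question or answer.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter
from statistics import mean


def average_rate_by_zipcode(lines):
    """Return {zipcode: average rate} for CSV text with ID, ZIPCODE, RATE columns.

    `lines` is any iterable of text lines, such as an open file.
    """
    # DictReader takes its keys from the header row; skipinitialspace
    # drops the space that follows each comma in the sample data.
    reader = csv.DictReader(lines, skipinitialspace=True)
    # groupby only merges adjacent rows, so sort by zipcode first.
    rows = sorted(reader, key=itemgetter("ZIPCODE"))
    return {
        zipcode: mean(float(row["RATE"]) for row in group)
        for zipcode, group in groupby(rows, key=itemgetter("ZIPCODE"))
    }


# The sample data from the question, held in memory instead of a file.
SAMPLE = """ID, ZIPCODE, RATE
1, 19003, 27.50
2, 19003, 31.33
3, 19083, 41.4
4, 19083, 17.9
5, 19102, 21.40
"""

averages = average_rate_by_zipcode(io.StringIO(SAMPLE))
for zipcode, rate in averages.items():
    print(zipcode, round(rate, 3))
```

To run it against a real file, pass an open file object instead, e.g. open('data.csv', newline=''). The sort is what makes groupby safe here; without it, rows for the same zipcode that are not next to each other would end up in separate groups.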
answered May 20 '23 14:05 by samplebias