Averaging the values in a dictionary based on the key

Question

I am new to Python and I have a set of values like the following:

(3, '655')
(3, '645')
(3, '641')
(4, '602')
(4, '674')
(4, '620')

This is generated from a CSV file with the following code (Python 2.6):

import csv
import time

with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        date = time.strptime(row[3], "%a %b %d %H:%M:%S %Z %Y")
        data = date, row[5]

        month = data[0][1]
        avg = data[1]
        monthAvg = month, avg
        print monthAvg

What I would like to do is get an average of the values based on the keys:

(3, 647)
(4, 632)

My initial thought was to create a new dictionary.

loop through the original dictionary
    if the key does not exist
        add the key and value to the new dictionary
    else
        sum the value to the existing value in the new dictionary

I'd also have to keep a count of the values for each key so I could produce the average. That seems like a lot of work, though, and I wasn't sure if there was a more elegant way to accomplish this.
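
The approach described above can be sketched in a few lines. This is a minimal illustration in Python 3 syntax, not code from the original post; the names `pairs`, `totals`, and `counts` are made up for the example, and the sample pairs are the ones listed at the top of the question.

```python
# Sample (month, value) pairs from the question; values are strings as read from the CSV.
pairs = [(3, '655'), (3, '645'), (3, '641'),
         (4, '602'), (4, '674'), (4, '620')]

totals = {}  # month -> running sum of the values (hypothetical name)
counts = {}  # month -> how many values were seen (hypothetical name)

for month, value in pairs:
    # dict.get supplies 0 the first time a month is seen
    totals[month] = totals.get(month, 0) + int(value)
    counts[month] = counts.get(month, 0) + 1

# Integer division matches the whole-number averages asked for in the question.
averages = [(month, totals[month] // counts[month]) for month in sorted(totals)]
print(averages)
```

Two dictionaries are enough here: one for the running sum and one for the per-key count.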

Thank you.

Mazdak · Accepted Answer

You can use collections.defaultdict to create a dictionary with unique keys and lists of values:

>>> l=[(3, '655'),(3, '645'),(3, '641'),(4, '602'),(4, '674'),(4, '620')]
>>> from collections import defaultdict
>>> d=defaultdict(list)
>>> 
>>> for i,j in l:
...    d[i].append(int(j))
... 
>>> d
defaultdict(<type 'list'>, {3: [655, 645, 641], 4: [602, 674, 620]})

Then use a list comprehension to create the expected pairs:

>>> [(i,sum(j)/len(j)) for i,j in d.items()]
[(3, 647), (4, 632)]
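
The session above is Python 2, where `/` between two integers truncates. Under Python 3 the same expression returns floats, so a Python 3 version needs `//` to get whole numbers. A minimal sketch of the difference, assuming Python 3.7+ so that the dictionary keeps insertion order (the names `true_div` and `floor_div` are made up for the example):

```python
from collections import defaultdict

l = [(3, '655'), (3, '645'), (3, '641'), (4, '602'), (4, '674'), (4, '620')]

d = defaultdict(list)  # key -> list of integer values
for i, j in l:
    d[i].append(int(j))

# In Python 3, "/" is true division and always yields a float.
true_div = [(i, sum(j) / len(j)) for i, j in d.items()]

# "//" is floor division and keeps integer results.
floor_div = [(i, sum(j) // len(j)) for i, j in d.items()]

print(true_div)
print(floor_div)
```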

And within your code you can do:

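import csv
import time
from collections import defaultdict

d = defaultdict(list)  # month -> list of values

with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        date = time.strptime(row[3], "%a %b %d %H:%M:%S %Z %Y")
        data = date, row[5]

        month = data[0][1]
        avg = data[1]
        d[month].append(int(avg))

# once every row has been read, average the values collected for each month
print [(i,sum(j)/len(j)) for i,j in d.items()]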
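
For readers on Python 3, here is a self-contained sketch of the same loop. The sample rows are invented stand-ins for `file.csv`: the real column layout is not shown in the question, so the only assumptions carried over are that column 3 holds the date and column 5 holds the value. The function name `monthly_averages` is hypothetical.

```python
import csv
import io
import time
from collections import defaultdict

# Invented sample data standing in for file.csv (date in column 3, value in column 5).
sample = io.StringIO(
    "a,b,c,Mon Mar 02 10:00:00 UTC 2015,e,655\n"
    "a,b,c,Tue Mar 03 10:00:00 UTC 2015,e,645\n"
    "a,b,c,Wed Mar 04 10:00:00 UTC 2015,e,641\n"
    "a,b,c,Wed Apr 01 10:00:00 UTC 2015,e,602\n"
    "a,b,c,Thu Apr 02 10:00:00 UTC 2015,e,674\n"
    "a,b,c,Fri Apr 03 10:00:00 UTC 2015,e,620\n"
)

def monthly_averages(csvfile):
    """Group the column-5 values by month and return {month: mean}."""
    values = defaultdict(list)
    for row in csv.reader(csvfile):
        date = time.strptime(row[3], "%a %b %d %H:%M:%S %Z %Y")
        values[date.tm_mon].append(int(row[5]))
    # True division: the means come back as floats in Python 3.
    return {month: sum(v) / len(v) for month, v in values.items()}

averages = monthly_averages(sample)
print(averages)
```

With a real file in Python 3, the `csv` module expects text mode, so the file would be opened with `open('file.csv', newline='')` rather than `'rb'`.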

TheBlackCat · Answer

Use pandas. It is designed for exactly this kind of grouped aggregation, so what you want to do can be expressed as a one-liner. It will also typically be much faster than a pure-Python loop when given a lot of values.

import pandas as pd

a=[(3, '655'),
   (3, '645'),
   (3, '641'),
   (4, '602'),
   (4, '674'),
   (4, '620')]

res = pd.DataFrame(a).astype('float').groupby(0).mean()
print(res)

This gives a mean of 647.0 for key 3 and 632.0 for key 4.

Here is a multi-line version, showing what happens:

df = pd.DataFrame(a)  # construct a structure containing data
df = df.astype('float')  # convert data to float values
grp = df.groupby(0)  # group the values by the value in the first column
df = grp.mean()  # take the mean of each group
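
One side effect of `astype('float')` on the whole frame is that the group keys are converted as well, so the result is indexed by 3.0 and 4.0. A small variation that converts only the value column is sketched below; the column names `key` and `value` are made up for the example.

```python
import pandas as pd

a = [(3, '655'), (3, '645'), (3, '641'),
     (4, '602'), (4, '674'), (4, '620')]

# Name the columns so the grouping step reads clearly (hypothetical names).
df = pd.DataFrame(a, columns=['key', 'value'])

# Convert only the string values; the integer keys are left untouched.
df['value'] = df['value'].astype(int)

# Group by key and take the mean of each group's values.
res = df.groupby('key')['value'].mean()
print(res)
```

Here `res` is a Series indexed by the integer keys rather than a DataFrame.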

Further, if you want to use a CSV file, it is even easier since you don't need to parse the file yourself (I use made-up names for the columns I don't know):

import pandas as pd
# header=None: the file has no header row; names= supplies the column names
df = pd.read_csv('file.csv', names=['col0', 'col1', 'col2', 'date', 'col4', 'data'], header=None)
# extract the month; if the dates are not parsed correctly, pass an explicit format to pd.to_datetime first
df['month'] = pd.DatetimeIndex(df['date']).month
df = df.loc[:, ['month', 'data']].groupby('month').mean()

Tags: python, dictionary

Asked by: JamesE
