The text file looks like:
david weight_2005 50
david weight_2012 60
david height_2005 150
david height_2012 160
mark weight_2005 90
mark weight_2012 85
mark height_2005 160
mark height_2012 170
How can I calculate the mean weight and mean height for david and mark, as follows:
david>> mean(weight_2005 and weight_2012), mean (height_2005 and height_2012)
mark>> mean(weight_2005 and weight_2012), mean (height_2005 and height_2012)
My incomplete code is:
import numpy as np
import csv

with open('data.txt', 'r') as infile:
    contents = csv.reader(infile, delimiter=' ')
    c1, c2, c3 = zip(*contents)  # columns: name, key (e.g. weight_2005), value
    data = np.array(c3, dtype=float)
How do I apply np.mean from here?
The mean function is for computing the average of an array of numbers. You will need to come up with a way to select the values of c3 by applying a condition to c2.
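One way to apply a condition on c2 when selecting from c3 is a boolean mask. The sketch below assumes the three columns from the question; the rows are inlined so it runs without data.txt, and the names `props`, `mask` and `means` are made up for illustration.

```python
import numpy as np

# Sample rows copied from the question's data.txt (inlined for illustration)
rows = [
    ("david", "weight_2005", "50"), ("david", "weight_2012", "60"),
    ("david", "height_2005", "150"), ("david", "height_2012", "160"),
    ("mark", "weight_2005", "90"), ("mark", "weight_2012", "85"),
    ("mark", "height_2005", "160"), ("mark", "height_2012", "170"),
]
c1, c2, c3 = zip(*rows)

names = np.array(c1)
# Strip the year suffix: "weight_2005" -> "weight"
props = np.array([key.split("_")[0] for key in c2])
values = np.array(c3, dtype=float)

means = {}
for name in np.unique(names):
    for prop in np.unique(props):
        # True only where both the name and the property match
        mask = (names == name) & (props == prop)
        means[(str(name), str(prop))] = values[mask].mean()

for key, value in sorted(means.items()):
    print(key, value)
```

This works for a handful of groups, but the nested loop rescans the whole array for every (name, property) pair, which is part of why a grouped structure is usually more convenient.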
What would probably suit your needs better is splitting the data up into a hierarchical structure. I prefer using dictionaries for this, something like
import csv

data = {}
with open('data.txt') as f:
    contents = csv.reader(f, delimiter=' ')
    for name, attribute, value in contents:
        data[name] = data.get(name, {})  # Default value is a new dict
        attr_name, attr_year = attribute.split('_')
        attr_year = int(attr_year)
        data[name][attr_name] = data[name].get(attr_name, {})
        # Store a number rather than the raw string so np.mean can use it
        data[name][attr_name][attr_year] = float(value)
Now data will look like
{
    "david": {
        "weight": {
            2005: 50.0,
            2012: 60.0
        },
        "height": {
            2005: 150.0,
            2012: 160.0
        }
    },
    "mark": {
        "weight": {
            2005: 90.0,
            2012: 85.0
        },
        "height": {
            2005: 160.0,
            2012: 170.0
        }
    }
}
Then what you can do is
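# list(...) is needed on Python 3, where .values() returns a view, not a list
david_avg_weight = np.mean(list(data['david']['weight'].values()))
# Only average the years after 2008 (.items() replaces Python 2's .iteritems())
mark_avg_height = np.mean([v for k, v in data['mark']['height'].items() if k > 2008])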
Here I'm still using np.mean, but only calling it on a normal Python list.
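To get every mean at once rather than one lookup at a time, the nested dictionary can be walked with a pair of comprehensions. This is a minimal sketch that assumes data has the shape shown above (it is inlined so the snippet runs on its own). It swaps np.mean for statistics.mean from the standard library to avoid the dependency, and `averages` is a name invented for the example.

```python
from statistics import mean

# Same shape as the nested dict built above, inlined for illustration
data = {
    "david": {
        "weight": {2005: 50.0, 2012: 60.0},
        "height": {2005: 150.0, 2012: 160.0},
    },
    "mark": {
        "weight": {2005: 90.0, 2012: 85.0},
        "height": {2005: 160.0, 2012: 170.0},
    },
}

# For each person and attribute, average the values over all years
averages = {
    name: {attr: mean(by_year.values()) for attr, by_year in attrs.items()}
    for name, attrs in data.items()
}

for name, attrs in averages.items():
    print(name, attrs)
```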
I'll make this community wiki, because it's more "here's how I think you should do it instead" than "here's the answer to the question you asked". For something like this I'd probably use pandas instead of numpy, as its grouping tools are much better. It'll also be useful to compare with numpy-based approaches.
import pandas as pd

# A regex separator needs the Python parsing engine
df = pd.read_csv("data.txt", sep="[ _]", header=None, engine="python",
                 names=["name", "property", "year", "value"])
means = df.groupby(["name", "property"])["value"].mean()
.. and, er, that's it.
First, read the data into a DataFrame, letting either a space or _ separate columns:
>>> import pandas as pd
>>> df = pd.read_csv("data.txt", sep="[ _]", header=None, engine="python",
...                  names=["name", "property", "year", "value"])
>>> df
    name property  year  value
0  david   weight  2005     50
1  david   weight  2012     60
2  david   height  2005    150
3  david   height  2012    160
4   mark   weight  2005     90
5   mark   weight  2012     85
6   mark   height  2005    160
7   mark   height  2012    170
Then group by name and property, take the value column, and compute the mean:
>>> means = df.groupby(["name", "property"])["value"].mean()
>>> means
name   property
david  height      155.0
       weight       55.0
mark   height      165.0
       weight       87.5
Name: value, dtype: float64
.. okay, the sep="[ _]" trick is a little too cute for real code, though it works well enough here. In practice I'd use a whitespace separator, read in the second column as property_year and then do
df["property"], df["year"] = zip(*df["property_year"].str.split("_"))
del df["property_year"]
to allow underscores in other columns.
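Put together, the whitespace-separator variant might look like the sketch below. It uses str.split with expand=True instead of the zip(*...) idiom above, purely as a matter of taste. The data is inlined through io.StringIO so the example runs without data.txt, and the column name property_year follows the description above.

```python
import io

import pandas as pd

# Contents of data.txt from the question, inlined for illustration
raw = """david weight_2005 50
david weight_2012 60
david height_2005 150
david height_2012 160
mark weight_2005 90
mark weight_2012 85
mark height_2005 160
mark height_2012 170
"""

# Split on spaces only, keeping "weight_2005" together in one column
df = pd.read_csv(io.StringIO(raw), sep=" ", header=None,
                 names=["name", "property_year", "value"])

# Split the combined column into two new columns, then drop the original
df[["property", "year"]] = df["property_year"].str.split("_", expand=True)
df["year"] = df["year"].astype(int)
df = df.drop(columns="property_year")

means = df.groupby(["name", "property"])["value"].mean()
print(means)
```

Because only the second column is split, names containing underscores would pass through untouched.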
You can read your data directly into a NumPy structured array with:
data = np.genfromtxt("data.txt", delimiter=" ", dtype=None, encoding="utf-8", names=['name', 'type', 'value'])
Then you can find the appropriate indices with np.where:
indices = np.where((data['name'] == 'david') & np.char.startswith(data['type'], 'height'))
and take the mean over those indices:
np.mean(data['value'][indices])