 

pandas: calculate the mean of a column that has lists instead of single values

I have a pandas DataFrame with one column, and each row of that column holds a list of values. I need to calculate the mean using the corresponding values from each row; that is, I need a mean for each of the eight positions in the list. Each element in the list is the value of a variable.

>>> df_ex
0    [1, 2, 3, 4, 5, 6, 7, 8]
1    [2, 3, 4, 5, 6, 7, 8, 1]

I tried converting it to a NumPy array and then taking the mean, but I keep getting the error TypeError: unsupported operand type(s) for /: 'list' and 'int'. I understand that I should convert the lists to columns instead, but that isn't possible in my context. Any idea how I could accomplish this?
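A likely cause of the error: converting the column directly (for example with np.array(df_ex) or df_ex.values) gives a one-dimensional array of object dtype whose elements are Python lists, so mean ends up working on lists rather than numbers (the summing step concatenates the lists), which is consistent with the 'list' and 'int' message above. The exact behaviour may vary between NumPy versions. The sketch below assumes df_ex is a Series of equal-length lists like the one shown, and compares that direct conversion with converting via tolist() first; the names direct and nested are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as in the question: a Series whose elements are equal-length lists
df_ex = pd.Series([[1, 2, 3, 4, 5, 6, 7, 8],
                   [2, 3, 4, 5, 6, 7, 8, 1]])

# Direct conversion: a 1-D object array whose two elements are Python lists
direct = np.array(df_ex)
print(direct.dtype, direct.shape)        # object (2,)

# Converting to nested lists first gives a proper 2-D integer array
nested = np.array(df_ex.tolist())
print(nested.dtype.kind, nested.shape)   # i (2, 8)
```

Only the 2-D version has a numeric dtype and a real second axis, which is what the axis argument of mean needs.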

asked Nov 22 '17 07:11 by Clock Slave



2 Answers

You can convert the column to nested lists first, then to a 2D NumPy array, and then calculate the mean:

import numpy as np

# df_ex is the column of equal-length lists from the question
a = np.array(df_ex.tolist())
print(a)
# [[1 2 3 4 5 6 7 8]
#  [2 3 4 5 6 7 8 1]]

# Mean of all values
print(a.mean())
# 4.5

# Row-wise mean: one value per list
print(a.mean(axis=1))
# [4.5 4.5]

# Column-wise mean: one value per list position
print(a.mean(axis=0))
# [1.5 2.5 3.5 4.5 5.5 6.5 7.5 4.5]
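If you prefer to stay in pandas, a similar sketch (not part of the original answer) expands the lists into a temporary DataFrame with one column per list position; DataFrame.mean() then gives the column-wise means and mean(axis=1) the row-wise ones. Like the NumPy version, it assumes every list has the same length; the name expanded is only illustrative.

```python
import pandas as pd

df_ex = pd.Series([[1, 2, 3, 4, 5, 6, 7, 8],
                   [2, 3, 4, 5, 6, 7, 8, 1]])

# One row per original list, one column per list position (columns 0..7)
expanded = pd.DataFrame(df_ex.tolist(), index=df_ex.index)

position_means = expanded.mean()     # mean of each position across rows
row_means = expanded.mean(axis=1)    # mean of each list

print(position_means.tolist())  # [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 4.5]
print(row_means.tolist())       # [4.5, 4.5]
```

Passing index=df_ex.index keeps row_means aligned with the original rows, so it can be assigned back as a new column if needed.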
answered Oct 23 '22 22:10 by jezrael



You can call np.mean on the nested lists returned by tolist() and specify an axis.

Setup

import numpy as np
import pandas as pd

df_ex = pd.DataFrame(dict(
    col1=[[1, 2, 3, 4, 5, 6, 7, 8],
          [2, 3, 4, 5, 6, 7, 8, 1]]))

df_ex

                       col1
0  [1, 2, 3, 4, 5, 6, 7, 8]
1  [2, 3, 4, 5, 6, 7, 8, 1]

Solution

np.mean(df_ex['col1'].tolist(), axis=1)  # row-wise: mean of each list

array([4.5, 4.5])

Or

np.mean(df_ex['col1'].tolist(), axis=0)  # column-wise: mean of each list position

array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 4.5])
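To keep the row-wise result next to the data, the array returned by np.mean(..., axis=1) can be assigned back as a new column, since it has exactly one value per row. A minimal sketch using the setup above; the column name row_mean is hypothetical. Applying np.mean to each list with Series.apply is an alternative that gives the same per-row means without building a 2-D array first, and it also copes with lists of different lengths.

```python
import numpy as np
import pandas as pd

df_ex = pd.DataFrame(dict(
    col1=[[1, 2, 3, 4, 5, 6, 7, 8],
          [2, 3, 4, 5, 6, 7, 8, 1]]))

# Row-wise means from the 2-D array, stored as a new column
# ("row_mean" is a hypothetical column name)
df_ex['row_mean'] = np.mean(df_ex['col1'].tolist(), axis=1)

# Alternative: call np.mean on each list individually
per_row = df_ex['col1'].apply(np.mean)

print(df_ex)
print(per_row.tolist())  # [4.5, 4.5]
```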
answered Oct 23 '22 21:10 by piRSquared
