Calculate the mean of a pandas DataFrame whose cells are lists

Question

Suppose I have the following pandas dataframe

import pandas as pd
import numpy as np
df= pd.DataFrame(np.nan, columns =["A","B","C"], index =np.arange(5))
df=df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c]=np.arange(5).tolist()

This results in a df whose cells are Python lists

df
Out[16]: 
                 A                B                C
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]

I would like to calculate the mean of the dataframe, but it does not work because each cell holds a list rather than a number. For example,

type(df.loc[0, "A"])
Out[19]: list

Thus, if I calculate the row-wise mean, it returns NaN

df["Average"]= df.mean(axis=1)

df
Out[21]: 
                 A                B                C  Average
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN

My question is: how can I convert this df back to numeric values that I can work with?

busybear · Accepted Answer

You might want to restructure your dataframe as mentioned. But to work with what you have, assuming you want the mean of the list in each cell of the dataframe, you can try the applymap method.

df.applymap(np.mean)
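As a rough sketch of how that could be carried through to the "Average" column from the question: in pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, so the version below uses Series.map on each column instead, which should behave the same across versions. The names cell_means and row_average are only illustrative, and it assumes every cell holds a non-empty list of numbers.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: every cell holds the list [0, 1, 2, 3, 4].
df = pd.DataFrame({c: [list(range(5)) for _ in range(5)] for c in "ABC"})

# Element-wise mean of each cell's list. Series.map is applied column by
# column, which is equivalent to df.applymap(np.mean) here.
cell_means = df.apply(lambda col: col.map(np.mean))

# cell_means is now numeric, so the usual vectorized row mean works.
row_average = cell_means.mean(axis=1)

print(cell_means)
print(row_average)
```

Note that this averages the per-cell means, which matches the mean of all values in the row only when the lists in that row have the same length.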

jezrael · Answer

I think the idea of converting the list values to columns is really good, because then it is possible to use pandas vectorized functions:

df1 = pd.concat([pd.DataFrame(df[c].values.tolist()) for c in df.columns], 
                 axis=1, 
                 keys=df.columns)
df1.columns = ['{}{}'.format(i, j) for i, j in df1.columns]
print (df1)
   A0  A1  A2  A3  A4  B0  B1  B2  B3  B4  C0  C1  C2  C3  C4
0   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
1   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
2   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
3   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
4   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
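To illustrate what the vectorized functions buy you once the lists are expanded, here is a small sketch. It keeps the two-level column index that concat produces instead of flattening the names, which is my own variation so that the original column can be used as a grouping key. The names wide, per_row and per_col are illustrative, and it assumes all lists in a column have the same length.

```python
import pandas as pd

# Same data as in the question: every cell holds the list [0, 1, 2, 3, 4].
df = pd.DataFrame({c: [list(range(5)) for _ in range(5)] for c in "ABC"})

# Expand each column of lists into its own block of numeric columns.
# Passing a dict to concat uses its keys as the outer column level.
wide = pd.concat(
    {c: pd.DataFrame(df[c].tolist(), index=df.index) for c in df.columns},
    axis=1,
)

# Mean over every value in a row, across all original columns.
per_row = wide.mean(axis=1)

# Mean per row within each original column (A, B, C), grouping on the
# outer column level via a transpose.
per_col = wide.T.groupby(level=0).mean().T

print(per_row)
print(per_col)
```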

But if you need the mean of all the lists in a row taken together:

df= pd.DataFrame(np.nan, columns =["A","B","C"], index =np.arange(5))
df=df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c]=np.arange(i+1).tolist()
print (df)
                 A                B                C
0              [0]              [0]              [0]
1           [0, 1]           [0, 1]           [0, 1]
2        [0, 1, 2]        [0, 1, 2]        [0, 1, 2]
3     [0, 1, 2, 3]     [0, 1, 2, 3]     [0, 1, 2, 3]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]

from itertools import chain
from statistics import mean
df['Average'] = [mean(list(chain.from_iterable(x))) for x in df.values.tolist()]
print (df)
                 A                B                C  Average
0              [0]              [0]              [0]      0.0
1           [0, 1]           [0, 1]           [0, 1]      0.5
2        [0, 1, 2]        [0, 1, 2]        [0, 1, 2]      1.0
3     [0, 1, 2, 3]     [0, 1, 2, 3]     [0, 1, 2, 3]      1.5
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      2.0
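One thing worth checking is whether you want the pooled mean of every value in the row (what the chain approach gives) or the mean of the per-list means. They coincide when the lists in a row have equal length, as in the example above, but not otherwise. The sketch below uses a small made-up frame with unequal lengths to show the difference. np.concatenate is used here as an alternative to itertools.chain, and the names pooled and mean_of_means are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data where the lists within a row differ in length.
df = pd.DataFrame({"A": [[0], [1, 2]], "B": [[10, 20, 30], [3]]})

# Pooled mean: flatten all lists in the row into one array, then average.
pooled = df.apply(lambda row: np.concatenate(row.tolist()).mean(), axis=1)

# Mean of means: average each list first, then average those results.
mean_of_means = df.apply(lambda col: col.map(np.mean)).mean(axis=1)

print(pooled)         # row 0 -> 15.0, row 1 -> 2.0
print(mean_of_means)  # row 0 -> 10.0, row 1 -> 2.25
```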

EDIT:

If the values are strings that only look like lists:

df= pd.DataFrame(np.nan, columns =["A","B","C"], index =np.arange(5))
df=df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c]=np.arange(5).tolist()

df=df.astype(str)
print (df)
                 A                B                C
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]

df1 = pd.concat([df[c].str.strip('[]').str.split(', ', expand=True) for c in df.columns], 
                 axis=1, 
                 keys=df.columns).astype(float)
df1.columns = ['{}{}'.format(i, j) for i, j in df1.columns]
df1["Average"]= df1.mean(axis=1)
print (df1)
    A0   A1   A2   A3   A4   B0   B1   B2   B3   B4   C0   C1   C2   C3   C4  \
0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0   
1  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0   
2  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0   
3  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0   
4  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0   

   Average  
0      2.0  
1      2.0  
2      2.0  
3      2.0  
4      2.0
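If the strings are valid Python list literals, another option might be to parse them back into real lists with ast.literal_eval from the standard library and then reuse the list-based approaches above. This is only a sketch under that assumption. The names df_str, parsed and average are illustrative, and literal_eval will raise an error on malformed strings.

```python
import ast

import numpy as np
import pandas as pd

# Cells are strings such as "[0, 1, 2, 3, 4]", not lists.
df_str = pd.DataFrame({c: [str(list(range(5)))] * 5 for c in "ABC"})

# Parse every cell back into a Python list, column by column.
parsed = df_str.apply(lambda col: col.map(ast.literal_eval))

# Pooled mean of all values in each row, as in the list-based case.
average = parsed.apply(lambda row: np.concatenate(row.tolist()).mean(), axis=1)

print(type(parsed.loc[0, "A"]))
print(average)
```

Unlike the strip/split version, this does not depend on the exact separator between the numbers, but it will be slower on large frames because each cell is parsed individually.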

Tags: python, pandas, numpy

Liam deBoeuf
