Suppose I have the following pandas dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.nan, columns=["A", "B", "C"], index=np.arange(5))
df = df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c] = np.arange(5).tolist()
This results in a df whose cells are Python lists (the arrays are converted with tolist()):
df
Out[16]:
A B C
0 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
1 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
2 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
3 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
4 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
I would like to calculate the mean of the dataframe, but it does not work because each cell holds a list (object dtype) rather than a number. For example,
type(df.loc[0][0])
Out[19]: list
Thus, if I calculate the row-wise mean, it returns NaN:
df["Average"]= df.mean(axis=1)
df
Out[21]:
A B C Average
0 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] NaN
1 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] NaN
2 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] NaN
3 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] NaN
4 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] NaN
My question is, how would I convert this df back to numeric values that I can work with?
You might want to restructure your dataframe as mentioned. But to work with what you have, assuming you want the mean of each cell (each list) in the dataframe, you can try the applymap method. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)
df.applymap(np.mean)
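To get from the per-cell means to the "Average" column the question asks for, one more row-wise reduction is needed. The following is a minimal sketch, assuming the same 5x3 frame of lists as in the question; the names cell_means and row_average are only illustrative, and Series.map is used per column as one way to avoid depending on a particular pandas version.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: every cell holds the list [0, 1, 2, 3, 4].
df = pd.DataFrame({c: [list(range(5)) for _ in range(5)] for c in "ABC"})

# Reduce each list to its mean, one column at a time.
cell_means = df.apply(lambda col: col.map(np.mean))

# cell_means is numeric now, so the usual row-wise mean works.
row_average = cell_means.mean(axis=1)
print(row_average.tolist())
```

Note that a mean of per-cell means equals the mean of all values in the row only when every list in that row has the same length.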
I think the idea of converting the values to columns is really good, because then it is possible to use pandas vectorized functions:
df1 = pd.concat([pd.DataFrame(df[c].values.tolist()) for c in df.columns],
                axis=1,
                keys=df.columns)
df1.columns = ['{}{}'.format(i, j) for i, j in df1.columns]
print(df1)
A0 A1 A2 A3 A4 B0 B1 B2 B3 B4 C0 C1 C2 C3 C4
0 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
1 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
2 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
3 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
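Once the lists are expanded into numeric columns, ordinary pandas reductions apply. Here is a small sketch under the same assumptions as above (equal-length lists in every cell); it keeps the two-level column index produced by keys= instead of flattening it, so that a mean per original column can be taken as well. The names row_average and per_column are hypothetical.

```python
import pandas as pd

# Same data as in the question: each cell is the list [0, 1, 2, 3, 4].
df = pd.DataFrame({c: [list(range(5)) for _ in range(5)] for c in "ABC"})

# Expand each list column into a block of numeric columns; keys= adds the
# original column name as the first level of the column MultiIndex.
df1 = pd.concat(
    [pd.DataFrame(df[c].tolist(), index=df.index) for c in df.columns],
    axis=1,
    keys=df.columns,
)

# Mean over every expanded value in a row.
row_average = df1.mean(axis=1)

# Mean per original column: group the transposed frame by the first index level.
per_column = df1.T.groupby(level=0).mean().T
print(per_column)
```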
But if you need the mean of all the lists in a row taken together (here the lists have different lengths from row to row):
df = pd.DataFrame(np.nan, columns=["A", "B", "C"], index=np.arange(5))
df = df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c] = np.arange(i + 1).tolist()
print(df)
A B C
0 [0] [0] [0]
1 [0, 1] [0, 1] [0, 1]
2 [0, 1, 2] [0, 1, 2] [0, 1, 2]
3 [0, 1, 2, 3] [0, 1, 2, 3] [0, 1, 2, 3]
4 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
from itertools import chain
from statistics import mean
df['Average'] = [mean(list(chain.from_iterable(x))) for x in df.values.tolist()]
print (df)
A B C Average
0 [0] [0] [0] 0.0
1 [0, 1] [0, 1] [0, 1] 0.5
2 [0, 1, 2] [0, 1, 2] [0, 1, 2] 1.0
3 [0, 1, 2, 3] [0, 1, 2, 3] [0, 1, 2, 3] 1.5
4 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] 2.0
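The same pooled row mean can also be written with NumPy instead of itertools and statistics. This is just an alternative sketch, not a faster or preferred method; it assumes every cell is a non-empty list of numbers.

```python
import numpy as np
import pandas as pd

# Row i holds the list [0, ..., i] in every column, as in the example above.
df = pd.DataFrame({c: [list(range(i + 1)) for i in range(5)] for c in "ABC"})

# For each row, join all of its lists into one array and take a single mean.
df["Average"] = df.apply(
    lambda row: np.mean(np.concatenate(row.tolist())), axis=1
)
print(df["Average"].tolist())
```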
EDIT:
If the values are strings:
df = pd.DataFrame(np.nan, columns=["A", "B", "C"], index=np.arange(5))
df = df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c] = np.arange(5).tolist()
df = df.astype(str)
print(df)
A B C
0 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
1 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
2 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
3 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
4 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
df1 = pd.concat([df[c].str.strip('[]').str.split(', ', expand=True) for c in df.columns],
                axis=1,
                keys=df.columns).astype(float)
df1.columns = ['{}{}'.format(i, j) for i, j in df1.columns]
df1["Average"] = df1.mean(axis=1)
print(df1)
A0 A1 A2 A3 A4 B0 B1 B2 B3 B4 C0 C1 C2 C3 C4 \
0 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0
1 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0
2 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0
3 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0
4 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0
Average
0 2.0
1 2.0
2 2.0
3 2.0
4 2.0
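If the strings are valid Python list literals such as "[0, 1, 2, 3, 4]", another option is to parse them back into real lists with ast.literal_eval from the standard library rather than stripping and splitting by hand. A minimal sketch under that assumption (the names parsed and average are illustrative):

```python
import ast

import numpy as np
import pandas as pd

# String cells, as produced by df.astype(str) in the example above.
df = pd.DataFrame({c: [str(list(range(5))) for _ in range(5)] for c in "ABC"})

# Parse every string back into a Python list, one column at a time.
parsed = df.apply(lambda col: col.map(ast.literal_eval))

# Pool all lists of a row into one array and average them.
average = parsed.apply(
    lambda row: np.mean(np.concatenate(row.tolist())), axis=1
)
print(average.tolist())
```

literal_eval only evaluates literals, so it is a safer choice than eval for data of unknown origin, but it raises an error on cells that are not valid literals.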