I have a pandas dataframe:
df = pd.DataFrame({'col1': ['3 a, 3 ab, 1 b',
'4 a, 4 ab, 1 b, 1 d',
np.nan] })
and a dictionary
di = {'a': 10.0,
'ab': 2.0,
'b': 1.5,
'd': 1.0,
np.nan: 0.0}
Using the values from the dictionary, I want to evaluate each dataframe row like this:
3*10.0 + 3*2.0 + 1*1.5
giving me a final output that looks like this:
pd.DataFrame({'col1': ['3 a, 3 ab, 1 b',
'4 a, 4 ab, 1 b, 1 d',
'np.nan'], 'result': [37.5,
50.5,
0] })
So far, I could only replace ',' with '+':
df['col1'].str.replace(',',' +').str.split(' ')
Here is one way, though it seems like overkill:
# .sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df['col1'].str.split(', ',expand=True).replace({' ':'*','np.nan':'0'},regex=True).\
stack().apply(lambda x : eval(x,di)).groupby(level=0).sum()
Out[884]:
0 37.5
1 50.5
2 0.0
dtype: float64
from functools import reduce
from operator import mul
def m(x): return di.get(x, x)
df.assign(result=[
sum(
reduce(mul, map(float, map(m, s.split())))
for s in row.split(', ')
) for row in df.col1
])
col1 result
0 3 a, 3 ab, 1 b 37.5
1 4 a, 4 ab, 1 b, 1 d 50.5
2 np.nan 0.0
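The same per-row sum can also be written as a small plain function. This is only a minimal sketch: the helper name `score` is made up for illustration, the dictionary is reduced to its string keys, and the missing value is assumed to be a real NaN (not the string 'np.nan'), which is scored as 0.

```python
import numpy as np
import pandas as pd

# Assumption: only the string keys of the dictionary are needed here.
di = {'a': 10.0, 'ab': 2.0, 'b': 1.5, 'd': 1.0}

def score(cell, weights):
    """Sum count * weight over the comma-separated 'count key' items of one cell."""
    if not isinstance(cell, str):
        return 0.0  # NaN / None: nothing to sum
    total = 0.0
    for item in cell.split(','):
        count, key = item.split()  # e.g. ' 3 ab' -> ('3', 'ab')
        total += float(count) * weights[key]
    return total

df = pd.DataFrame({'col1': ['3 a, 3 ab, 1 b', '4 a, 4 ab, 1 b, 1 d', np.nan]})
result = df.assign(result=[score(c, di) for c in df['col1']])
```

A key that is missing from the dictionary raises a KeyError here rather than being silently skipped, which is probably what you want while debugging.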
We first explode your string into rows, splitting on the comma, using the explode_str function shown at the end of this answer.
Then we split the values on whitespace (' ') into separate columns.
Finally, we map your dictionary onto the letters and do a groupby.sum:
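exploded = explode_str(df.dropna(), 'col1', ',')['col1'].str.strip().str.split(' ', expand=True)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
new = pd.concat([exploded, df[df['col1'].isna()]])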
s = new[1].map(di) * pd.to_numeric(new[0])
df['result'] = s.groupby(s.index).sum()
Output
col1 result
0 3 a, 3 ab, 1 b 37.5
1 4 a, 4 ab, 1 b, 1 d 50.5
2 NaN 0.0
Function used from linked answer:
def explode_str(df, col, sep):
    s = df[col]
    i = np.arange(len(s)).repeat(s.str.count(sep) + 1)
    return df.iloc[i].assign(**{col: sep.join(s).split(sep)})
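If you would rather not carry a helper function around, the explode, split, map and groupby steps can probably be collapsed with `Series.str.extractall`. This is a rough sketch, not the method from the linked answer: the regex and the group names `count` and `key` are my own choices, and it assumes every item looks like "<integer> <key>" and that the missing value is a real NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['3 a, 3 ab, 1 b', '4 a, 4 ab, 1 b, 1 d', np.nan]})
di = {'a': 10.0, 'ab': 2.0, 'b': 1.5, 'd': 1.0}

# One row per "count key" pair; the first index level is the original row label.
pairs = df['col1'].str.extractall(r'(?P<count>\d+)\s+(?P<key>\w+)')
weighted = pairs['count'].astype(float) * pairs['key'].map(di)

# Rows without any match (the NaN row) are absent after the groupby,
# so reindex against the original index and fill them with 0.
result = weighted.groupby(level=0).sum().reindex(df.index, fill_value=0.0)
```

`df['result'] = result` then attaches the values to the original frame, since the index lines up with `df.index`.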