Replace strings in a pandas DataFrame column with dictionary values and evaluate the result

Tags: python, pandas

I have a pandas dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['3 a, 3 ab, 1 b',
                            '4 a, 4 ab, 1 b, 1 d',
                            np.nan]})

and a dictionary

di = {'a': 10.0,
      'ab': 2.0,
      'b': 1.5,
      'd': 1.0,
      np.nan: 0.0}

Using values from the dictionary, I want to evaluate the dataframe rows like this:

3*10.0 + 3*2.0 + 1*1.5 (for the first row), giving me a final output that looks like this:

pd.DataFrame({'col1': ['3 a, 3 ab, 1 b',
                       '4 a, 4 ab, 1 b, 1 d',
                       np.nan],
              'result': [37.5, 50.5, 0.0]})

So far, I could only replace ',' with ' +':

df['col1'].str.replace(',',' +').str.split(' ')
user0000 asked Jul 18 '19 15:07


3 Answers

Here is one way, though it seems like overkill:

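# fillna('0') lets a real NaN row evaluate to 0; Series.sum(level=0) was removed in pandas 2.0
df['col1'].fillna('0').str.split(', ', expand=True).replace({' ': '*', 'np.nan': '0'}, regex=True).\
     stack().dropna().apply(lambda x: eval(x, di)).groupby(level=0).sum()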
Out[884]: 
0    37.5
1    50.5
2     0.0
dtype: float64
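
To make the eval step easier to follow, here is a minimal sketch that prints the expression strings before they are evaluated. It assumes the question's df and di. The names terms, exprs and names are made up for this illustration, and Series.explode (pandas 0.25 or later) is swapped in for the split/stack combination. Because eval inserts a '__builtins__' entry into the dictionary it receives as globals, the sketch passes a copy containing only the string keys.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['3 a, 3 ab, 1 b',
                            '4 a, 4 ab, 1 b, 1 d',
                            np.nan]})
di = {'a': 10.0, 'ab': 2.0, 'b': 1.5, 'd': 1.0, np.nan: 0.0}

# One term per row; the missing row becomes the expression '0'.
terms = df['col1'].fillna('0').str.split(', ').explode()

# '3 a' becomes '3*a', which eval can resolve against the dictionary.
exprs = terms.str.replace(' ', '*', regex=False)
print(exprs.tolist())
# ['3*a', '3*ab', '1*b', '4*a', '4*ab', '1*b', '1*d', '0']

# Copy only the string keys so eval does not add '__builtins__' to di itself.
names = {k: v for k, v in di.items() if isinstance(k, str)}

# Evaluate each term, then add the terms up per original row.
result = exprs.apply(lambda x: float(eval(x, names))).groupby(level=0).sum()
print(result.tolist())  # [37.5, 50.5, 0.0]
```

As with the answer itself, eval runs whatever the strings contain, so this only makes sense for data you trust.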
BENY answered Oct 02 '22 15:10


Using a comprehension:

from functools import reduce
from operator import mul

def m(x): return di.get(x, x)

df.assign(result=[
    sum(
        reduce(mul, map(float, map(m, s.split())))
        for s in row.split(', ')
    ) if pd.notna(row) else 0.0  # a missing row has no terms, so it totals 0
    for row in df.col1
])

                  col1  result
0       3 a, 3 ab, 1 b    37.5
1  4 a, 4 ab, 1 b, 1 d    50.5
2                  NaN     0.0
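
The nested map/reduce can be hard to read at a glance, so here is a sketch of the same logic unpacked into two plain-Python helpers. The names term_value and row_total are hypothetical, chosen only for this illustration, and the dictionary is restated with its string keys so the snippet runs on its own.

```python
from functools import reduce
from operator import mul

di = {'a': 10.0, 'ab': 2.0, 'b': 1.5, 'd': 1.0}

def term_value(term, mapping):
    """Multiply the tokens of one term, e.g. '3 ab' -> 3.0 * 2.0."""
    # Known keys are swapped for their value; anything else (the count) is kept.
    tokens = [mapping.get(tok, tok) for tok in term.split()]
    return reduce(mul, map(float, tokens))

def row_total(row, mapping):
    """Sum the term values of one comma-separated row."""
    return sum(term_value(term, mapping) for term in row.split(', '))

print(row_total('3 a, 3 ab, 1 b', di))       # 37.5
print(row_total('4 a, 4 ab, 1 b, 1 d', di))  # 50.5
```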
piRSquared answered Oct 02 '22 16:10


  1. We first explode your string into rows separated by commas, using the explode_str function shown below.

  2. Then we split the values on whitespace (' ') into separate columns.

  3. Finally we map your dictionary to the letters, multiply by the counts, and do a groupby.sum:

exploded = explode_str(df.dropna(), 'col1', ',')['col1'].str.strip().str.split(' ', expand=True)
new = pd.concat([exploded, df[df['col1'].isna()]])  # DataFrame.append was removed in pandas 2.0

s = new[1].map(di) * pd.to_numeric(new[0])

df['result'] = s.groupby(s.index).sum()

Output

                  col1  result
0       3 a, 3 ab, 1 b    37.5
1  4 a, 4 ab, 1 b, 1 d    50.5
2                  NaN     0.0
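
On pandas 0.25 or later, the built-in Series.explode could stand in for the helper function. The following is a sketch under that assumption rather than part of the original answer. The names terms, parts and products are chosen here for illustration, and the dictionary is restated without its NaN key because missing keys already map to NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['3 a, 3 ab, 1 b',
                            '4 a, 4 ab, 1 b, 1 d',
                            np.nan]})
di = {'a': 10.0, 'ab': 2.0, 'b': 1.5, 'd': 1.0}

# One '<count> <key>' term per row; the NaN row stays as a single NaN entry.
terms = df['col1'].str.split(', ').explode()

# Split each term into a count column (0) and a key column (1).
parts = terms.str.split(' ', expand=True)

# NaN counts or keys give NaN products, which sum() skips, so that row totals 0.0.
products = pd.to_numeric(parts[0]) * parts[1].map(di)

# The exploded rows keep the original index, so grouping on it restores one row per input.
df['result'] = products.groupby(level=0).sum()
print(df['result'].tolist())  # [37.5, 50.5, 0.0]
```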

Function used from linked answer:

def explode_str(df, col, sep):
    s = df[col]
    i = np.arange(len(s)).repeat(s.str.count(sep) + 1)
    return df.iloc[i].assign(**{col: sep.join(s).split(sep)})
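
To see what the helper returns, here is a small usage sketch on the question's two non-missing rows. The frame is rebuilt and the function repeated so the snippet runs on its own; the name exploded is just for illustration.

```python
import numpy as np
import pandas as pd

def explode_str(df, col, sep):
    s = df[col]
    # Repeat each row position once per separator-delimited piece.
    i = np.arange(len(s)).repeat(s.str.count(sep) + 1)
    # Join all strings, split them again, and assign one piece per repeated row.
    return df.iloc[i].assign(**{col: sep.join(s).split(sep)})

df = pd.DataFrame({'col1': ['3 a, 3 ab, 1 b', '4 a, 4 ab, 1 b, 1 d']})
exploded = explode_str(df, 'col1', ',')

# Each row is repeated once per term, and the original index labels are kept.
print(exploded.index.tolist())
# [0, 0, 0, 1, 1, 1, 1]

# Terms after the first keep their leading space, hence the .str.strip() in the answer.
print(exploded['col1'].tolist())
# ['3 a', ' 3 ab', ' 1 b', '4 a', ' 4 ab', ' 1 b', ' 1 d']
```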
Erfan answered Oct 02 '22 16:10