import pandas as pd
import numpy as np
pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}
data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}
pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)
I know that I can use something like:
df['mark_up_id'].map(pb['mark_up'])
to perform a v-look-up. I'd like to take the mark-up this returns and multiply it by each cost with a common index to yield a new column called price.
I know I can merge the two and then run the calculation. That's how I produced the desired output. I'd like to be able to do this similar to how you'd loop through a dictionary and use the keys to find values in another dictionary and perform some kind of computation inside of a loop. Considering PANDAS dataframes sit on top of dictionaries, there must be a way of using a combination of join/map/apply to do this without actually joining the two data-sets in memory.
Desired output:
desired_output = {"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48},"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73"},"mark_up_id":{"0":"123","1":"456","2":"111","3":"123","4":"789"},"price":{"0":38.623338,"1":14.71875,"2":12.559686,"3":12.233754,"4":12.985704}}
do = pd.DataFrame(data=desired_output)
Bonus Points:
Explain the difference between the accepted answer and...
pb.loc[df['mark_up_id']]['mark_up'] * df.set_index('mark_up_id')['cost']
and why the following lambda function that i derived the above from hits an error...
df.apply(lambda x : x['cost']*pb.loc[x['mark_up_id']],axis=1 )
returns an error saying:
KeyError: ('the label [333] is not in the [index]', u'occurred at index 5')
Try
df['price'] = df['mark_up_id'].map(pb['mark_up']) * df['cost']
You get
cost id mark_up_id price
0 29.74 K69 123 38.623338
1 9.42 K70 456 14.718750
2 9.42 K71 111 12.559686
3 9.42 K72 123 12.233754
4 9.48 K73 789 12.985704
UPDATE: for updated question:
In [79]: df = df.assign(price=df['mark_up_id'].map(pb['mark_up']) * df['cost']).dropna()
In [80]: df
Out[80]:
cost id mark_up_id price
0 29.74 K69 123 38.623338
1 9.42 K70 456 14.718750
2 9.42 K71 789 12.903516
3 9.42 K72 111 12.559686
4 9.48 K73 222 13.830372
Old answer:
In [67]: df = df.assign(price=df['mark_up_id'].map(pb['mark_up']) * df['cost'])
In [68]: df
Out[68]:
cost id mark_up_id price
0 29.74 K69 123 38.623338
1 9.42 K70 456 14.718750
2 9.42 K71 111 12.559686
3 9.42 K72 123 12.233754
4 9.48 K73 789 12.985704
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With