I have many columns in my dataset and I need to change the values in some of them. This is what I do now:
import pandas as pd
import numpy as np
df = pd.DataFrame({'one':['a' , 'b']*5, 'two':['c' , 'd']*5, 'three':['a' , 'd']*5})
First I select the columns:
df1 = df[['one', 'two']]
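Then I define a mapping dict:
mapping = {'a': 'd', 'b': 'c', 'c': 'b', 'd': 'a'}
and loop over the rows:
df2 = []
for row in df1.values:
    # use a fresh name here so the numpy alias `np` is not overwritten
    mapped = [mapping[x] for x in row]
    df2.append(mapped)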
Then I write the mapped values back to the columns:
df['one'] = [row[0] for row in df2]
df['two'] = [row[1] for row in df2]
It works, but it is a very long way to do it. How can I make it shorter?
You can use Series.map(), iterating over the columns; fillna(df[col]) keeps any value that has no key in the dict:
cols = ['one', 'two']
mapd = { 'a' : 'd', 'b' : 'c', 'c' : 'b', 'd' : 'a'}
for col in cols:
    df[col] = df[col].map(mapd).fillna(df[col])
df
Out:
  one three two
0   d     a   b
1   c     d   a
2   d     a   b
3   c     d   a
4   d     a   b
5   c     d   a
6   d     a   b
7   c     d   a
8   d     a   b
9   c     d   a
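To see why the fillna() call matters, here is a minimal sketch on a made-up Series (the value 'z' and the shortened dict are assumptions for illustration, not part of the question's data). Series.map() returns NaN for any value that is not a key in the dict, and fillna() with the original Series puts those values back:

```python
import pandas as pd

# Hypothetical data: 'z' has no entry in the mapping dict
s = pd.Series(['a', 'b', 'z'])
mapd = {'a': 'd', 'b': 'c'}

# map() alone turns the unmapped value into NaN
mapped = s.map(mapd)

# fillna(s) restores the original value wherever map() produced NaN
kept = s.map(mapd).fillna(s)

print(mapped.isna().tolist())
print(kept.tolist())
```

If every value in the selected columns is guaranteed to be a key in the dict, as in the question's example, the fillna() step changes nothing and could be dropped.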
Timings:
df = pd.DataFrame({'one':['a' , 'b']*5000000,
'two':['c' , 'd']*5000000,
'three':['a' , 'd']*5000000})
%%timeit
for col in cols:
    df[col].map(mapd).fillna(df[col])
1 loop, best of 3: 1.71 s per loop
%%timeit
for col in cols:
    colSet = set(df[col].values)
    colMap = {k: v for k, v in mapd.items() if k in colSet}
    df.replace(to_replace={col: colMap})
1 loop, best of 3: 3.35 s per loop
%timeit df[cols].stack().map(mapd).unstack()
1 loop, best of 3: 9.18 s per loop
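The %timeit magics above only run inside IPython or Jupyter. As a rough sketch for a plain Python script, the standard-library timeit module can compare two of the approaches. The frame size n and the helper names map_loop and stack_map are assumptions made for this example, and the numbers will differ from the ones above depending on machine and pandas version:

```python
import timeit

import pandas as pd

n = 1000  # assumption: much smaller than the 5,000,000 repeats used above
df = pd.DataFrame({'one': ['a', 'b'] * n,
                   'two': ['c', 'd'] * n,
                   'three': ['a', 'd'] * n})
cols = ['one', 'two']
mapd = {'a': 'd', 'b': 'c', 'c': 'b', 'd': 'a'}


def map_loop(frame):
    """Map each selected column separately (hypothetical helper name)."""
    out = frame[cols].copy()
    for col in cols:
        out[col] = frame[col].map(mapd).fillna(frame[col])
    return out


def stack_map(frame):
    """Stack the selected columns into one Series, map once, unstack."""
    return frame[cols].stack().map(mapd).unstack()


# Both approaches should produce the same values for this data
assert map_loop(df)[cols].values.tolist() == stack_map(df)[cols].values.tolist()

repeats = 5
for fn in (map_loop, stack_map):
    seconds = timeit.timeit(lambda: fn(df), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats:.4f} s per call")
```

Raising n toward the sizes used in the timings above makes the gap between the approaches easier to see, at the cost of a longer run.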