
Dict in loop for pd.DataFrame

I have many columns in my dataset and I need to change the values in some of them. This is what I do now:

import pandas as pd
import numpy as np
df = pd.DataFrame({'one':['a' , 'b']*5, 'two':['c' , 'd']*5, 'three':['a' , 'd']*5})

First I select the columns to change:

df1 = df[['one', 'two']]

Then I define a mapping dict:

map = { 'a' : 'd', 'b' : 'c', 'c' : 'b', 'd' : 'a'}

and loop over the rows:

df2 = []
for row in df1.values:
    # look up every value of the row in the mapping dict
    mapped_row = [map[x] for x in row]
    df2.append(mapped_row)

Then I overwrite the columns:

df['one'] = [row[0] for row in df2]
df['two'] = [row[1] for row in df2]

It works, but it's a very long way to do it. How can I make it shorter?

Edward asked Jun 09 '26 02:06



1 Answer

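You can call Series.map() on each column in a loop. Values that are missing from the dict come back as NaN, so fillna(df[col]) puts the original values back in those places: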

cols = ['one', 'two']
mapd = { 'a' : 'd', 'b' : 'c', 'c' : 'b', 'd' : 'a'}

for col in cols:
    df[col] = df[col].map(mapd).fillna(df[col])


df
Out: 
  one three two
0   d     a   b
1   c     d   a
2   d     a   b
3   c     d   a
4   d     a   b
5   c     d   a
6   d     a   b
7   c     d   a
8   d     a   b
9   c     d   a
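
To see what the fillna(df[col]) step does, here is a minimal sketch that assumes a mapping which does not cover every value. The series s and the dict partial are made up for illustration and are not part of the question's data:

```python
import pandas as pd

# Hypothetical data: 'x' has no entry in the mapping.
s = pd.Series(['a', 'b', 'x'])
partial = {'a': 'd', 'b': 'c'}

# map() alone turns values without a dict entry into NaN.
mapped_only = s.map(partial)

# fillna(s) puts the original value back wherever map() produced NaN.
mapped_filled = s.map(partial).fillna(s)

print(mapped_only.isna().tolist())
print(mapped_filled.tolist())
```

In the question's example every value in 'one' and 'two' has an entry in the dict, so there fillna() is only a safeguard.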

Timings on a frame with 10 million rows:

df = pd.DataFrame({'one':['a' , 'b']*5000000, 
                   'two':['c' , 'd']*5000000, 
                   'three':['a' , 'd']*5000000})

%%timeit
for col in cols:
    df[col].map(mapd).fillna(df[col])
1 loop, best of 3: 1.71 s per loop

%%timeit
for col in cols:
    colSet = set(df[col].values)
    colMap = {k:v for k,v in mapd.items() if k in colSet}
    df.replace(to_replace={col:colMap})
1 loop, best of 3: 3.35 s per loop


%timeit df[cols].stack().map(mapd).unstack()
1 loop, best of 3: 9.18 s per loop