
Pandas replace with default value

I have a pandas DataFrame in which I want to replace the values of a certain column conditionally.

For example:

   col

 0 Mr
 1 Miss
 2 Mr
 3 Mrs
 4 Col.

I want to map them as

{'Mr': 0, 'Mrs': 1, 'Miss': 2}

If there are other titles not available in the dict, I want them to have a default value of 3.

The above example becomes

   col

 0 0
 1 2
 2 0
 3 1
 4 3

Can I do this with pandas' replace() without using regex?

Vikash Balasubramanian asked Aug 23 '16 15:08




1 Answer

You can use map rather than replace because it is faster, then fill the unmatched values with 3 using fillna, and cast to int with astype:

df['col'] = df.col.map({'Mr': 0, 'Mrs': 1, 'Miss': 2}).fillna(3).astype(int)

print (df)
   col
0    0
1    2
2    0
3    1
4    3
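
For a self-contained check, the sketch below rebuilds the example column from the question and applies the same map / fillna / astype chain. The names `titles` and `codes` are illustrative and not from the original post.

```python
import pandas as pd

# Example data taken from the question
df = pd.DataFrame({"col": ["Mr", "Miss", "Mr", "Mrs", "Col."]})
titles = {"Mr": 0, "Mrs": 1, "Miss": 2}

# map() leaves NaN wherever a value is not a key of the dict ("Col." here)
codes = df["col"].map(titles)

# Replace those NaN gaps with the default 3, then cast back to int
df["col"] = codes.fillna(3).astype(int)
print(df["col"].tolist())
```

The intermediate `codes` is kept only to make the NaN step visible; in practice the chain can be written on one line as in the answer.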

Another solution, starting again from the original df, with numpy.where and a condition built with isin:

import numpy as np

d = {'Mr': 0, 'Mrs': 1, 'Miss': 2}
# keep the mapped value where the title is a key of d, otherwise use 3
df['col'] = np.where(df.col.isin(d.keys()), df.col.map(d), 3).astype(int)
print(df)
   col
0    0
1    2
2    0
3    1
4    3

Solution with replace, again applied to the original df:

d = {'Mr': 0, 'Mrs': 1, 'Miss': 2}
df['col'] = np.where(df.col.isin(d.keys()), df.col.replace(d), 3)
print (df)
   col
0    0
1    2
2    0
3    1
4    3
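
To see why replace needs the isin mask, here is a minimal sketch on a fresh Series. It assumes an object-dtype column (set explicitly below) so that ints and strings can coexist after the replacement; the names `s`, `replaced`, `known` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# dtype=object is an assumption made here so the column can hold mixed types
s = pd.Series(["Mr", "Miss", "Mr", "Mrs", "Col."], dtype=object)
d = {"Mr": 0, "Mrs": 1, "Miss": 2}

# replace() only touches values that are keys of d; "Col." is left as it is
replaced = s.replace(d)

# Mask of rows whose title is a key of d
known = s.isin(list(d))

# Keep the replaced value for known titles, use the default 3 elsewhere
result = np.where(known, replaced, 3).astype(int)
print(result.tolist())
```

Unlike map, replace has no notion of a "missing" key, which is why the default has to be supplied separately through the mask.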

Timings:

df = pd.concat([df]*10000).reset_index(drop=True)

d = {'Mr': 0, 'Mrs': 1, 'Miss': 2}
df['col0'] = df.col.map(d).fillna(3).astype(int)
df['col1'] = np.where(df.col.isin(d.keys()), df.col.replace(d), 3)
df['col2'] = np.where(df.col.isin(d.keys()), df.col.map(d), 3).astype(int)
print (df)

In [447]: %timeit df['col0'] = df.col.map(d).fillna(3).astype(int)
100 loops, best of 3: 4.93 ms per loop

In [448]: %timeit df['col1'] = np.where(df.col.isin(d.keys()), df.col.replace(d), 3)
100 loops, best of 3: 14.3 ms per loop

In [449]: %timeit df['col2'] = np.where(df.col.isin(d.keys()), df.col.map(d), 3).astype(int)
100 loops, best of 3: 7.68 ms per loop

In [450]: %timeit df['col3'] = df.col.map(lambda L: d.get(L, 3))
10 loops, best of 3: 36.2 ms per loop
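
A variation on the last timing, not part of the original answer: Series.map is documented to use the default of a dict subclass that defines `__missing__`, so a `collections.defaultdict` can carry the fallback value instead of a lambda around `d.get`. This is only a sketch; `d_default` is a hypothetical name and no timing is claimed for it.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({"col": ["Mr", "Miss", "Mr", "Mrs", "Col."]})
d = {"Mr": 0, "Mrs": 1, "Miss": 2}

# defaultdict defines __missing__, so unknown titles get 3 instead of NaN
d_default = defaultdict(lambda: 3, d)

df["col"] = df["col"].map(d_default)
print(df["col"].tolist())
```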
jezrael answered Sep 21 '22 09:09