Python pandas: replace NaN with 0 and every other value with 1 in a DataFrame

I have a DataFrame, df, as follows:

 row  id  name    age   url           
  1   e1   tom    NaN   http1   
  2   e2   john   25    NaN
  3   e3   lucy   NaN  http3 
  4   e4   tick   29    NaN

I want to replace NaN with 0 and every other value with 1 in the columns age and url. My code is below, but it does not work:

  import pandas as pd

  df[['age', 'url']].applymap(lambda x: 0 if x=='NaN' else x)

I want to get the following result:

  row  id  name    age   url           
  1   e1   tom     0     1
  2   e2   john    1     0
  3   e3   lucy    0     1 
  4   e4   tick    1     0

Thanks for your help!

tktktk0711 asked Jul 27 '16 08:07



1 Answer

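You can use where with fillna, with isnull as the condition:

# where() keeps the NaN cells and sets every other cell to 1; fillna(0) then turns NaN into 0
df[['age', 'url']] = (df[['age', 'url']]
                      .where(df[['age', 'url']].isnull(), 1)
                      .fillna(0)
                      .astype(int))
print (df)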

   row  id  name  age  url
0    1  e1   tom    0    1
1    2  e2  john    1    0
2    3  e3  lucy    0    1
3    4  e4  tick    1    0

Or numpy.where with isnull (this needs import numpy as np):

df[['age', 'url']] = np.where(df[['age', 'url']].isnull(), 0, 1)
print (df)
   row  id  name  age  url
0    1  e1   tom    0    1
1    2  e2  john    1    0
2    3  e3  lucy    0    1
3    4  e4  tick    1    0

The fastest solution uses notnull and astype:

df[['age', 'url']] = df[['age', 'url']].notnull().astype(int)
print (df)
   row  id  name  age  url
0    1  e1   tom    0    1
1    2  e2  john    1    0
2    3  e3  lucy    0    1
3    4  e4  tick    1    0
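
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the variable name cols is only an illustrative choice:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
df = pd.DataFrame({
    "row": [1, 2, 3, 4],
    "id": ["e1", "e2", "e3", "e4"],
    "name": ["tom", "john", "lucy", "tick"],
    "age": [np.nan, 25, np.nan, 29],
    "url": ["http1", np.nan, "http3", np.nan],
})

cols = ["age", "url"]
# notnull() returns a boolean frame (True where a value is present);
# astype(int) then maps True -> 1 and False -> 0.
df[cols] = df[cols].notnull().astype(int)
print(df)
```

The other columns (row, id, name) are left untouched because only the selected columns are assigned back.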

EDIT:

I also modified your solution so that it tests with pd.isnull, returns 1 in the else branch, and assigns the result back:

df[['age', 'url']] = df[['age', 'url']].applymap(lambda x: 0 if pd.isnull(x) else 1)
print (df)
   row  id  name  age  url
0    1  e1   tom    0    1
1    2  e2  john    1    0
2    3  e3  lucy    0    1
3    4  e4  tick    1    0
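
As far as I can tell, the original attempt failed for three reasons: a missing value is the float NaN rather than the string 'NaN', the else branch returned x instead of 1, and the result was never assigned back to df. The first point is the least obvious, so here is a small sketch of how NaN behaves in comparisons (the name missing is just for illustration):

```python
import numpy as np
import pandas as pd

missing = np.nan

# NaN is a float, so comparing it with the string 'NaN' is always False.
print(missing == "NaN")      # False
# NaN does not even compare equal to itself, so `x == np.nan` fails too.
print(missing == missing)    # False
# pd.isnull is the reliable test for a missing scalar.
print(pd.isnull(missing))    # True
print(pd.isnull("http1"))    # False
```

Note that pandas 2.1 and later deprecate applymap in favour of DataFrame.map, so on a recent version the element-wise variant may raise a warning; the vectorised solutions above are not affected.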

Timings:

len(df)=4k:

In [127]: %timeit df[['age', 'url']] = df[['age', 'url']].applymap(lambda x: 0 if pd.isnull(x) else 1)
100 loops, best of 3: 11.2 ms per loop

In [128]: %timeit df[['age', 'url']] = np.where(df[['age', 'url']].isnull(), 0, 1)
100 loops, best of 3: 2.69 ms per loop

In [129]: %timeit df[['age', 'url']] = np.where(pd.notnull(df[['age', 'url']]), 1, 0)
100 loops, best of 3: 2.78 ms per loop

In [131]: %timeit df.loc[:, ['age', 'url']] = df[['age', 'url']].notnull() * 1
1000 loops, best of 3: 1.45 ms per loop

In [136]: %timeit df[['age', 'url']] = df[['age', 'url']].notnull().astype(int)
1000 loops, best of 3: 1.01 ms per loop
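
The numbers above come from IPython's %timeit on my machine. If you want to check them outside IPython, a rough sketch with the standard timeit module could look like the following. The 4,000-row frame is an assumption (the question's four rows repeated 1,000 times), the function names are hypothetical helpers, and absolute timings will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Assumed benchmark data: the question's four rows repeated to 4,000 rows.
base = pd.DataFrame({
    "age": [np.nan, 25, np.nan, 29],
    "url": ["http1", np.nan, "http3", np.nan],
})
big = pd.concat([base] * 1000, ignore_index=True)


def with_np_where(frame):
    # np.where returns a plain array, so wrap it back into a DataFrame.
    return pd.DataFrame(np.where(frame.isnull(), 0, 1),
                        index=frame.index, columns=frame.columns)


def with_multiply(frame):
    # Multiplying a boolean frame by 1 converts it to integers.
    return frame.notnull() * 1


def with_notnull(frame):
    return frame.notnull().astype(int)


candidates = (with_np_where, with_multiply, with_notnull)

# All variants should produce identical 0/1 values.
reference = with_notnull(big)
for fn in candidates:
    assert (fn(big).to_numpy() == reference.to_numpy()).all(), fn.__name__

# Time each variant; only the relative ordering is meaningful.
runs = 20
for fn in candidates:
    seconds = timeit.timeit(lambda: fn(big), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```

This times only the computation, not the assignment back into df, so the figures are not directly comparable with the %timeit lines above.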

jezrael answered Oct 11 '22 20:10