
Set non-null values of DataFrame conditionally

Tags: python, pandas

I have a dataframe:

     0         1         2   3         4  y
35 NaN       NaN       NaN NaN  0.342153  0
40 NaN  0.326323       NaN NaN       NaN  0
43 NaN       NaN  0.290126 NaN       NaN  0
49 NaN  0.326323       NaN NaN       NaN  0
50 NaN  0.391147       NaN NaN       NaN  1

And code to produce it:

import pandas as pd
import numpy as np

nan = np.nan

df = pd.DataFrame(
{0: {35: nan, 40: nan, 43: nan, 49: nan, 50: nan},
 1: {35: nan,
  40: 0.32632316859446198,
  43: nan,
  49: 0.32632316859446198,
  50: 0.39114724480578139},
 2: {35: nan, 40: nan, 43: 0.29012581014105987, 49: nan, 50: nan},
 3: {35: nan, 40: nan, 43: nan, 49: nan, 50: nan},
 4: {35: 0.34215328467153283, 40: nan, 43: nan, 49: nan, 50: nan},
 'y': {35: 0, 40: 0, 43: 0, 49: 0, 50: 1}})

I need to assign a value to each column using the following pseudocode:

column = 1 if column > threshold else 0 where column != NaN

I have tried using fancy indexing to accomplish this like so:

df.loc[df[1].notnull(), 1] = 1; df

     0   1         2   3         4  y
35 NaN NaN       NaN NaN  0.342153  0
40 NaN   1       NaN NaN       NaN  0
43 NaN NaN  0.290126 NaN       NaN  0
49 NaN   1       NaN NaN       NaN  0
50 NaN   1       NaN NaN       NaN  1

But A) I'm not sure how to apply the conditional logic and B) I have to apply the logic to each column iteratively rather than to the dataframe as a whole.

Question:

How can I apply conditional logic to the non-null values of a dataframe, preserving the nullity of the other fields?

Zelazny7 asked Mar 13 '13 17:03


2 Answers

# You need this cast because your y column is int64 (otherwise the next step
# will throw an exception); this is on the to-fix list for 0.11-dev, though.
In [71]: df = orig_df.astype('float64')

# use boolean indexing!
# NaN are automatically excluded
In [72]: df[df>0.3] = 1 ; df[df<=0.3] = 0

In [73]: df
Out[73]: 
     0   1         2   3   4  y
35 NaN NaN       NaN NaN   1  0
40 NaN   1       NaN NaN NaN  0
43 NaN NaN         0 NaN NaN  0
49 NaN   1       NaN NaN NaN  0
50 NaN   1       NaN NaN NaN  1
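
For readers who would rather not modify the frame in place, here is a minimal sketch of the same idea built on `DataFrame.where`. It is not part of the original answer; the small frame only mirrors a few columns of the question's data, and the names `threshold` and `flags` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Small frame mirroring part of the question's data (values rounded).
df = pd.DataFrame({
    1: {35: nan, 40: 0.326323, 43: nan, 49: 0.326323, 50: 0.391147},
    2: {35: nan, 40: nan, 43: 0.290126, 49: nan, 50: nan},
    "y": {35: 0, 40: 0, 43: 0, 49: 0, 50: 1},
})

threshold = 0.3  # same cutoff as the answer above

# Comparisons against NaN evaluate to False, so build the 0/1 flags first,
# then restore NaN wherever the original frame was null.
flags = (df > threshold).astype("float64").where(df.notna())
print(flags)
```

This returns a new frame and leaves `df` untouched, which may be easier to reason about than two chained assignments.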
Jeff answered Oct 01 '22 15:10


You could use applymap, since it seems like you really want an elementwise operation:

>>> df.applymap(lambda x: x if pd.isnull(x) else (1 if x > 0.3 else 0))
     0   1   2   3   4  y
35 NaN NaN NaN NaN   1  0
40 NaN   1 NaN NaN NaN  0
43 NaN NaN   0 NaN NaN  0
49 NaN   1 NaN NaN NaN  0
50 NaN   1 NaN NaN NaN  1
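
A note for newer pandas versions: `DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`. The sketch below picks whichever is available; the helper name `flag` and the tiny example frame are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def flag(x, threshold=0.3):
    """Hypothetical helper: keep nulls as they are, otherwise return 1 or 0."""
    if pd.isnull(x):
        return x
    return 1 if x > threshold else 0

df = pd.DataFrame({"a": [np.nan, 0.326323, 0.290126], "y": [0, 0, 1]})

# DataFrame.map exists in pandas 2.1+; fall back to applymap on older versions.
elementwise = getattr(df, "map", None) or df.applymap
result = elementwise(flag)
print(result)
```

Either way the function is called once per cell in Python, so on large frames this is likely to be slower than a vectorised comparison.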

Although in this particular case we could cheat (twice):

>>> (df > 0.3) * 1 + df * 0
     0   1   2   3   4  y
35 NaN NaN NaN NaN   1  0
40 NaN   1 NaN NaN NaN  0
43 NaN NaN   0 NaN NaN  0
49 NaN   1 NaN NaN NaN  0
50 NaN   1 NaN NaN NaN  1
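
The two "cheats" appear to be these: multiplying a boolean frame by 1 turns it into 0/1, and adding `df * 0` puts NaN back, because NaN times zero is still NaN. A small sketch that splits the expression into its parts (the intermediate names `flags` and `nan_mask` are introduced here only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 0.326323, 0.290126]})

flags = (df > 0.3) * 1      # booleans become 0/1; NaN compares False, so it becomes 0
nan_mask = df * 0           # 0.0 where a value exists, NaN where it was missing
result = flags + nan_mask   # adding NaN restores the missing entries
print(result)
```

One caveat with this trick: an infinite value multiplied by zero is also NaN, so it would be treated as missing.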
DSM answered Oct 01 '22 15:10