
np.where multiple return values

Using pandas and NumPy, I am trying to process a column in a DataFrame and create a new column whose values depend on it. If column x contains the value 1, the new column should contain a; for the value 2 it should contain b, and so on.

I can do this for a single condition, e.g.:

df['new_col'] = np.where(df['col_1'] == 1, 'a', 'n/a')

And I can find examples of multiple conditions, e.g. if x = 3 or x = 4 the value should be a, but not how to do something like: if x = 3 the value should be a, and if x = 4 the value should be c.

I tried simply running two lines of code such as:

df['new_col'] = np.where(df['col_1'] == 1, 'a', 'n/a')
df['new_col'] = np.where(df['col_1'] == 2, 'b', 'n/a')

But obviously the second line overwrites the result of the first. Am I missing something crucial?

DGraham asked Mar 01 '16 14:03


2 Answers

I think you can use loc:

df.loc[df['col_1'] == 1, 'new_col'] = a
df.loc[df['col_1'] == 2, 'new_col'] = b
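To make the loc approach concrete, here is a minimal sketch. The sample values in col_1 are made up for illustration, and the labels are written as string literals 'a' and 'b', which is an assumption about what a and b stand for in the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name col_1 comes from the question
df = pd.DataFrame({'col_1': [1, 2, 3, 1]})

# Each assignment touches only the rows selected by its mask,
# so the second line does not overwrite the result of the first.
# Rows matched by neither mask are left as missing values.
df.loc[df['col_1'] == 1, 'new_col'] = 'a'
df.loc[df['col_1'] == 2, 'new_col'] = 'b'

print(df)
```

This is why the masked assignment avoids the overwrite problem from the question: np.where always writes every row, while loc with a boolean mask writes only the matching ones.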

Or:

df['new_col'] = np.where(df['col_1'] == 1, a, np.where(df['col_1'] == 2, b, np.nan))

Or numpy.select:

df['new_col'] = np.select([df['col_1'] == 1, df['col_1'] == 2], [a, b], default=np.nan)
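A small runnable sketch of the numpy.select variant follows. The sample data is hypothetical, and it assumes the labels are the strings 'a' and 'b'. Because NumPy needs a single dtype for the result, this sketch uses a string default ('n/a', as in the question) rather than mixing string labels with np.nan.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name col_1 comes from the question
df = pd.DataFrame({'col_1': [1, 2, 3, 1]})

# One condition per label, in the same order
conditions = [df['col_1'] == 1, df['col_1'] == 2]
labels = ['a', 'b']

# The first matching condition wins; rows matching none get the default
df['new_col'] = np.select(conditions, labels, default='n/a')

print(df)
```

Adding another case means appending one entry to each list, which tends to stay more readable than nesting further np.where calls.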

Or use Series.map, which returns NaN by default for values with no match:

d = {1: 'a', 2: 'b'}

df['new_col'] = df['col_1'].map(d)
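As a sketch of the Series.map approach, assuming the mapping described in the question (1 gives a, 2 gives b) and made-up sample data, the following also shows one way to replace the NaN left for unmatched values. The names mapping and new_col_filled are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; only the column name col_1 comes from the question
df = pd.DataFrame({'col_1': [1, 2, 3, 1]})

# Keys are the values in col_1, values are the labels for the new column
mapping = {1: 'a', 2: 'b'}

# Values without a key in the dict (here 3) become NaN
df['new_col'] = df['col_1'].map(mapping)

# Optional: swap the NaN for an explicit placeholder
df['new_col_filled'] = df['new_col'].fillna('n/a')

print(df)
```

For a plain value-to-value lookup like this one, the dictionary keeps all cases in one place and needs no boolean conditions.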
jezrael answered Oct 14 '22 13:10


I think numpy.choose() is the best option for you.

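import numpy as np

choices = list('abcde')  # one choice per index: 0 -> 'a', 1 -> 'b', ...
N = 10
np.random.seed(0)  # fixed seed so the output is reproducible
data = np.random.randint(1, len(choices) + 1, size=N)  # values in 1..5
print(data)
# Shift to 0-based indices, then pick choices[i] for each element
print(np.choose(data - 1, choices))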

Output:

[5 1 4 4 4 2 4 3 5 1]
['e' 'a' 'd' 'd' 'd' 'b' 'd' 'c' 'e' 'a']
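Applied to the DataFrame from the question, the same idea might look like the sketch below. The sample data and the third label 'c' are assumptions for illustration. Note that with the default mode='raise', numpy.choose needs every index to fall inside the list of choices, so this only fits when every value in col_1 has a label.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; every value in col_1 must have a label here
df = pd.DataFrame({'col_1': [1, 2, 3, 1]})

# Label for value 1 sits at index 0, for value 2 at index 1, and so on
choices = ['a', 'b', 'c']

# Subtract 1 to turn the 1-based values into 0-based indices
df['new_col'] = np.choose(df['col_1'].to_numpy() - 1, choices)

print(df)
```

If some values have no label, one of the approaches that supports a default (numpy.select or Series.map) is probably a better fit.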
Stop harming Monica answered Oct 14 '22 14:10