Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas - conditionally select source column of data for a new column based on row value

Tags:

Is there a pandas function that allows selection from different columns based on a condition? This is analogous to a CASE statement in a SQL Select clause. For example, say I have the following DataFrame:

foo = DataFrame(
    [['USA',1,2],
    ['Canada',3,4],
    ['Canada',5,6]], 
    columns = ('Country', 'x', 'y')
)

I want to select from column 'x' when Country=='USA', and from column 'y' when Country=='Canada', resulting in something like the following:

  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]
like image 321
aensm Avatar asked May 29 '14 13:05

aensm


People also ask

How do I get the value of a column based on another column value?

You can extract a column of pandas DataFrame based on another value by using the DataFrame. query() method. The query() is used to query the columns of a DataFrame with a boolean expression.

How do I add a conditional column in pandas?

You can create a conditional column in pandas DataFrame by using np. where() , np. select() , DataFrame. map() , DataFrame.


2 Answers

Using DataFrame.where's other argument and pandas.concat:

>>> import pandas as pd
>>>
>>> foo = pd.DataFrame([
...     ['USA',1,2],
...     ['Canada',3,4],
...     ['Canada',5,6]
... ], columns=('Country', 'x', 'y'))
>>>
>>> z = foo['x'].where(foo['Country'] == 'USA', foo['y'])
>>> pd.concat([foo['Country'], z], axis=1)
  Country  x
0     USA  1
1  Canada  4
2  Canada  6

If you want z as column name, specify keys:

>>> pd.concat([foo['Country'], z], keys=['Country', 'z'], axis=1)
  Country  z
0     USA  1
1  Canada  4
2  Canada  6
like image 139
falsetru Avatar answered Sep 20 '22 15:09

falsetru


This would work:

In [84]:

def func(x):
    if x['Country'] == 'USA':
        return x['x']
    if x['Country'] == 'Canada':
        return x['y']
    return NaN
foo['z'] = foo.apply(func(row), axis = 1)
foo
Out[84]:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]

You can use loc:

In [137]:

foo.loc[foo['Country']=='Canada','z'] = foo['y']
foo.loc[foo['Country']=='USA','z'] = foo['x']
foo
Out[137]:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]

EDIT

Although unwieldy using loc will scale better with larger dataframes as the apply here is called for every row whilst using boolean indexing will be vectorised.

like image 43
EdChum Avatar answered Sep 18 '22 15:09

EdChum