I'm an R programmer trying to get into Python. In R, when I want to mutate a column conditionally, I use:
col = dplyr::mutate(col, ifelse(condition, if_true(x), if_false(x)))
In Python, how does one mutate a column value conditionally? Here's my minimal reproducible example:
def act(cntnt):
    def do_thing(cntnt):
        return(cntnt + "has it")
    def do_other_thing(cntnt):
        return(cntnt + "nope")
    has_abc = cntnt.str.contains("abc")
    if has_abc == T:
        cntnt[has_abc].apply(do_thing)
    else:
        cntnt[has_abc].apply(do_other_thing)
I think what you're looking for is assign, which is essentially the pandas equivalent to mutate in dplyr. Your conditional statement can be written with a list comprehension, or using vectorized methods (see below).
Take an example dataframe, let's call it df:
> df
a
1 0.50212013
2 1.01959213
3 -1.32490344
4 -0.82133375
5 0.23010548
6 -0.64410737
7 -0.46565442
8 -0.08943858
9 0.11489957
10 -0.21628132
R / dplyr:
In R, you can use mutate with ifelse to make a column based on a condition (in this example, it will be 'pos' when column a is greater than 0):
df = dplyr::mutate(df, col = ifelse(df$a > 0, 'pos', 'neg'))
And the resulting df:
> df
a col
1 0.50212013 pos
2 1.01959213 pos
3 -1.32490344 neg
4 -0.82133375 neg
5 0.23010548 pos
6 -0.64410737 neg
7 -0.46565442 neg
8 -0.08943858 neg
9 0.11489957 pos
10 -0.21628132 neg
Python / Pandas:
In pandas, use assign with a list comprehension:
df = df.assign(col = ['pos' if a > 0 else 'neg' for a in df['a']])
The resulting df:
>>> df
a col
0 0.502120 pos
1 1.019592 pos
2 -1.324903 neg
3 -0.821334 neg
4 0.230105 pos
5 -0.644107 neg
6 -0.465654 neg
7 -0.089439 neg
8 0.114900 pos
9 -0.216281 neg
The ifelse you were using in R is replaced by a list comprehension.
You don't have to use assign: you can create a new column directly on the df without creating a copy if you want:
df['col'] = ['pos' if a > 0 else 'neg' for a in df['a']]
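To make the difference between the two forms concrete, here is a small sketch. The three-row frame and the names labels, new_df and had_col_before are made up for illustration; the point is only that assign returns a new DataFrame and leaves the original alone, while the bracket form modifies df in place.

```python
import pandas as pd

# Hypothetical toy data, not the frame from the answer above
df = pd.DataFrame({"a": [0.5, -1.3, 0.2]})
labels = ["pos" if a > 0 else "neg" for a in df["a"]]

# assign returns a new DataFrame; df itself is not modified
new_df = df.assign(col=labels)
had_col_before = "col" in df.columns  # False at this point

# Bracket assignment adds the column to df directly
df["col"] = labels
```

If you write df = df.assign(...) as in the answer, the end result is the same; the in-place form just skips building the intermediate copy.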
Also, instead of a list comprehension, you could use one of numpy's vectorized methods for conditional statements, for example, np.select:
import numpy as np
df['col'] = np.select([df['a'] > 0], ['pos'], 'neg')
# or
df = df.assign(col = np.select([df['a'] > 0], ['pos'], 'neg'))
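As a side note, for a single condition np.where(condition, if_true, if_false) is probably the closest match to R's ifelse, and np.select becomes more useful once there are several conditions. A minimal sketch, assuming a made-up four-row frame and the hypothetical column names sign and sign3:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, including an exact zero
df = pd.DataFrame({"a": [0.5, -1.3, 0.0, 0.2]})

# Two-way choice: same argument order as ifelse(test, yes, no) in R
df["sign"] = np.where(df["a"] > 0, "pos", "neg")

# Several conditions: the first condition that matches wins,
# and rows matching none of them get the default
df["sign3"] = np.select(
    [df["a"] > 0, df["a"] < 0],
    ["pos", "neg"],
    default="zero",
)
```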
You can use the condition (and its negation) for logical indexing:
has_abc = cntnt.str.contains("abc")
cntnt[ has_abc].apply(do_thing)
cntnt[~has_abc].apply(do_other_thing)
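Note that apply returns a new Series rather than changing cntnt, so the results need to be stored somewhere. One way to put the two halves back together is sketched below, assuming cntnt is a pandas Series of strings; the sample values and the name result are made up for illustration.

```python
import pandas as pd

def do_thing(s):
    return s + " has it"

def do_other_thing(s):
    return s + " nope"

# Hypothetical sample data
cntnt = pd.Series(["abcdef", "xyz", "zabc"])

# Boolean mask: True where the string contains "abc"
has_abc = cntnt.str.contains("abc")

# Work on a copy and write each half back through its mask
result = cntnt.copy()
result[has_abc] = cntnt[has_abc].apply(do_thing)
result[~has_abc] = cntnt[~has_abc].apply(do_other_thing)
```

Because the mask is a whole Series of booleans, there is no single if/else here: each row is routed by its own True or False value.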