I am very new to Python, and I wonder what the following line of code does and how it could be written in R:
df['sticky'] = df[['humidity', 'workingday']].apply(lambda x: (0, 1)[x['workingday'] == 1 and x['humidity'] >= 60], axis = 1)
For instance, what is the meaning of lambda x: (0, 1)?
P.S. df is a pandas dataframe.
Let's start from the lambda. The complete expression is:
lambda x: (0, 1)[x['workingday'] == 1 and x['humidity'] >= 60]
and it's an anonymous function that takes one argument x and returns:
1 if x['workingday'] == 1 and x['humidity'] >= 60
0 otherwise
The (0, 1)[...] trick is used to return 0 or 1 instead of the Python booleans False and True. It exploits the fact that False and True behave as the integers 0 and 1 (bool is a subclass of int) when used in place of a numeric value, e.g. as a tuple (or list) index. For instance, if the expression evaluates to True, element 1 of the tuple is accessed, which contains 1.
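To see the indexing trick in isolation, here is a minimal sketch in plain Python. The dict named row and its sample values are made up for illustration; it only stands in for one dataframe row and is not part of the original code.

```python
# bool is a subclass of int, so False and True act as 0 and 1 when indexing.
pair = (0, 1)

# Hypothetical stand-in for a single dataframe row (sample values are invented).
row = {"workingday": 1, "humidity": 72}

# Same condition as in the lambda; with plain ints this yields a Python bool.
condition = row["workingday"] == 1 and row["humidity"] >= 60

# Indexing the tuple with the bool picks element 0 (False) or element 1 (True).
sticky = pair[condition]

print(isinstance(True, int))
print(condition, sticky)
```

Swapping the humidity for a value below 60 makes the condition False, so element 0 of the tuple is selected instead.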
This function is applied to every row of the pandas dataframe (actually, only to the selected columns 'humidity' and 'workingday') and the result is stored in the 'sticky' column. That said, you can translate the same expression to R using an anonymous function and apply:
df$sticky <- apply(df[, c("workingday", "humidity")], 1, function(x) {
  x["workingday"] == 1 & x["humidity"] >= 60
})
(the filtering is probably not necessary, but my R skills are quite rusty).
However, there is a more idiomatic way of achieving the same, as kdopen wrote:
df$sticky <- df$workingday == 1 & df$humidity >= 60
The idiomatic R equivalent would be
df$sticky <- df$workingday == 1 & df$humidity >= 60
assuming the desire is to get an indicator column.
Stefano has nicely explained the Python code. A fully expanded version of the lambda might be
def func(x):
    if x['workingday'] == 1 and x['humidity'] >= 60:
        return 1
    else:
        return 0
But you'd never write that.
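As a rough sketch of what one might write instead, the same row function can be spelled with a conditional expression or with int(). The function names and the sample rows below are hypothetical, and plain dicts stand in for dataframe rows; the original lambda is included only to check that all spellings agree.

```python
def sticky_expanded(x):
    # Fully expanded form, as in the answer above.
    if x["workingday"] == 1 and x["humidity"] >= 60:
        return 1
    else:
        return 0

def sticky_conditional(x):
    # Conditional expression: one line, no tuple indexing.
    return 1 if x["workingday"] == 1 and x["humidity"] >= 60 else 0

def sticky_int(x):
    # int() converts the boolean result to 0 or 1 explicitly.
    return int(x["workingday"] == 1 and x["humidity"] >= 60)

# The tuple-indexing lambda from the question, for comparison.
sticky_lambda = lambda x: (0, 1)[x["workingday"] == 1 and x["humidity"] >= 60]

# Hypothetical sample rows covering all four combinations.
rows = [
    {"workingday": 1, "humidity": 72},
    {"workingday": 1, "humidity": 40},
    {"workingday": 0, "humidity": 85},
    {"workingday": 0, "humidity": 30},
]

variants = (sticky_expanded, sticky_conditional, sticky_int, sticky_lambda)
results = [[f(r) for f in variants] for r in rows]
print(results)
```

Which spelling to prefer is a matter of taste; the conditional expression tends to be the easiest for newcomers to read.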