I am very new to Python, and I wonder what the following line of code does and how it could be written in R:
df['sticky'] = df[['humidity', 'workingday']].apply(lambda x: (0, 1)[x['workingday'] == 1 and x['humidity'] >= 60], axis = 1)
For instance, what is the meaning of lambda x: (0, 1)?
P.S.
df is a pandas dataframe
Let's start with the lambda. The complete expression is:
lambda x: (0, 1)[x['workingday'] == 1 and x['humidity'] >= 60]
and it's an anonymous function that takes one argument, x, and returns:
1 if x['workingday'] == 1 and x['humidity'] >= 60
0 otherwise.
The (0, 1)[...] trick is used to return 0 or 1 instead of the Python booleans False and True. It exploits the fact that bool is a subclass of int, so False and True behave as the integers 0 and 1 wherever an integer is expected, e.g. as a list or tuple index. For instance, if the condition evaluates to True, the element at index 1 of the tuple is accessed, which is 1.
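To see the indexing trick in isolation, here is a small sketch in plain Python (the sample values humidity = 75 and workingday = 1 are made up for illustration):

```python
# bool is a subclass of int, so True and False behave like 1 and 0.
print(isinstance(True, int))   # True
print(True + True)             # 2

choices = (0, 1)
print(choices[False])          # 0, the element at index 0
print(choices[True])           # 1, the element at index 1

# A condition that evaluates to True therefore selects the 1.
humidity, workingday = 75, 1   # made-up sample values
print(choices[workingday == 1 and humidity >= 60])   # 1

# int() performs the same bool-to-number conversion more directly.
print(int(workingday == 1 and humidity >= 60))       # 1
```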
This function is applied to every row of the pandas DataFrame (that is what axis = 1 means), though only to the selected columns 'humidity' and 'workingday', and the result is stored in the 'sticky' column. You can translate the same expression into R using an anonymous function and apply:
df$sticky <- apply(df[, c("workingday", "humidity")], 1, function(x) {
  x["workingday"] == 1 & x["humidity"] >= 60
})
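(Selecting the two columns is not strictly required here, but it helps: apply() first converts the data frame to a matrix, so any non-numeric column in df would turn every value into a string and the comparisons would no longer be numeric. My R skills are quite rusty, though.)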
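Back on the Python side, here is a minimal sketch of what the original line computes, on a tiny made-up DataFrame (only the column names 'humidity' and 'workingday' come from the question). It uses int(...) in place of the (0, 1)[...] tuple index; both turn the condition into 0 or 1, and int() does so without depending on the NumPy boolean that the comparison produces inside apply() being usable as a tuple index.

```python
import pandas as pd

# Tiny made-up DataFrame; only the column names come from the question.
df = pd.DataFrame({
    'humidity':   [75, 40, 90, 60],
    'workingday': [1, 1, 0, 1],
})

# Row-wise apply (axis=1) over the two-column subset, as in the original line;
# int() turns the boolean condition into 0 or 1.
df['sticky'] = df[['humidity', 'workingday']].apply(
    lambda x: int(x['workingday'] == 1 and x['humidity'] >= 60), axis=1
)
print(df['sticky'].tolist())   # [1, 0, 0, 1]
```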
However, as kdopen wrote, there is a more idiomatic, vectorized way of achieving the same:
df$sticky <- df$workingday == 1 & df$humidity >= 60
The idiomatic R equivalent would be
df$sticky <- df$workingday == 1 & df$humidity >= 60
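This assumes the goal is an indicator column. The result is logical (TRUE/FALSE); wrap it in as.integer() if you need 0/1 exactly as in the Python code.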
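For Python readers, the pandas counterpart of this vectorized one-liner might look like the sketch below. The sample data are invented, and astype(int) is used to get the same 0/1 values that the original apply() call produces.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    'humidity':   [75, 40, 90, 60],
    'workingday': [1, 1, 0, 1],
})

# Vectorized comparison, like the R one-liner: & is the element-wise "and",
# and the parentheses are required because & binds tighter than == and >=.
mask = (df['workingday'] == 1) & (df['humidity'] >= 60)
print(mask.tolist())           # [True, False, False, True]

# Convert the boolean mask to a 0/1 indicator column.
df['sticky'] = mask.astype(int)
print(df['sticky'].tolist())   # [1, 0, 0, 1]
```

This avoids calling a Python function once per row, which is usually much faster on large DataFrames.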
Stefano has nicely explained the Python code. A fully expanded version of the lambda might be
def func(x):
    if x['workingday'] == 1 and x['humidity'] >= 60:
        return 1
    else:
        return 0
But you'd never write that out in practice.
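A quick sketch checking that the expanded function and the one-line lambda agree. Plain dicts stand in for DataFrame rows, and both the row values and the name sticky are made up for illustration.

```python
def func(x):
    """Expanded form of the lambda: 1 for a humid working day, else 0."""
    if x['workingday'] == 1 and x['humidity'] >= 60:
        return 1
    else:
        return 0

# The lambda from the question, bound to a name only so it can be compared.
sticky = lambda x: (0, 1)[x['workingday'] == 1 and x['humidity'] >= 60]

# Plain dicts stand in for DataFrame rows; the values are made up.
rows = [
    {'workingday': 1, 'humidity': 75},
    {'workingday': 1, 'humidity': 40},
    {'workingday': 0, 'humidity': 90},
    {'workingday': 1, 'humidity': 60},
]
print([func(r) for r in rows])     # [1, 0, 0, 1]
print([sticky(r) for r in rows])   # [1, 0, 0, 1]
```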