For context: I'm a heavy R user currently switching to Python (with pandas). Say I have this data frame:
import pandas as pd

data = {'participant': ['p1', 'p1', 'p2', 'p3'],
        'metadata': ['congruent_1', 'congruent_2', 'incongruent_1', 'incongruent_2'],
        'reaction': [22000, 25000, 27000, 35000]}
df_s1 = pd.DataFrame(data, columns=['participant', 'metadata', 'reaction'])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_s1 = pd.concat([df_s1] * 16, ignore_index=True)
df_s1
and I want to reproduce what I can easily do in R with piped functions. I tried:
df_s1[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1")].df_s1["reaction"].mean()
This fails (the filtered frame has no .df_s1 attribute). I only succeed when I split the code into separate variables:
x = df_s1[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1")]
x = x["reaction"].mean()
x
In dplyr I'd write:
df_s1 %>%
  filter(metadata == "congruent_1" | metadata == "incongruent_1") %>%
  summarise(mean(reaction))
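For a dplyr-like single pipeline in pandas, .query() can replace the boolean-mask filter, and wrapping the chain in parentheses lets you write one step per line, much like %>%. A minimal sketch using the sample frame from the question (the in-expression inside query is standard pandas query syntax, not something from the original post):

```python
import pandas as pd

# Rebuild the question's sample frame
data = {'participant': ['p1', 'p1', 'p2', 'p3'],
        'metadata': ['congruent_1', 'congruent_2', 'incongruent_1', 'incongruent_2'],
        'reaction': [22000, 25000, 27000, 35000]}
df_s1 = pd.DataFrame(data)
df_s1 = pd.concat([df_s1] * 16, ignore_index=True)

# filter -> select -> summarise, all in one chained expression
result = (df_s1
          .query("metadata in ['congruent_1', 'incongruent_1']")
          ['reaction']
          .mean())
print(result)  # 24500.0
```

Each copied row pair has reactions 22000 and 27000, so the mean is 24500.0 regardless of how many times the frame is repeated.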
Note: I'd appreciate concise references to a site that helps translate R code into Python. Plenty of material is available, but in mixed formats and styles.
Thanks
We have .loc for this:
df_s1.loc[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1"), 'reaction'].mean()
Out[117]: 24500.0
Change to isin, as Quang mentioned, to reduce the lines of code.
In base R:
mean(df_s1$reaction[df_s1$metadata %in% c('congruent_1', 'incongruent_1')])
Do you mean:
df_s1.loc[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1"), "reaction"].mean()
Or simpler with isin:
df_s1.loc[df_s1.metadata.isin(["congruent_1", "incongruent_1"]), "reaction"].mean()
Out:
24500.0
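The isin version above can be checked end to end; a self-contained run, assuming the frame constructed in the question:

```python
import pandas as pd

# Rebuild the question's sample frame
data = {'participant': ['p1', 'p1', 'p2', 'p3'],
        'metadata': ['congruent_1', 'congruent_2', 'incongruent_1', 'incongruent_2'],
        'reaction': [22000, 25000, 27000, 35000]}
df_s1 = pd.DataFrame(data)
df_s1 = pd.concat([df_s1] * 16, ignore_index=True)

# Boolean mask from isin, then pick the column and average in one .loc call
mask = df_s1.metadata.isin(["congruent_1", "incongruent_1"])
mean_reaction = df_s1.loc[mask, "reaction"].mean()
print(mean_reaction)  # 24500.0
```

isin mirrors R's %in% operator, which is why the base R one-liner in the other answer reads almost identically.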