I'm currently studying pandas and I come from an R/dplyr/tidyverse background.
Pandas has a not-so-intuitive API, so how would I elegantly rewrite an operation like the following dplyr pipeline using pandas syntax?
library("nycflights13")
library("tidyverse")
delays <- flights %>%
  group_by(dest) %>%
  summarize(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  filter(count > 20, dest != "HNL")
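Two details of the R pipeline need a pandas counterpart: n() counts every row in the group, and na.rm = TRUE drops missing values before averaging. The sketch below uses a tiny made-up DataFrame (hypothetical values, not the real flights data) to show that GroupBy.size() counts rows like n(), that the GroupBy mean() already excludes NaN by default, and that count() differs by counting only non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for nycflights13's flights table
toy = pd.DataFrame({
    "dest": ["A", "A", "A", "B"],
    "arr_delay": [10.0, np.nan, 20.0, 5.0],
})

g = toy.groupby("dest")

# Like dplyr's n(): counts all rows per group, missing or not
n_rows = g.size()

# Like mean(arr_delay, na.rm = TRUE): NaN is excluded by default
mean_delay = g["arr_delay"].mean()

# 'count' counts only the non-missing values in the column
non_missing = g["arr_delay"].count()

print(n_rows.to_dict())       # {'A': 3, 'B': 1}
print(mean_delay.to_dict())   # {'A': 15.0, 'B': 5.0}
print(non_missing.to_dict())  # {'A': 2, 'B': 1}
```

So `size` is the closer match for n(); `count` only gives the same result when the counted column has no missing values.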
The pd.DataFrame.agg method doesn't allow much flexibility for renaming columns within the method itself.
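That's not exactly true. You can rename the columns inside agg, much as in R, by using named aggregation (keyword=(column, function) pairs). It is better, though, not to use count as a column name, because count is also a DataFrame method, so attribute access such as df.count returns the method rather than the column: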
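A minimal sketch of the naming clash, on hypothetical toy data: after named aggregation the count column exists, but attribute access still finds the DataFrame.count method, so the column is only reachable with bracket indexing. Renaming it to n (an arbitrary illustrative choice) removes the ambiguity.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"dest": ["A", "A", "B"], "distance": [100, 300, 50]})

# Named aggregation: output_name=(input_column, aggfunc)
out = toy.groupby("dest", as_index=False).agg(
    count=("distance", "size"),   # rows per group, like dplyr's n()
    dist=("distance", "mean"),
)

# Attribute access resolves to the DataFrame.count method, not the column
print(callable(out.count))        # True
# Bracket indexing reaches the column
print(out["count"].tolist())      # [2, 1]

# A name that is not a DataFrame method works with attribute access too
out = out.rename(columns={"count": "n"})
print(out.n.tolist())             # [2, 1]
```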
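from nycflights13 import flights  # assumes the nycflights13 Python package is installed
delays = (
    flights
    .groupby('dest', as_index=False)
    .agg(
        count=('year', 'count'),     # non-missing 'year' values per dest; year has no NAs in flights, so this equals n()
        dist=('distance', 'mean'),   # mean() skips NaN by default, like na.rm = TRUE
        delay=('arr_delay', 'mean'))
    .query('count > 20 & dest != "HNL"')  # same conditions as dplyr's filter()
    .reset_index(drop=True)          # renumber rows after filtering
)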
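In case the nycflights13 data isn't at hand, here is a minimal sketch of the same chain run on a small hypothetical DataFrame (made-up values, with the count threshold lowered from 20 to 1 so the filter has something to do). It names the count column n to avoid the method clash and swaps query for a callable passed to .loc, which builds the boolean mask directly instead of parsing a string expression; both choices are illustrative, not the only way.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for nycflights13's flights table
flights = pd.DataFrame({
    "dest": ["ATL", "ATL", "HNL", "HNL", "ORD", "ORD", "SFO"],
    "distance": [760, 760, 4983, 4983, 733, 733, 2586],
    "arr_delay": [11.0, np.nan, -5.0, 3.0, 20.0, 10.0, 7.0],
})

delays = (
    flights
    .groupby("dest", as_index=False)
    .agg(
        n=("distance", "size"),        # rows per group, like n()
        dist=("distance", "mean"),
        delay=("arr_delay", "mean"),   # NaN excluded, like na.rm = TRUE
    )
    # Boolean filter via a callable instead of a query string
    .loc[lambda d: (d["n"] > 1) & (d["dest"] != "HNL")]
    .reset_index(drop=True)
)

print(delays)
```

Here HNL is dropped by the destination test and SFO by the count test, leaving ATL and ORD; ATL's delay is 11.0 because its missing arr_delay is ignored.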