My dataset contains multiple observations for different species, and each species has a different number of observations. I'm looking for a fast way in R to calculate the mean of the top 10% of values of a given variable for each species.
I figured out how to get a fixed number of values per species (e.g., the top 20 values):
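library(data.table)
# Sort descending within species so the first 20 rows are the largest values
clim6 <- setDT(range)[order(species, -clim6), head(.SD, 20), by = species]
write.csv(clim6, file = "clim6.csv")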
I also know that mean() has a trim argument that discards part of the data before averaging, but I'm not sure how to use it to discard only the bottom 90%:
mean(x, trim = 0, na.rm = FALSE)
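A quick check of why `trim` alone does not do this: it removes the same fraction from both ends of the sorted values, so it cannot keep only the upper tail. A minimal sketch, where the vector `x` is made up for illustration:

```r
x <- 1:100

# trim = 0.1 drops the 10 smallest AND the 10 largest values,
# so this averages 11..90 rather than the top 10%
trimmed <- mean(x, trim = 0.1)

# The same result computed by hand
by_hand <- mean(sort(x)[11:90])

print(c(trimmed = trimmed, by_hand = by_hand))
```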
Mean of top 10% of values, using base R:
x = c(1:100,NA)
mean(x[x>=quantile(x, 0.9, na.rm=TRUE)], na.rm=TRUE)
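Note that the quantile threshold keeps every value at or above the 90th percentile, so it can keep more than 10% of the observations when values are tied at the threshold. If the top 10% by count is wanted instead, one possible sketch is below. The helper name `top_frac_mean` and its `frac` argument are made up here, and rounding the count up with `ceiling()` is an assumption about how to handle sample sizes that are not multiples of 10.

```r
# Hypothetical helper: mean of the largest `frac` share of values, by count
top_frac_mean <- function(x, frac = 0.1) {
  x <- x[!is.na(x)]                        # drop missing values first
  if (length(x) == 0) return(NA_real_)     # nothing left to average
  k <- max(1L, ceiling(frac * length(x)))  # assumption: round the count up
  mean(sort(x, decreasing = TRUE)[seq_len(k)])
}

x <- c(1:100, NA)
by_count <- top_frac_mean(x)  # averages the 10 largest values, 91..100
by_quantile <- mean(x[x >= quantile(x, 0.9, na.rm = TRUE)], na.rm = TRUE)
print(c(by_count = by_count, by_quantile = by_quantile))
```

On this example both approaches agree; they only differ when ties sit on the cutoff.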
Mean of top 10% of values, by grouping variable:
# Fake data
dat = data.frame(x=1:100, group=rep(LETTERS[1:3], c(30,30,40)))
With dplyr
library(dplyr)
dat %>% group_by(group) %>%
summarise(meanTop10pct = mean(x[x>=quantile(x, 0.9)]))
   group meanTop10pct
  (fctr)        (dbl)
1      A         29.0
2      B         59.0
3      C         98.5
With data.table
library(data.table)
setDT(dat)[, list(meanTop10pct = mean(x[x>=quantile(x, 0.9)])), by=group]
   group meanTop10pct
1:     A         29.0
2:     B         59.0
3:     C         98.5
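For completeness, the same grouped calculation can be sketched in base R without extra packages. The helper name `top10_mean` is made up for illustration, and `na.rm = TRUE` is added on the assumption that the real data may contain missing values. For the data in the question, `dat$x` and `dat$group` would presumably be replaced by the `clim6` and `species` columns.

```r
# Same fake data as above
dat <- data.frame(x = 1:100, group = rep(LETTERS[1:3], c(30, 30, 40)))

# Hypothetical helper: mean of the values at or above the 90th percentile
top10_mean <- function(v) {
  cutoff <- quantile(v, 0.9, na.rm = TRUE)
  mean(v[v >= cutoff], na.rm = TRUE)
}

# Apply the helper to x within each level of group
res <- tapply(dat$x, dat$group, top10_mean)
print(res)
```

This returns a named vector with one value per group rather than a data frame.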