I have a data frame that looks like the following:
df <- data.frame(Site=rep(paste0('site', 1:5), 50),
Month=sample(1:12, 50, replace=T),
Count=(sample(1:1000, 50, replace=T)))
I want to remove any site whose count is always <5% of the max monthly count across all sites.
The max monthly counts across all sites are:
library(plyr)
ddply(df, .(Month), summarise, Max.Count=max(Count))
If a count of 1 is assigned to site5, then its counts are always <5% of max monthly counts across all sites. Therefore I would want site5 removed.
df$Count[df$Site=='site5'] <- 1
However, after assigning new values to site2, some of its counts are <5% of max monthly counts, while others are >5%. Therefore I would not want site2 removed.
df$Count[df$Site=='site2'] <- ceiling(seq(1, 1000, length.out=sum(df$Site=='site2')))
How can I subset the data frame to remove any sites whose counts are always <5% of the max monthly count? Let me know if the question is unclear and I will amend it.
A data.table solution:
require(data.table)
set.seed(45)
df <- data.frame(Site=rep(paste0('site', 1:5), 50),
Month=sample(1:12, 50, replace=T),
Count=(sample(1:1000, 50, replace=T)))
df$Count[df$Site=='site5'] <- 1
dt <- data.table(df, key=c("Month", "Site"))
# set max.count per month (the maximum across all sites in that month)
dt[, max.count := max(Count), by = list(Month)]
# get the sites whose counts are <5% of max.count in every month they appear in
d1 <- dt[, list(check = all(Count < .05 * max.count)), by = list(Month, Site)]
sites <- as.character(d1[, all(check == TRUE), by=Site][V1 == TRUE, Site])
# use %in% rather than != so this also works when zero or several sites qualify
dt.out <- dt[!(Site %in% sites)][, max.count := NULL]
# Site Month Count
# 1: site1 1 939
# 2: site1 1 939
# 3: site1 1 939
# 4: site1 1 939
# 5: site1 1 939
# ---
# 196: site2 12 969
# 197: site2 12 684
# 198: site2 12 613
# 199: site2 12 969
# 200: site2 12 684
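For comparison, the same filter can be sketched in base R with no extra packages. This is only an illustrative alternative, not part of the answer above. The toy data frame and the names `month_max`, `below`, `always_below`, `drop_sites` and `df_out` are hypothetical, chosen so the result is easy to check by hand.

```r
# Hypothetical toy data: site5 is always tiny, site2 is tiny only in month 1
df <- data.frame(
  Site  = rep(c("site1", "site2", "site5"), each = 2),
  Month = rep(1:2, times = 3),
  Count = c(1000, 800, 10, 500, 1, 1)
)

# Maximum count per month across all sites, repeated on every row
month_max <- ave(df$Count, df$Month, FUN = max)

# TRUE where a row's count is below 5% of that month's maximum
below <- df$Count < 0.05 * month_max

# A site is dropped only if every one of its rows is below the threshold
always_below <- tapply(below, df$Site, all)
drop_sites <- names(always_below)[always_below]

# Keep all rows belonging to the remaining sites
df_out <- df[!(df$Site %in% drop_sites), ]
```

Here `ave()` plays the role of the `max.count` column, and `tapply(..., all)` does the per-site "always below" check in a single step. Filtering with `%in%` keeps the last line correct whether zero, one, or several sites are dropped.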