I have a sample data frame that I am working with:
ID <- c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
TARG_AVG <- c(2.1,2.1,2.1,2.1,2.1,2.1,2.3,2.3,2.5,2.5,2.5,2.5,3.1,3.1,3.1,3.1,3.3,3.3,3.3,3.3,3.5,3.5)
Measurement <- c("Len","Len","Len","Wid","Ht","Ht","Dep","Brt","Ht","Ht","Dep","Dep"
,"Dep","Dep","Len","Len","Ht","Ht","Brt","Brt","Wid","Wid")
df1 <- data.frame(ID,TARG_AVG,Measurement)
I am trying to solve 3 different problems here:
1) I want a summary of how many measurements of each type there are in each (ID, TARG_AVG) group. I currently do this with summaryBy() from the doBy package:
library(doBy)
unique <- summaryBy(Measurement~ID+TARG_AVG, data=df1, FUN=function(x) { c(Count=length(x)) } )
This gives me the total (Measurement.Count), but I also want the count for each individual measurement. My desired output is:
  ID TARG_AVG Len Wid Ht Dep Brt Measurement.Count
1  A      2.1   3   1  2   0   0                 6
2  A      2.3   0   0  0   1   1                 2
3  A      2.5   0   0  2   2   0                 4
4  B      3.1   2   0  0   2   0                 4
5  B      3.3   0   0  2   0   2                 4
6  B      3.5   0   2  0   0   0                 2
2) Once I have the above output, I would like to keep only the rows in which at least 2 measurements each have a count of 2 or more. My desired output here would be:
  ID TARG_AVG Len Wid Ht Dep Brt Measurement.Count
1  A      2.1   3   1  2   0   0                 6
3  A      2.5   0   0  2   2   0                 4
4  B      3.1   2   0  0   2   0                 4
5  B      3.3   0   0  2   0   2                 4
3) Finally, I would like to pivot the columns back into rows, keeping only the measurements with a count of 2 or more. My desired output here would be:
   ID TARG_AVG Measurement
1   A      2.1         Len
2   A      2.1         Len
3   A      2.1         Len
4   A      2.1          Ht
5   A      2.1          Ht
6   A      2.5          Ht
7   A      2.5          Ht
8   A      2.5         Dep
9   A      2.5         Dep
10  B      3.1         Len
11  B      3.1         Len
12  B      3.1         Dep
13  B      3.1         Dep
14  B      3.3          Ht
15  B      3.3          Ht
16  B      3.3         Brt
17  B      3.3         Brt
I am currently learning the reshape2, dplyr and data.table packages, and it would be very helpful if someone could point me in the right direction.
Newest solution
library(data.table) #v 1.9.6+
setDT(df1)[, indx := .N, by = names(df1)
][indx > 1, if(uniqueN(Measurement) > 1) .SD, by = .(ID, TARG_AVG)]
# ID TARG_AVG Measurement indx
# 1: A 2.1 Len 3
# 2: A 2.1 Len 3
# 3: A 2.1 Len 3
# 4: A 2.1 Ht 2
# 5: A 2.1 Ht 2
# 6: A 2.5 Ht 2
# 7: A 2.5 Ht 2
# 8: A 2.5 Dep 2
# 9: A 2.5 Dep 2
# 10: B 3.1 Dep 2
# 11: B 3.1 Dep 2
# 12: B 3.1 Len 2
# 13: B 3.1 Len 2
# 14: B 3.3 Ht 2
# 15: B 3.3 Ht 2
# 16: B 3.3 Brt 2
# 17: B 3.3 Brt 2
Or the dplyr equivalent
library(dplyr)
df1 %>%
  group_by(ID, TARG_AVG, Measurement) %>%
  filter(n() > 1) %>%
  group_by(ID, TARG_AVG) %>%
  filter(n_distinct(Measurement) > 1)
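For readers without either package, the same two-stage filter can probably be sketched in base R with ave(). This is only an illustration, not part of the answer above: it assumes the sample data from the question is rebuilt as a plain data.frame, and the names df_demo, n_rows, kept, n_types and result are made up for the sketch.

```r
# Rebuild the question's sample data as a plain data.frame
# (df_demo is a hypothetical name used only in this sketch).
df_demo <- data.frame(
  ID = rep(c("A", "B"), times = c(12, 10)),
  TARG_AVG = rep(c(2.1, 2.3, 2.5, 3.1, 3.3, 3.5), times = c(6, 2, 4, 4, 4, 2)),
  Measurement = c("Len", "Len", "Len", "Wid", "Ht", "Ht", "Dep", "Brt", "Ht", "Ht", "Dep", "Dep",
                  "Dep", "Dep", "Len", "Len", "Ht", "Ht", "Brt", "Brt", "Wid", "Wid"),
  stringsAsFactors = FALSE
)

# Stage 1: size of each (ID, TARG_AVG, Measurement) group, repeated on every row;
# keep only rows whose measurement occurs more than once in its group.
n_rows <- ave(seq_len(nrow(df_demo)),
              df_demo$ID, df_demo$TARG_AVG, df_demo$Measurement,
              FUN = length)
kept <- df_demo[n_rows > 1, ]

# Stage 2: number of distinct measurements left in each (ID, TARG_AVG) group;
# keep only groups that still contain more than one measurement type.
n_types <- ave(as.integer(factor(kept$Measurement)),
               kept$ID, kept$TARG_AVG,
               FUN = function(m) length(unique(m)))
result <- kept[n_types > 1, ]
```

On the sample data, result should contain the same rows as the data.table and dplyr versions, in the original row order.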
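Older solution
library(data.table)
## dcast the data to one count column per measurement (no need for the total)
## (df1 is assumed to still be a plain data.frame here, so res[-(1:2)] below drops the two ID columns)
res <- dcast(df1, ID + TARG_AVG ~ Measurement)
## keep rows where at least 2 measurements have a count of at least 2
res <- res[rowSums(res[-(1:2)] > 1) > 1,]
## melt the data back and keep only measurements with a count of at least 2
res <- melt(setDT(res), id = 1:2)[value > 1]
## expand the data back to one row per observation
res[, .SD[rep(.I, value)]]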
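The last line relies on repeating row indices: rep() with a vector of counts repeats each index that many times, and subsetting by the repeated indices duplicates the rows. A minimal base R sketch of that idea follows, using a small hand-made table. The names counts_long, idx and expanded are hypothetical, and the two rows are taken from the A / 2.1 group in the question.

```r
# A tiny long-format count table (hypothetical stand-in for the melted result).
counts_long <- data.frame(
  ID = c("A", "A"),
  TARG_AVG = c(2.1, 2.1),
  variable = c("Len", "Ht"),
  value = c(3L, 2L),
  stringsAsFactors = FALSE
)

# Repeat each row index as many times as its count, then subset by those indices.
idx <- rep(seq_len(nrow(counts_long)), times = counts_long$value)
expanded <- counts_long[idx, ]
```

Here idx is 1, 1, 1, 2, 2, so expanded has three Len rows followed by two Ht rows.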
The solution to the original question
Here's a possible solution using reshape2
1st step
library(reshape2)
res <- dcast(df1, ID + TARG_AVG ~ Measurement, margins = "Measurement")
2nd step
res <- res[res$"(all)" > 2,]
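The first two steps can also be sketched without reshape2, using table() and rowSums() from base R. This is an alternative illustration rather than the method used above. The names df_demo, key, counts, wide and wide_filtered are invented for the sketch, and the result is a matrix keyed by a pasted "ID TARG_AVG" label instead of a data.frame with separate ID and TARG_AVG columns.

```r
# Rebuild the question's sample data (df_demo is a hypothetical name).
df_demo <- data.frame(
  ID = rep(c("A", "B"), times = c(12, 10)),
  TARG_AVG = rep(c(2.1, 2.3, 2.5, 3.1, 3.3, 3.5), times = c(6, 2, 4, 4, 4, 2)),
  Measurement = c("Len", "Len", "Len", "Wid", "Ht", "Ht", "Dep", "Brt", "Ht", "Ht", "Dep", "Dep",
                  "Dep", "Dep", "Len", "Len", "Ht", "Ht", "Brt", "Brt", "Wid", "Wid"),
  stringsAsFactors = FALSE
)

# One label per (ID, TARG_AVG) group, e.g. "A 2.1".
key <- paste(df_demo$ID, df_demo$TARG_AVG)

# Step 1: cross-tabulate group against measurement, then append the row total.
counts <- unclass(table(key, df_demo$Measurement))
wide <- cbind(counts, Measurement.Count = rowSums(counts))

# Step 2: keep groups whose total count is greater than 2.
wide_filtered <- wide[wide[, "Measurement.Count"] > 2, , drop = FALSE]
```

Note that table() orders the measurement columns alphabetically (Brt, Dep, Ht, Len, Wid), which differs from the column order shown in the question.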
3rd step
library(data.table)
setDT(df1)[, if(.N > 2) .SD, by = .(ID, TARG_AVG)]