Pivot rows into columns with values of counts for each measurement R

Question

I have a sample data frame that I am working with:

ID <- c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
TARG_AVG <- c(2.1,2.1,2.1,2.1,2.1,2.1,2.3,2.3,2.5,2.5,2.5,2.5,3.1,3.1,3.1,3.1,3.3,3.3,3.3,3.3,3.5,3.5)
Measurement <- c("Len","Len","Len","Wid","Ht","Ht","Dep","Brt","Ht","Ht","Dep","Dep"
                 ,"Dep","Dep","Len","Len","Ht","Ht","Brt","Brt","Wid","Wid")
df1 <- data.frame(ID,TARG_AVG,Measurement)

I am trying to solve three different problems here.

1) I want a summary of how many measurements there are in each (ID, TARG_AVG) group. I currently do this:

library(doBy)  # provides summaryBy()
unique <- summaryBy(Measurement~ID+TARG_AVG, data=df1, FUN=function(x) { c(Count=length(x)) } )

This gives me the total (Measurement.Count), but I also want the count for each measurement. My desired output is:

  ID TARG_AVG Len Wid Ht Dep Brt Measurement.Count
1  A      2.1   3   1  2   0   0                 6
2  A      2.3   0   0  0   1   1                 2
3  A      2.5   0   0  2   2   0                 4
4  B      3.1   2   0  0   2   0                 4
5  B      3.3   0   0  2   0   2                 4
6  B      3.5   0   2  0   0   0                 2

2) Once I have the above output, I would like to keep only the rows in which at least two measurements each have a count of 2 or more. My desired output would be:

  ID TARG_AVG Len Wid Ht Dep Brt Measurement.Count
1  A      2.1   3   1  2   0   0                 6
3  A      2.5   0   0  2   2   0                 4
4  B      3.1   2   0  0   2   0                 4
5  B      3.3   0   0  2   0   2                 4

3) Finally, I would like to pivot the columns back into rows, keeping only the measurements with a count of 2 or more. My desired output would be:

   ID TARG_AVG Measurement
1   A      2.1         Len
2   A      2.1         Len
3   A      2.1         Len
4   A      2.1          Ht
5   A      2.1          Ht
6   A      2.5          Ht
7   A      2.5          Ht
8   A      2.5         Dep
9   A      2.5         Dep
10  B      3.1         Len
11  B      3.1         Len
12  B      3.1         Dep
13  B      3.1         Dep
14  B      3.3          Ht
15  B      3.3          Ht
16  B      3.3         Brt
17  B      3.3         Brt

I am currently learning the reshape2, dplyr and data.table packages, and it would be very useful if someone could point me in the right direction.

David Arenburg · Accepted Answer

Newest solution

library(data.table) #v 1.9.6+
setDT(df1)[, indx := .N, by = names(df1)
           ][indx > 1, if(uniqueN(Measurement) > 1) .SD, by = .(ID, TARG_AVG)]
#     ID TARG_AVG Measurement indx
#  1:  A      2.1         Len    3
#  2:  A      2.1         Len    3
#  3:  A      2.1         Len    3
#  4:  A      2.1          Ht    2
#  5:  A      2.1          Ht    2
#  6:  A      2.5          Ht    2
#  7:  A      2.5          Ht    2
#  8:  A      2.5         Dep    2
#  9:  A      2.5         Dep    2
# 10:  B      3.1         Dep    2
# 11:  B      3.1         Dep    2
# 12:  B      3.1         Len    2
# 13:  B      3.1         Len    2
# 14:  B      3.3          Ht    2
# 15:  B      3.3          Ht    2
# 16:  B      3.3         Brt    2
# 17:  B      3.3         Brt    2
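
For readers who want to see the two filters spelled out, here is a rough base R sketch of the same logic. It is not part of the original answer: the names df, kept, n_meas and result are illustrative, and the sample data is rebuilt as a plain data.frame so the block stands on its own.

```r
# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  ID = c(rep("A", 12), rep("B", 10)),
  TARG_AVG = c(rep(2.1, 6), rep(2.3, 2), rep(2.5, 4),
               rep(3.1, 4), rep(3.3, 4), rep(3.5, 2)),
  Measurement = c("Len","Len","Len","Wid","Ht","Ht","Dep","Brt","Ht","Ht","Dep","Dep",
                  "Dep","Dep","Len","Len","Ht","Ht","Brt","Brt","Wid","Wid"),
  stringsAsFactors = FALSE
)

# Filter 1: size of each (ID, TARG_AVG, Measurement) group, one value per row;
# keep only rows whose measurement occurs more than once in its group
df$indx <- ave(seq_len(nrow(df)), df$ID, df$TARG_AVG, df$Measurement, FUN = length)
kept <- df[df$indx > 1, ]

# Filter 2: number of distinct measurements left in each (ID, TARG_AVG) group;
# keep only groups that still contain more than one kind of measurement
n_meas <- ave(as.integer(factor(kept$Measurement)), kept$ID, kept$TARG_AVG,
              FUN = function(x) length(unique(x)))
result <- kept[n_meas > 1, ]
```

The first filter drops measurements that occur only once within a group, and the second drops groups such as (B, 3.5) that are left with a single kind of measurement.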

Or the dplyr equivalent

library(dplyr)
df1 %>%
  group_by(ID, TARG_AVG, Measurement) %>%
  filter(n() > 1) %>%
  group_by(ID, TARG_AVG) %>%
  filter(n_distinct(Measurement) > 1)

Older solution

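library(data.table)
## assumes df1 is still a plain data.frame (setDT() not yet applied),
## because res[-(1:2)] below relies on data.frame column indexing
## dcast the data to counts (no need for the total column)
res <- dcast(df1, ID + TARG_AVG  ~ Measurement)
## keep rows with at least 2 measurements that each occur at least twice
res <- res[rowSums(res[-(1:2)] > 1) > 1,]
## melt the data back and keep only measurements that occur at least twice
res <- melt(setDT(res), id = 1:2)[value > 1]
## expand the counts back into one row per observation
res[, .SD[rep(.I, value)]]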
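
The cast, filter, melt and expand sequence can also be sketched without any packages. The block below is only an illustration under a few assumptions: it uses base R table() in place of dcast, it collapses ID and TARG_AVG into a single pasted key (the hypothetical group column) instead of keeping them as two columns, and the names wide, keep, long and expanded are made up for this example.

```r
# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  ID = c(rep("A", 12), rep("B", 10)),
  TARG_AVG = c(rep(2.1, 6), rep(2.3, 2), rep(2.5, 4),
               rep(3.1, 4), rep(3.3, 4), rep(3.5, 2)),
  Measurement = c("Len","Len","Len","Wid","Ht","Ht","Dep","Brt","Ht","Ht","Dep","Dep",
                  "Dep","Dep","Len","Len","Ht","Ht","Brt","Brt","Wid","Wid"),
  stringsAsFactors = FALSE
)

# "Cast": one row per (ID, TARG_AVG) key, one count column per measurement
wide <- as.data.frame.matrix(table(paste(df$ID, df$TARG_AVG), df$Measurement))

# Keep rows where at least two measurements each occur at least twice
keep <- wide[rowSums(wide > 1) > 1, ]

# "Melt": back to long form, one row per (key, measurement) with its count
long <- as.data.frame(as.table(as.matrix(keep)), stringsAsFactors = FALSE)
names(long) <- c("group", "Measurement", "n")
long <- long[long$n > 1, ]

# Expand: repeat each row n times to get one row per observation again
expanded <- long[rep(seq_len(nrow(long)), long$n), c("group", "Measurement")]
```

Repeating row indices with rep() plays the same role here as rep(.I, value) in the data.table version.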

The solution to the original question

Here's a possible solution using reshape2

1st step

library(reshape2)
res <- dcast(df1, ID + TARG_AVG  ~ Measurement, margins = "Measurement")

2nd step

res <- res[res$"(all)" > 2,]

3rd step

library(data.table)
setDT(df1)[, if(.N > 2) .SD, by = .(ID, TARG_AVG)]
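
As a cross-check on these three steps, the following base R sketch reproduces them without reshape2 or data.table. It is an approximation, not the answer's own code: addmargins() stands in for the margins argument (its total column is named "Sum" rather than "(all)"), ID and TARG_AVG are pasted into one row key, and the names tab, with_total, big and rows are illustrative.

```r
# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  ID = c(rep("A", 12), rep("B", 10)),
  TARG_AVG = c(rep(2.1, 6), rep(2.3, 2), rep(2.5, 4),
               rep(3.1, 4), rep(3.3, 4), rep(3.5, 2)),
  Measurement = c("Len","Len","Len","Wid","Ht","Ht","Dep","Brt","Ht","Ht","Dep","Dep",
                  "Dep","Dep","Len","Len","Ht","Ht","Brt","Brt","Wid","Wid"),
  stringsAsFactors = FALSE
)

# 1st step: counts per measurement plus a row total in a "Sum" column
tab <- table(paste(df$ID, df$TARG_AVG), df$Measurement)
with_total <- addmargins(tab, margin = 2)

# 2nd step: keep the groups whose total count is greater than 2
big <- with_total[with_total[, "Sum"] > 2, ]

# 3rd step: keep the original rows that belong to groups with more than 2 rows
group_size <- ave(seq_len(nrow(df)), df$ID, df$TARG_AVG, FUN = length)
rows <- df[group_size > 2, ]
```

With the sample data, the groups (A, 2.3) and (B, 3.5) have only two rows each, so they are dropped in both the 2nd and the 3rd step.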

Tags: r, data.table, dplyr, plyr, reshape2

Sharath
