Pivot rows into columns with counts for each measurement in R

I have a sample data frame that I am working with:

ID <- c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
TARG_AVG <- c(2.1,2.1,2.1,2.1,2.1,2.1,2.3,2.3,2.5,2.5,2.5,2.5,3.1,3.1,3.1,3.1,3.3,3.3,3.3,3.3,3.5,3.5)
Measurement <- c("Len","Len","Len","Wid","Ht","Ht","Dep","Brt","Ht","Ht","Dep","Dep"
                 ,"Dep","Dep","Len","Len","Ht","Ht","Brt","Brt","Wid","Wid")
df1 <- data.frame(ID,TARG_AVG,Measurement)

I am trying to solve 3 different problems here.

1) I want a summary of how many measurements there are for each (ID, TARG_AVG) group. I currently do this with summaryBy from the doBy package:

library(doBy)
meas_count <- summaryBy(Measurement ~ ID + TARG_AVG, data = df1, FUN = function(x) c(Count = length(x)))

This gives me the total (Measurement.Count), but I also want the count for each measurement. My desired output is

  ID TARG_AVG Len Wid Ht Dep Brt Measurement.Count
1  A      2.1   3   1  2   0   0                 6
2  A      2.3   0   0  0   1   1                 2
3  A      2.5   0   0  2   2   0                 4
4  B      3.1   2   0  0   2   0                 4
5  B      3.3   0   0  2   0   2                 4
6  B      3.5   0   2  0   0   0                 2

2) Once I have the above output, I would like to subset it so that only the rows in which at least two measurements each have a count of 2 or more are kept. Here my desired output would be

  ID TARG_AVG Len Wid Ht Dep Brt Measurement.Count
1  A      2.1   3   1  2   0   0                 6
3  A      2.5   0   0  2   2   0                 4
4  B      3.1   2   0  0   2   0                 4
5  B      3.3   0   0  2   0   2                 4

3) Finally, I would like to pivot the columns back into rows, keeping only the measurements with a count of 2 or more. My desired output here would be

   ID TARG_AVG Measurement
1   A      2.1         Len
2   A      2.1         Len
3   A      2.1         Len
4   A      2.1          Ht
5   A      2.1          Ht
6   A      2.5          Ht
7   A      2.5          Ht
8   A      2.5         Dep
9   A      2.5         Dep
10  B      3.1         Len
11  B      3.1         Len
12  B      3.1         Dep
13  B      3.1         Dep
14  B      3.3          Ht
15  B      3.3          Ht
16  B      3.3         Brt
17  B      3.3         Brt

I am learning the reshape2, dplyr and data.table packages at the moment, and it would be very helpful if someone could point me in the right direction.

Sharath asked Mar 14 '23 13:03

1 Answer

Newest solution

library(data.table) #v 1.9.6+
setDT(df1)[, indx := .N, by = names(df1)
           ][indx > 1, if(uniqueN(Measurement) > 1) .SD, by = .(ID, TARG_AVG)]
#     ID TARG_AVG Measurement indx
#  1:  A      2.1         Len    3
#  2:  A      2.1         Len    3
#  3:  A      2.1         Len    3
#  4:  A      2.1          Ht    2
#  5:  A      2.1          Ht    2
#  6:  A      2.5          Ht    2
#  7:  A      2.5          Ht    2
#  8:  A      2.5         Dep    2
#  9:  A      2.5         Dep    2
# 10:  B      3.1         Dep    2
# 11:  B      3.1         Dep    2
# 12:  B      3.1         Len    2
# 13:  B      3.1         Len    2
# 14:  B      3.3          Ht    2
# 15:  B      3.3          Ht    2
# 16:  B      3.3         Brt    2
# 17:  B      3.3         Brt    2

Or the dplyr equivalent

library(dplyr)
df1 %>%
  group_by(ID, TARG_AVG, Measurement) %>%
  filter(n() > 1) %>%
  group_by(ID, TARG_AVG) %>%
  filter(n_distinct(Measurement) > 1)
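
For readers who want to see the logic without data.table or dplyr, the same two-stage filter can be sketched in base R with ave(). This is only an illustrative sketch and not part of the original answer; the names n_same, kept, n_types and result are made up here.

```r
# Rebuild the sample data from the question
ID <- c(rep("A", 12), rep("B", 10))
TARG_AVG <- c(2.1,2.1,2.1,2.1,2.1,2.1,2.3,2.3,2.5,2.5,2.5,2.5,
              3.1,3.1,3.1,3.1,3.3,3.3,3.3,3.3,3.5,3.5)
Measurement <- c("Len","Len","Len","Wid","Ht","Ht","Dep","Brt","Ht","Ht","Dep","Dep",
                 "Dep","Dep","Len","Len","Ht","Ht","Brt","Brt","Wid","Wid")
df1 <- data.frame(ID, TARG_AVG, Measurement)

# Stage 1: for every row, count how many rows share its (ID, TARG_AVG, Measurement)
n_same <- ave(seq_along(df1$ID), df1$ID, df1$TARG_AVG, df1$Measurement, FUN = length)
kept <- df1[n_same > 1, ]

# Stage 2: within each (ID, TARG_AVG) group, count the distinct measurements left
n_types <- ave(as.integer(factor(kept$Measurement)), kept$ID, kept$TARG_AVG,
               FUN = function(x) length(unique(x)))
result <- kept[n_types > 1, ]
rownames(result) <- NULL
result
```

The rows come back in their original order, so the B / 3.1 block lists Dep before Len, as in the data.table output above.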

Older solution

library(data.table)
## dcast the data to wide format (the total column is not needed)
res <- dcast(df1, ID + TARG_AVG  ~ Measurement)
## keep the groups where at least 2 measurements each occur at least twice
res <- res[rowSums(res[-(1:2)] > 1) > 1,]
## melt the data back to long format and keep only counts of at least 2
res <- melt(setDT(res), id = 1:2)[value > 1]
## expand each row back into `value` copies
res[, .SD[rep(.I, value)]]
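
The last line relies on repeating each row index as many times as its count. A minimal base R sketch of that idea follows; the small counts table is hypothetical (its values are copied from the A / 2.5 group in the question), and the column names variable and value are assumed to mirror the default melt output.

```r
# Hypothetical long-format count table: one row per measurement with its count
counts <- data.frame(ID = c("A", "A"),
                     TARG_AVG = c(2.5, 2.5),
                     variable = c("Ht", "Dep"),
                     value = c(2L, 2L))

# rep() repeats row index i exactly counts$value[i] times
idx <- rep(seq_len(nrow(counts)), counts$value)

# Indexing with the repeated indices yields one row per original observation
expanded <- counts[idx, c("ID", "TARG_AVG", "variable")]
rownames(expanded) <- NULL
expanded
```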

The solution to the original question

Here's a possible solution using reshape2

1st step

library(reshape2)
res <- dcast(df1, ID + TARG_AVG  ~ Measurement, margins = "Measurement")

2nd step

res <- res[res$"(all)" > 2,]
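
As a cross-check of the first two steps, the same count table can be sketched in base R with table(). This is an assumption-laden illustration rather than the answer's method: the group key is built with paste(), so ID and TARG_AVG end up combined in the row names instead of separate columns, and the names tab, counts and big are made up here.

```r
# Rebuild the sample data from the question
ID <- c(rep("A", 12), rep("B", 10))
TARG_AVG <- c(2.1,2.1,2.1,2.1,2.1,2.1,2.3,2.3,2.5,2.5,2.5,2.5,
              3.1,3.1,3.1,3.1,3.3,3.3,3.3,3.3,3.5,3.5)
Measurement <- c("Len","Len","Len","Wid","Ht","Ht","Dep","Brt","Ht","Ht","Dep","Dep",
                 "Dep","Dep","Len","Len","Ht","Ht","Brt","Brt","Wid","Wid")
df1 <- data.frame(ID, TARG_AVG, Measurement)

# Step 1: cross-tabulate each (ID, TARG_AVG) group against Measurement
tab <- table(paste(df1$ID, df1$TARG_AVG), df1$Measurement)
counts <- as.data.frame.matrix(tab)

# Add the per-group total, playing the role of the "(all)" margin column
counts$Measurement.Count <- rowSums(counts)

# Step 2: keep the groups with more than 2 measurements in total
big <- counts[counts$Measurement.Count > 2, ]
big
```

The measurement columns appear in alphabetical order (Brt, Dep, Ht, Len, Wid) rather than in the order shown in the question.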

3rd step

library(data.table)
setDT(df1)[, if(.N > 2) .SD, by = .(ID, TARG_AVG)]

David Arenburg answered Mar 17 '23 15:03