Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: adjust values for multiple occurrences of a case

Tags:

r

I have a question that will hopefully not be a huge obstacle for advanced R users...

test.data <- data.frame(case = c(1, 1, 1, 2, 2, 2, 3),
                        year = c(2006, 2007, 2008, 2007, 2006, 2008, 2006),
                        level = c(10, 20, 20, 12, 20, 20, 20))

As you might be able to see, I have multiple occurrences for each case, distinguished by the year. The values of level differ within a case, and I would like to correct that by setting each value of level to the minimum level of a given case. In this example, each value of level for case=1 should be 10, and each value of level for case=2 should be 12. For any specific case I could do the following:

test.data$level[test.data$case==1] <- min(test.data$level[test.data$case==1])

But since I have several hundred cases, this would take quite long. Therefore, I would like to ask if you have a quicker solution.

like image 954
Fabian Avatar asked Jun 29 '15 08:06

Fabian


3 Answers

You can try

 library(data.table)
 setDT(test.data)[, level:= min(level, na.rm=TRUE), case]
 #    case year level
 #1:    1 2006    10
 #2:    1 2007    10
 #3:    1 2008    10
 #4:    2 2007    12
 #5:    2 2006    12
 #6:    2 2008    12
 #7:    3 2006    20

Or using dplyr

 library(dplyr)
 test.data %>% 
        group_by(case) %>% 
        mutate(level= min(level, na.rm=TRUE))
 #   case year level
 #1    1 2006    10
 #2    1 2007    10
 #3    1 2008    10
 #4    2 2007    12
 #5    2 2006    12
 #6    2 2008    12
 #7    3 2006    20

Or using sqldf/dplyr

  library(sqldf)
  library(dplyr)
  sqldf('select * from "test.data"
            left join(select "case", 
              min(level) as Level
              from "test.data" 
              group by "case")
            using ("case")') %>%
                         select(-level)
  #   case year Level
  #1    1 2006    10
  #2    1 2007    10
  #3    1 2008    10
  #4    2 2007    12
  #5    2 2006    12
  #6    2 2008    12
  #7    3 2006    20

Or a modification suggested by @G.Grothendieck using only sqldf

  sqldf('select "case", year, "min(level)" as Level 
            from "test.data" 
               left join(select "case", min(level)
                         from "test.data" 
                         group by "case") 
                     using ("case")')

  #1    1 2006    10
  #2    1 2007    10
  #3    1 2008    10
  #4    2 2007    12
  #5    2 2006    12
  #6    2 2008    12
  #7    3 2006    20

Or using base R

 test.data$level <- with(test.data, ave(level, case, FUN=min))
like image 172
akrun Avatar answered Nov 09 '22 07:11

akrun


Here's a classic using base R functions.

# may not be optimal for larger datasets due to merge
min.lvl <- aggregate(level ~ case, data = test.data, FUN = min)
merge(x = test.data, y = min.lvl, by = "case", all.x = TRUE, sort = FALSE)

  case year level.x level.y
1    1 2006      10      10
2    1 2007      20      10
3    1 2008      20      10
4    2 2007      12      12
5    2 2006      20      12
6    2 2008      20      12
7    3 2006      20      20

Second vanilla option of doing things would be

new.data <- by(data = test.data, INDICES = test.data$case, FUN = function(x) {
  x$level <- min(x$level)
  x
})

do.call("rbind", new.data)

    case year level
1.1    1 2006    10
1.2    1 2007    10
1.3    1 2008    10
2.4    2 2007    12
2.5    2 2006    12
2.6    2 2008    12
3      3 2006    20
like image 32
Roman Luštrik Avatar answered Nov 09 '22 08:11

Roman Luštrik


Alternative using doBy

library(doBy)
summaryBy(level ~ case, id=~ year, test.data, 
          full.dimension=TRUE,  keep.names=TRUE, min)

#   case level year
#1:    1    10 2006
#2:    1    10 2006
#3:    1    10 2006
#4:    2    12 2007
#5:    2    12 2007
#6:    2    12 2007
#7:    3    20 2006

Or to be more compact

library(plyr)
ddply(test.data, .(case), mutate, level = min(level))

#  case year level
#1    1 2006    10
#2    1 2007    10
#3    1 2008    10
#4    2 2007    12
#5    2 2006    12
#6    2 2008    12
#7    3 2006    20

Another base R method using lapply

do.call(rbind,lapply(split(test.data, test.data$case), 
              function(x){x$level = min(x$level); x}))

#   case year level
#1:    1 2006    10
#2:    1 2007    10
#3:    1 2008    10
#4:    2 2007    12
#5:    2 2006    12
#6:    2 2008    12
#7:    3 2006    20
like image 3
Veerendra Gadekar Avatar answered Nov 09 '22 07:11

Veerendra Gadekar