Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Assigning values in a sequence depending on previous row value in R

Tags:

dataframe

r

I have asked a similar question like this here and the solution mentioned there works fine with problem stated there but this one is little trickier and harder version of that.

I have a data table like this.

   ID1 member
 1   a parent
 2   a  child
 3   a parent
 4   a  child
 5   a  child
 6   b parent
 7   b parent
 8   b  child
 9   c  child
10   c  child
11   c parent
12   c  child

And I want to assign a sequence like below keeping in mind ID1 and member column.

   ID1 member sequence
 1   a parent        1
 2   a  child        2
 3   a parent        1
 4   a  child        2
 5   a  child        3
 6   b parent        1
 7   b parent        1
 8   b  child        2
 9   c  child        2 *
10   c  child        3
11   c parent        1
12   c  child        2

i.e.

> dt$sequence = 1, wherever dt$member == "parent"

> dt$sequence = previous_row_value + 1, wherever dt$member=="child"

But sometimes it can happen that new ID1 might not start with a member="parent". If it starts with "child" (as in example with star-marked row) we have to start sequencing with 2. As of now I have been doing it using loops, like below.

dt_sequence <- dt[ ,sequencing(.SD), by="ID1"]

sequencing <- function(dt){
  for(i in 1:nrow(dt)){
    if(i == 1){
      if(dt[i,member] %in% "child")
        dt$sequence[i] = 2
      else
        dt$sequence[i] = 1
    }
    else{
      if(dt[i,member] %in% "child")
        dt$sequence[i] = as.numeric(dt$sequence[i-1]) + 1
      else
        dt$sequence[i] = 1
    }
  }
  return(dt)
}

I ran this code on a data table of 4e5 rows and it took a lot of time to complete (around 20 mins). Can anyone suggest a faster way to do it.

like image 391
shubham Avatar asked Dec 08 '14 11:12

shubham


2 Answers

DF <- read.table(text="   ID1 member
 1   a parent
 2   a  child
 3   a parent
 4   a  child
 5   a  child
 6   b parent
 7   b parent
 8   b  child
 9   c  child
10   c  child
11   c parent
12   c  child", header=TRUE, stringsAsFactors=FALSE)

library(data.table)
setDT(DF)
DF[, sequence := seq_along(member) + (member[1] == "child"), 
   by = list(ID1, cumsum(member == "parent"))]

#    ID1 member sequence
# 1:   a parent        1
# 2:   a  child        2
# 3:   a parent        1
# 4:   a  child        2
# 5:   a  child        3
# 6:   b parent        1
# 7:   b parent        1
# 8:   b  child        2
# 9:   c  child        2
#10:   c  child        3
#11:   c parent        1
#12:   c  child        2
like image 109
Roland Avatar answered Oct 20 '22 14:10

Roland


Try this,

dt$sequence <- rep(NA, length(dt$member))
for (i in seq_along(dt$member)){
  dt$sequence[i] <- ifelse(dt$member[i]=="parent", 1, 
                           ifelse(dt$ID1[i]==dt$ID1[i-1], dt$sequence[i-1] + 1, 2)
                           )
   }

and easier dplyr solution

data <- dt %>% 
  group_by(ID1) %>% 
  mutate(
    seq = ifelse(member=="parent", 1, 2),
    sequence = ifelse(seq==1, 1, lag(seq, default = 1) + 1)
  ) 

If each group ID1 contains at least one parent, much easier solution will be arranging the data within group=ID1 so that parent always comes on the top:

dt %>% 
  group_by(ID1) %>%
  arrange(desc(member))
like image 25
Khashaa Avatar answered Oct 20 '22 15:10

Khashaa