I have asked a similar question like this here and the solution mentioned there works fine with problem stated there but this one is little trickier and harder version of that.
I have a data table like this.
ID1 member
1 a parent
2 a child
3 a parent
4 a child
5 a child
6 b parent
7 b parent
8 b child
9 c child
10 c child
11 c parent
12 c child
And I want to assign a sequence like below keeping in mind ID1 and member column.
ID1 member sequence
1 a parent 1
2 a child 2
3 a parent 1
4 a child 2
5 a child 3
6 b parent 1
7 b parent 1
8 b child 2
9 c child 2 *
10 c child 3
11 c parent 1
12 c child 2
i.e.
> dt$sequence = 1, wherever dt$member == "parent"
> dt$sequence = previous_row_value + 1, wherever dt$member=="child"
But sometimes it can happen that new ID1 might not start with a member="parent". If it starts with "child" (as in example with star-marked row) we have to start sequencing with 2. As of now I have been doing it using loops, like below.
dt_sequence <- dt[ ,sequencing(.SD), by="ID1"]
sequencing <- function(dt){
for(i in 1:nrow(dt)){
if(i == 1){
if(dt[i,member] %in% "child")
dt$sequence[i] = 2
else
dt$sequence[i] = 1
}
else{
if(dt[i,member] %in% "child")
dt$sequence[i] = as.numeric(dt$sequence[i-1]) + 1
else
dt$sequence[i] = 1
}
}
return(dt)
}
I ran this code on a data table of 4e5 rows and it took a lot of time to complete (around 20 mins). Can anyone suggest a faster way to do it.
DF <- read.table(text=" ID1 member
1 a parent
2 a child
3 a parent
4 a child
5 a child
6 b parent
7 b parent
8 b child
9 c child
10 c child
11 c parent
12 c child", header=TRUE, stringsAsFactors=FALSE)
library(data.table)
setDT(DF)
DF[, sequence := seq_along(member) + (member[1] == "child"),
by = list(ID1, cumsum(member == "parent"))]
# ID1 member sequence
# 1: a parent 1
# 2: a child 2
# 3: a parent 1
# 4: a child 2
# 5: a child 3
# 6: b parent 1
# 7: b parent 1
# 8: b child 2
# 9: c child 2
#10: c child 3
#11: c parent 1
#12: c child 2
Try this,
dt$sequence <- rep(NA, length(dt$member))
for (i in seq_along(dt$member)){
dt$sequence[i] <- ifelse(dt$member[i]=="parent", 1,
ifelse(dt$ID1[i]==dt$ID1[i-1], dt$sequence[i-1] + 1, 2)
)
}
and easier dplyr solution
data <- dt %>%
group_by(ID1) %>%
mutate(
seq = ifelse(member=="parent", 1, 2),
sequence = ifelse(seq==1, 1, lag(seq, default = 1) + 1)
)
If each group ID1
contains at least one parent
, much easier solution will be arranging the data within group=ID1 so that parent
always comes on the top:
dt %>%
group_by(ID1) %>%
arrange(desc(member))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With