Creating a new column conditionally based on previous n rows

I have a data frame set up like the following:

 df <- data.frame("id" = c(111,111,111,222,222,222,222,333,333,333,333), 
                  "Location" = c("A","B","A","A","C","B","A","B","A","A","A"), 
                  "Encounter" = c(1,2,3,1,2,3,4,1,2,3,4))

    id Location Encounter
1  111        A         1
2  111        B         2
3  111        A         3
4  222        A         1
5  222        C         2
6  222        B         3
7  222        A         4
8  333        B         1
9  333        A         2
10 333        A         3
11 333        A         4

I'm trying to create a binary flag that indicates whether a Location has already appeared in a previous Encounter within the same id group. The result would look like:

    id Location Encounter Flag
1  111        A         1    0
2  111        B         2    0
3  111        A         3    1
4  222        A         1    0
5  222        C         2    0
6  222        B         3    0
7  222        A         4    1
8  333        B         1    0
9  333        A         2    0
10 333        A         3    1
11 333        A         4    1

I was trying to figure out how to do an if statement like:

library(dplyr)

df$Flag <- case_when((df$id - lag(df$id)) == 0 ~ 
                case_when(df$Location == lag(df$Location, 1) | 
                          df$Location == lag(df$Location, 2) | 
                          df$Location == lag(df$Location, 3) ~ 1, T ~ 0), T ~ 0)

    id Location Flag
1  111        A    0
2  111        B    0
3  111        A    1
4  222        A    0
5  222        C    0
6  222        B    0
7  222        A    1
8  333        B    0
9  333        A    1
10 333        A    1
11 333        A    1

But this incorrectly assigns a 1 to row 9. The actual data also has cases with 15+ encounters, so chaining lags like this becomes pretty cumbersome. I was hoping to find a way to do something like

lag(df$Location, 1:df$Encounter)

But I know lag() needs a single integer for its offset (n in dplyr), so that specific command wouldn't work.

Dalton K asked Nov 20 '19 22:11



4 Answers

An option with duplicated(), applied within each id group:

library(dplyr)
df %>% 
  group_by(id) %>% 
  mutate(Flag = +(duplicated(Location)))
# A tibble: 11 x 4
# Groups:   id [3]
#      id Location Encounter  Flag
#   <dbl> <fct>        <dbl> <int>
# 1   111 A                1     0
# 2   111 B                2     0
# 3   111 A                3     1
# 4   222 A                1     0
# 5   222 C                2     0
# 6   222 B                3     0
# 7   222 A                4     1
# 8   333 B                1     0
# 9   333 A                2     0
#10   333 A                3     1
#11   333 A                4     1
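
To see why duplicated() matches what the question asks for, it may help to compare it with an explicit look-back over all earlier rows of one group. This is only a sketch in base R; seen_before is a hypothetical helper name introduced for illustration, not part of any package.

```r
# Hypothetical helper: TRUE where x[i] already occurred among x[1], ..., x[i - 1]
seen_before <- function(x) {
  vapply(seq_along(x),
         function(i) x[i] %in% x[seq_len(i - 1)],
         logical(1))
}

loc <- c("B", "A", "A", "A")   # Location values for id 333

seen_before(loc)    # explicit look-back: FALSE FALSE TRUE TRUE
duplicated(loc)     # same result from base R
+duplicated(loc)    # unary plus turns the logicals into 0/1 integers
```

Inside group_by(id), mutate() runs duplicated() on each id's Location values separately, so a location seen under one id never flags a row of another id.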
akrun answered Oct 08 '22 19:10


In base R, we can use ave grouped by id and Location to number the rows within each group, then flag every row after the first in its group with 1.

df$Flag <- as.integer(with(df, ave(Encounter, id, Location, FUN = seq_along) > 1))
df

#    id Location Encounter Flag
#1  111        A         1    0
#2  111        B         2    0
#3  111        A         3    1
#4  222        A         1    0
#5  222        C         2    0
#6  222        B         3    0
#7  222        A         4    1
#8  333        B         1    0
#9  333        A         2    0
#10 333        A         3    1
#11 333        A         4    1

Using dplyr, that would be

library(dplyr)

df %>%
  group_by(id, Location) %>%
  mutate(Flag = as.integer(row_number() > 1))
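
As a rough illustration of what happens before the comparison with 1, the sketch below prints the intermediate counter that ave produces. It uses the question's data; the variable name occurrence is an assumption made here for readability.

```r
df <- data.frame(
  id        = c(111, 111, 111, 222, 222, 222, 222, 333, 333, 333, 333),
  Location  = c("A", "B", "A", "A", "C", "B", "A", "B", "A", "A", "A"),
  Encounter = c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4)
)

# Position of each row within its (id, Location) group
occurrence <- with(df, ave(Encounter, id, Location, FUN = seq_along))
occurrence
# 1 1 2 1 1 1 2 1 1 2 3

# Any row that is not the first of its group is a repeat visit
df$Flag <- as.integer(occurrence > 1)
df$Flag
# 0 0 1 0 0 0 1 0 0 1 1
```

Keeping the counter around can be handy if a count of prior visits is wanted rather than only a binary flag.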
Ronak Shah answered Oct 08 '22 19:10


A more generic data.table solution would be using .N or rowid:

library(data.table)

setDT(df)[, Flag := +(rowid(id, Location) > 1)][]

or

setDT(df)[, Flag := +(seq_len(.N)>1), .(id, Location)][]
#>      id Location  Encounter Flag
#> 1:  111        A         1    0
#> 2:  111        B         2    0
#> 3:  111        A         3    1
#> 4:  222        A         1    0
#> 5:  222        C         2    0
#> 6:  222        B         3    0
#> 7:  222        A         4    1
#> 8:  333        B         1    0
#> 9:  333        A         2    0
#> 10: 333        A         3    1
#> 11: 333        A         4    1
M-- answered Oct 08 '22 21:10


Using data.table:

library(data.table)

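dt[, flag := 1]                                   # start with a 1 on every row
dt[, flag := cumsum(flag), by = .(id, Location)]  # running count within each id/Location pair
dt[, flag := ifelse(flag > 1, 1, 0)]              # 1 for repeat visits, 0 for the first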

Data:

dt <- data.table("id" = c(111,111,111,222,222,222,222,333,333,333,333), 
                 "Location" = c("A","B","A","A","C","B","A","B","A","A","A"),
                 "Encounter" = c(1,2,3,1,2,3,4,1,2,3,4))
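
For readers without data.table installed, the same three steps (a column of ones, a grouped cumulative sum, then a threshold) can be sketched in base R. This swaps in ave for the data.table grouping and is not the answer's own code; the names ones and running are assumptions for illustration.

```r
df <- data.frame(
  id        = c(111, 111, 111, 222, 222, 222, 222, 333, 333, 333, 333),
  Location  = c("A", "B", "A", "A", "C", "B", "A", "B", "A", "A", "A"),
  Encounter = c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4)
)

ones    <- rep(1, nrow(df))                              # step 1: a 1 on every row
running <- ave(ones, df$id, df$Location, FUN = cumsum)   # step 2: running count per id/Location
df$flag <- ifelse(running > 1, 1, 0)                     # step 3: 1 once the count exceeds 1

df$flag
# 0 0 1 0 0 0 1 0 0 1 1
```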
LocoGris answered Oct 08 '22 20:10