I have a data frame set up like the following:
df <- data.frame("id" = c(111,111,111,222,222,222,222,333,333,333,333),
"Location" = c("A","B","A","A","C","B","A","B","A","A","A"),
"Encounter" = c(1,2,3,1,2,3,4,1,2,3,4))
id Location Encounter
1 111 A 1
2 111 B 2
3 111 A 3
4 222 A 1
5 222 C 2
6 222 B 3
7 222 A 4
8 333 B 1
9 333 A 2
10 333 A 3
11 333 A 4
I'm basically trying to create a binary flag indicating whether a location already appeared in a previous Encounter within the same id group. So it would look like:
id Location Encounter Flag
1 111 A 1 0
2 111 B 2 0
3 111 A 3 1
4 222 A 1 0
5 222 C 2 0
6 222 B 3 0
7 222 A 4 1
8 333 B 1 0
9 333 A 2 0
10 333 A 3 1
11 333 A 4 1
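I was trying to figure out how to do this with a case_when() statement like:
library(dplyr)
# Flag = 1 when the id matches the previous row and the location matches
# one of the previous three locations; everything else gets 0
df$Flag <- case_when(
  (df$id - lag(df$id)) == 0 ~
    case_when(df$Location == lag(df$Location, 1) |
                df$Location == lag(df$Location, 2) |
                df$Location == lag(df$Location, 3) ~ 1,
              TRUE ~ 0),
  TRUE ~ 0)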
id Location Flag
1 111 A 0
2 111 B 0
3 111 A 1
4 222 A 0
5 222 C 0
6 222 B 0
7 222 A 1
8 333 B 0
9 333 A 1
10 333 A 1
11 333 A 1
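But this has the issue where row 9 is incorrectly assigned a 1: the id check only compares each row with the row directly above it, so for row 9 lag(df$Location, 2) reaches back into id 222 (row 7, location A). There are also cases with 15+ encounters in the actual data, so listing every lag by hand becomes pretty cumbersome. I was hoping to find a way to do something like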
lag(df$Location, 1:df$Encounter)
But I know lag() needs a single integer offset (its n argument), so that specific command wouldn't work.
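One way to get the effect of "lagging by every offset from 1 up to the current encounter" is to compare each location with all earlier locations in the same id. A minimal base R sketch, assuming rows are already sorted by Encounter within each id; prev_seen is a hypothetical helper name, not a function from any package:

```r
df <- data.frame(id = c(111,111,111,222,222,222,222,333,333,333,333),
                 Location = c("A","B","A","A","C","B","A","B","A","A","A"),
                 Encounter = c(1,2,3,1,2,3,4,1,2,3,4))

# Hypothetical helper: TRUE where x[i] already occurs in x[1], ..., x[i - 1]
prev_seen <- function(x) {
  vapply(seq_along(x), function(i) x[i] %in% x[seq_len(i - 1)], logical(1))
}

# Apply the helper separately within each id, then restore the original row order
df$Flag <- as.integer(unsplit(lapply(split(df$Location, df$id), prev_seen), df$id))
df$Flag
# Flag values: 0 0 1 0 0 0 1 0 0 1 1
```

Each comparison looks back over the whole id, so the work grows quadratically with the number of encounters per id; for ids with a few dozen encounters that should not matter.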
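An option with duplicated():
library(dplyr)
# duplicated() is TRUE for every repeat of a Location within an id;
# the unary + turns TRUE/FALSE into 1/0
df %>%
group_by(id) %>%
mutate(Flag = +(duplicated(Location)))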
# A tibble: 11 x 4
# Groups: id [3]
# id Location Encounter Flag
# <dbl> <fct> <dbl> <int>
# 1 111 A 1 0
# 2 111 B 2 0
# 3 111 A 3 1
# 4 222 A 1 0
# 5 222 C 2 0
# 6 222 B 3 0
# 7 222 A 4 1
# 8 333 B 1 0
# 9 333 A 2 0
#10 333 A 3 1
#11 333 A 4 1
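Since a row is flagged exactly when its (id, Location) pair already appeared earlier in the table, the grouping can also be folded into the duplicated() call itself: on a data frame, duplicated() compares whole rows. A base R sketch, again assuming rows are in Encounter order within each id:

```r
df <- data.frame(id = c(111,111,111,222,222,222,222,333,333,333,333),
                 Location = c("A","B","A","A","C","B","A","B","A","A","A"),
                 Encounter = c(1,2,3,1,2,3,4,1,2,3,4))

# duplicated() on a data frame marks rows whose (id, Location) combination occurred earlier
df$Flag <- as.integer(duplicated(df[c("id", "Location")]))
df$Flag
# Flag values: 0 0 1 0 0 0 1 0 0 1 1
```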
In base R, we can use ave grouped by id and Location and set the flag to 1 from the second row of each group onward.
df$Flag <- as.integer(with(df, ave(Encounter, id, Location, FUN = seq_along) > 1))
df
# id Location Encounter Flag
#1 111 A 1 0
#2 111 B 2 0
#3 111 A 3 1
#4 222 A 1 0
#5 222 C 2 0
#6 222 B 3 0
#7 222 A 4 1
#8 333 B 1 0
#9 333 A 2 0
#10 333 A 3 1
#11 333 A 4 1
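The ave() and duplicated() answers treat the first row of each group as the first visit, so they assume the data are sorted by Encounter within each id. If that is not guaranteed, sorting first keeps the flag correct. A sketch using a hypothetical scrambled copy of the data:

```r
df <- data.frame(id = c(111,111,111,222,222,222,222,333,333,333,333),
                 Location = c("A","B","A","A","C","B","A","B","A","A","A"),
                 Encounter = c(1,2,3,1,2,3,4,1,2,3,4))

# Hypothetical: the same rows arrive in an arbitrary order
scrambled <- df[c(3, 1, 2, 7, 4, 6, 5, 11, 9, 8, 10), ]

# Restore encounter order within each id before numbering rows per (id, Location)
sorted <- scrambled[order(scrambled$id, scrambled$Encounter), ]
sorted$Flag <- as.integer(with(sorted, ave(Encounter, id, Location, FUN = seq_along) > 1))
sorted$Flag
# Flag values: 0 0 1 0 0 0 1 0 0 1 1
```

Without the sort, row 3 (Encounter 3) would come first in its (111, A) group and get 0, while row 1 (Encounter 1) would wrongly get 1.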
Using dplyr, that would be
library(dplyr)
df %>% group_by(id, Location) %>% mutate(Flag = as.integer(row_number() > 1))
A more generic data.table solution would be using rowid or .N:
library(data.table)
setDT(df)[, Flag := +(rowid(id, Location) > 1)][]
or
setDT(df)[, Flag := +(seq_len(.N)>1), .(id, Location)][]
#> id Location Encounter Flag
#> 1: 111 A 1 0
#> 2: 111 B 2 0
#> 3: 111 A 3 1
#> 4: 222 A 1 0
#> 5: 222 C 2 0
#> 6: 222 B 3 0
#> 7: 222 A 4 1
#> 8: 333 B 1 0
#> 9: 333 A 2 0
#> 10: 333 A 3 1
#> 11: 333 A 4 1
Using data.table:
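library(data.table)
# Start every row with a count of 1
dt[, flag:=1]
# Running count of visits to each Location within each id
dt[, flag:=cumsum(flag), by=.(id,Location)]
# Any visit after the first is a repeat
dt[, flag:=ifelse(flag>1,1,0)]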
Data:
dt <- data.table("id" = c(111,111,111,222,222,222,222,333,333,333,333),
"Location" = c("A","B","A","A","C","B","A","B","A","A","A"),
"Encounter" = c(1,2,3,1,2,3,4,1,2,3,4))
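The cumsum() step in the data.table answer is a running count of visits to each location within an id, and the flag is simply that count being above 1. The same count can be sketched in base R with ave(), which also keeps the intermediate visit numbers if they are useful (for example, to see that row 11 is the third visit to A for id 333):

```r
df <- data.frame(id = c(111,111,111,222,222,222,222,333,333,333,333),
                 Location = c("A","B","A","A","C","B","A","B","A","A","A"),
                 Encounter = c(1,2,3,1,2,3,4,1,2,3,4))

# Running count of visits to each (id, Location) pair, in row order
visits <- ave(rep(1, nrow(df)), df$id, df$Location, FUN = cumsum)

# Same rule as the data.table version: any visit after the first is flagged
df$flag <- ifelse(visits > 1, 1, 0)
visits
# visit counts: 1 1 2 1 1 1 2 1 1 2 3
```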