I have a dataset with three columns. The column user
has two actions including action1
and action2
. action2
only contains information if action1
column has A
data. I want to concatenate P
data in action1
with the previous data in action2
. For example, if action2
has ac
, and the next row has P
in action1
, I want to change P
into Pac
, and will continue (all P
turn into Pac
) until action2
change. Take note that this process should repeat for each user
.
df<-read.table(text="
user action1 action2
1 A a
1 B NA
1 P NA
1 P NA
1 A ac
1 P NA
2 B NA
2 P NA
2 A aa
2 P NA
2 AB aa",header=T)
result: (I highlighted those rows that infected)
user action1 action2
1 A a
1 B NA
1 Pa NA <-
1 Pa NA <-
1 A ac
1 Pac NA <-
2 B NA
2 P NA
2 A aa
2 Paa NA <-
2 AB NA
Thank you
library('data.table')
library('zoo')
# Using zoo::na.locf(), fill NA with the previous value and group by user. Also `na.locf` will not remove NA.
setDT(df)[, V3 := na.locf(action2, na.rm = FALSE), by = .(user)]
# combine action1 with V3 column if action1 is equal to 'P' and it is not NA.
df[action1 == 'P' & !(is.na(V3)), action1 := paste0(action1, V3)]
df[, V3 := NULL] # remove V3 column
df
# user action1 action2
# 1: 1 A a
# 2: 1 B NA
# 3: 1 Pa NA
# 4: 1 Pa NA
# 5: 1 A ac
# 6: 1 Pac NA
# 7: 2 B NA
# 8: 2 P NA
# 9: 2 A aa
# 10: 2 Paa NA
# 11: 2 AB aa
Data:
df<-read.table(text="
user action1 action2
1 A a
1 B NA
1 P NA
1 P NA
1 A ac
1 P NA
2 B NA
2 P NA
2 A aa
2 P NA
2 AB aa",header=T, stringsAsFactors = FALSE)
Here's an option with the tidyverse
library(dplyr)
library(tidyr)
# for each user carry valid action2 results forward
df1 <- df %>%
group_by(user) %>%
tidyr::fill(action2)
# replace missing values so they can be pasted
df1$action2 <- ifelse(is.na(df1$action2), "", df1$action2)
# add valid action2 results to action1 where action1 is "P"
df1$new <- ifelse(df1$action1 == "P",
paste0("P", df1$action2),
df1$action1)
# A tibble: 11 x 4
# Groups: user [2]
user action1 action2 new
<int> <chr> <chr> <chr>
1 1 A "a" A
2 1 B "a" B
3 1 P "a" Pa
4 1 P "a" Pa
5 1 A "ac" A
6 1 P "ac" Pac
7 2 B "" B
8 2 P "" P
9 2 A "aa" A
10 2 P "aa" Paa
11 2 AB "aa" AB
stringAsFactors = FALSE
was added to the read in data:
df<-read.table(text="
user action1 action2
1 A a
1 B NA
1 P NA
1 P NA
1 A ac
1 P NA
2 B NA
2 P NA
2 A aa
2 P NA
2 AB aa",header=T, stringsAsFactors = FALSE)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With