I am trying to create a path sequence. The following is a sample dataset:
df <- structure(list(
sess_id = c(4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7),
Page = c("A", "B", "C", "D", "A", "C", "B", "B", "C", "D", "A", "D")),
.Names = c("sess_id", "Page"),
row.names = c(NA, -12L),
class = "data.frame")
This is the table:
sess_id | Page |
---|---|
4 | A |
4 | B |
4 | C |
4 | D |
4 | A |
4 | C |
4 | B |
7 | B |
7 | C |
7 | D |
7 | A |
7 | D |
I would like to add three columns like so:
sess_id | Page | Path | Start | End |
---|---|---|---|---|
4 | A | | | |
4 | B | AB | A | B |
4 | C | ABC | A | C |
4 | D | ABCD | A | D |
4 | A | ABCDA | A | A |
4 | C | BCDAC | B | C |
4 | B | CDACB | C | B |
7 | B | | | |
7 | C | BC | B | C |
7 | D | BCD | B | D |
7 | A | BCDA | B | A |
7 | D | BCDAD | B | D |
For each row, Path should hold the sequence of up to five most recent pages within the session, and Start and End should hold the first and last page of that sequence.
If you would rather keep the elements of Path as a list of pages than as a single string, you can use
library(dplyr)
library(purrr)
df %>%
  group_by(sess_id) %>%
  mutate(
    # keep at most the last five pages seen so far in the session
    Path = lapply(accumulate(Page, c), tail, 5),
    # first page of that window; indexed per session so it cannot reach into the previous one
    Start = Page[pmax(1, row_number() - 4)],
    End = Page
  ) %>%
  ungroup()
which gives
# A tibble: 12 × 5
sess_id Page Path Start End
<dbl> <chr> <list> <chr> <chr>
1 4 A <chr [1]> A A
2 4 B <chr [2]> A B
3 4 C <chr [3]> A C
4 4 D <chr [4]> A D
5 4 A <chr [5]> A A
6 4 C <chr [5]> B C
7 4 B <chr [5]> C B
8 7 B <chr [1]> B B
9 7 C <chr [2]> B C
10 7 D <chr [3]> B D
11 7 A <chr [4]> B A
12 7 D <chr [5]> B D
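For intuition about what the list column holds, here is a minimal base R sketch for a single session. It swaps in Reduce(..., accumulate = TRUE) and tail as stand-ins for purrr's accumulate, and the variable names (pages, growing, windows, labels) are made up for illustration.

```r
# Pages of one session, in visit order (taken from session 4 of the sample data).
pages <- c("A", "B", "C", "D", "A", "C", "B")

# Reduce(..., accumulate = TRUE) keeps every intermediate result:
# "A", then c("A", "B"), then c("A", "B", "C"), and so on.
growing <- Reduce(c, pages, accumulate = TRUE)

# Trim each growing prefix to at most its last five pages.
windows <- lapply(growing, tail, 5)

# Collapse a window to a string only when a printable label is needed.
labels <- vapply(windows, paste, character(1), collapse = "")
print(labels)
```

Keeping the windows as character vectors means the first and last page can be read off by position, which also works when page names are longer than one character.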
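You can use accumulate from purrr together with substr, as shown below. Note that this relies on every page name being a single character, because substr trims the path by characters rather than by pages.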
library(dplyr)
library(purrr)
df %>%
  group_by(sess_id) %>%
  mutate(Path = accumulate(Page, paste0)) %>%
  ungroup() %>%
  mutate(
    Path = substr(Path, nchar(Path) - 4, nchar(Path)),
    Start = substr(Path, 1, 1),
    End = Page
  )
which gives
# A tibble: 12 × 5
sess_id Page Path Start End
<dbl> <chr> <chr> <chr> <chr>
1 4 A A A A
2 4 B AB A B
3 4 C ABC A C
4 4 D ABCD A D
5 4 A ABCDA A A
6 4 C BCDAC B C
7 4 B CDACB C B
8 7 B B B B
9 7 C BC B C
10 7 D BCD B D
11 7 A BCDA B A
12 7 D BCDAD B D
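If page names can be longer than one character, trimming by characters no longer lines up with pages. A possible alternative, sketched here in base R, builds each window by position instead. The helper rolling_path and its sep argument are hypothetical names introduced for this example, not part of dplyr or purrr.

```r
# Hypothetical helper: for each position, paste together the last `width`
# pages up to and including that position.
rolling_path <- function(pages, width = 5L, sep = "") {
  vapply(seq_along(pages), function(i) {
    window <- pages[max(1L, i - width + 1L):i]
    paste(window, collapse = sep)
  }, character(1))
}

df <- data.frame(
  sess_id = c(4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7),
  Page = c("A", "B", "C", "D", "A", "C", "B", "B", "C", "D", "A", "D"),
  stringsAsFactors = FALSE
)

# ave() applies a function within each session and returns the result in row order.
df$Path <- ave(df$Page, df$sess_id, FUN = rolling_path)
# Start is the page four rows back within the session, or the first page if there are fewer.
df$Start <- ave(df$Page, df$sess_id,
                FUN = function(p) p[pmax(1L, seq_along(p) - 4L)])
df$End <- df$Page

# With longer page names, a separator keeps the path readable.
multi <- rolling_path(c("home", "cart", "pay"), width = 2L, sep = ">")
```

On the sample data this should reproduce the Path, Start and End columns shown above, while Start no longer depends on the first character of Path.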
Use rollapplyr from package zoo to create a rolling sequence per group of sess_id. Then the first and the last characters of each sequence are the Start and End columns, respectively.
df <- structure(list(
sess_id = c(4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7),
Page = c("A", "B", "C", "D", "A", "C", "B", "B", "C", "D", "A", "D")),
.Names = c("sess_id", "Page"),
row.names = c(NA, -12L),
class = "data.frame")
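fun <- function(x, width) {
  # growing windows of width 1, 2, ..., width - 1 for the first rows of the session
  y1 <- zoo::rollapplyr(x, width = seq(width), paste, collapse = "")[1:(width - 1L)]
  # fixed-width windows for the remaining rows
  y2 <- zoo::rollapplyr(x, width = width, paste, collapse = "")
  c(y1, y2)
}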
sp <- split(df$Page, df$sess_id)
l <- 5L
df$Path <- unlist(lapply(sp, fun, width = l))
df$Start <- substr(df$Path, 1, 1)
df$End <- substring(df$Path, nchar(df$Path))
df
#> sess_id Page Path Start End
#> 1 4 A A A A
#> 2 4 B AB A B
#> 3 4 C ABC A C
#> 4 4 D ABCD A D
#> 5 4 A ABCDA A A
#> 6 4 C BCDAC B C
#> 7 4 B CDACB C B
#> 8 7 B B B B
#> 9 7 C BC B C
#> 10 7 D BCD B D
#> 11 7 A BCDA B A
#> 12 7 D BCDAD B D
Created on 2022-11-08 with reprex v2.0.2
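The Start and End step relies on substr and substring being vectorized over a character vector. A small base R sketch of just that step follows, using made-up example paths. Like the answer, it assumes single-character page names.

```r
# Example paths of different lengths (illustrative values only).
paths <- c("A", "AB", "BCDAC")

# First character of each path.
first_chr <- substr(paths, 1, 1)

# substring() without an explicit end runs to the end of the string,
# so starting at nchar(paths) yields the last character of each path.
last_chr <- substring(paths, nchar(paths))

print(data.frame(Path = paths, Start = first_chr, End = last_chr))
```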