Creating a Sequence with a particular length in R

Tags: r, sequence

I am trying to create a path sequence. The following is a sample dataset:

df <- structure(list(
  sess_id = c(4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7), 
  Page = c("A", "B", "C", "D", "A", "C", "B", "B", "C", "D", "A", "D")),
  .Names = c("sess_id", "Page"),
  row.names = c(NA, -12L),
  class = "data.frame")

This is the table:

sess_id Page
4 A
4 B
4 C
4 D
4 A
4 C
4 B
7 B
7 C
7 D
7 A
7 D

I would like to add three columns like so:

sess_id Page Path Start End
4 A
4 B AB A B
4 C ABC A C
4 D ABCD A D
4 A ABCDA A A
4 C BCDAC B C
4 B CDACB C B
7 B
7 C BC B C
7 D BCD B D
7 A BCDA B A
7 D BCDAD B D

For each row, I am trying to build the path of the last five pages visited within the session (fewer if the session has not yet reached five pages), and to record the first and last page of that window as Start and End.

user2845095 asked Oct 19 '25 09:10


2 Answers

Update

If you don't want to keep elements in Path as a string, you can use

library(dplyr)
library(purrr)

df %>%
  group_by(sess_id) %>%
  mutate(
    # keep at most the last 5 pages seen so far in this session
    Path = lapply(accumulate(Page, c), tail, 5),
    # index within the session, so the window never reaches into another session
    Start = Page[pmax(1, row_number() - 4)],
    End = Page
  ) %>%
  ungroup()

which gives

# A tibble: 12 × 5
   sess_id Page  Path      Start End  
     <dbl> <chr> <list>    <chr> <chr>
 1       4 A     <chr [1]> A     A
 2       4 B     <chr [2]> A     B
 3       4 C     <chr [3]> A     C
 4       4 D     <chr [4]> A     D
 5       4 A     <chr [5]> A     A
 6       4 C     <chr [5]> B     C
 7       4 B     <chr [5]> C     B
 8       7 B     <chr [1]> B     B
 9       7 C     <chr [2]> B     C
10       7 D     <chr [3]> B     D
11       7 A     <chr [4]> B     A
12       7 D     <chr [5]> B     D
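
For anyone who wants to see the windowing logic without dplyr or purrr, here is a minimal base R sketch of the same idea. It assumes the rows of each session are contiguous and already in visit order, as in the sample data. The names width, pos and first are only illustrative.

```r
# Base R sketch of the clamped-window idea (no packages needed).
# Assumption: rows of each session are contiguous and in visit order.
df <- data.frame(
  sess_id = c(4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7),
  Page = c("A", "B", "C", "D", "A", "C", "B", "B", "C", "D", "A", "D"),
  stringsAsFactors = FALSE
)

width <- 5L  # window length from the question

# Position of each row within its own session (1, 2, 3, ... per sess_id)
pos <- ave(seq_along(df$sess_id), df$sess_id, FUN = seq_along)

# Row where each window starts: step back at most width - 1 rows,
# but never past the first row of the session
rows  <- seq_len(nrow(df))
first <- rows - pmin(pos, width) + 1L

df$Path  <- Map(function(i, j) df$Page[i:j], first, rows)  # list column
df$Start <- df$Page[first]
df$End   <- df$Page
```

If the data are not sorted by session, sorting first (for example with order(df$sess_id)) would be needed for the row arithmetic to hold.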

Previous Solution

You can use purrr::accumulate to build the cumulative path within each session, then substr to keep only its last five characters:

library(dplyr)
library(purrr)

df %>%
  group_by(sess_id) %>%
  mutate(Path = accumulate(Page, paste0)) %>%
  ungroup() %>%
  mutate(
    Path = substr(Path, nchar(Path) - 4, nchar(Path)),
    Start = substr(Path, 1, 1),
    End = Page
  )

which gives

# A tibble: 12 × 5
   sess_id Page  Path  Start End  
     <dbl> <chr> <chr> <chr> <chr>
 1       4 A     A     A     A
 2       4 B     AB    A     B
 3       4 C     ABC   A     C
 4       4 D     ABCD  A     D
 5       4 A     ABCDA A     A
 6       4 C     BCDAC B     C
 7       4 B     CDACB C     B
 8       7 B     B     B     B
 9       7 C     BC    B     C
10       7 D     BCD   B     D
11       7 A     BCDA  B     A
12       7 D     BCDAD B     D
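
Note that the substr approach counts characters, so it relies on every page name being exactly one character long. If your real page names are longer, something along these lines may work instead. This is only a sketch: roll_path, the ">" separator and the sample sessions below are made up for illustration and are not part of the question.

```r
# Sketch: rolling path over page names of any length (base R only).
# roll_path, sep and the sample data are hypothetical, for illustration.
roll_path <- function(pages, width = 5L, sep = ">") {
  vapply(seq_along(pages), function(i) {
    # last `width` pages up to and including position i
    paste(pages[max(1L, i - width + 1L):i], collapse = sep)
  }, character(1))
}

sess  <- c(1, 1, 1, 1, 1, 1, 2, 2)
pages <- c("home", "search", "item", "cart", "home", "item", "home", "pay")

# ave() applies roll_path within each session and keeps the row order
path  <- ave(pages, sess, FUN = roll_path)
start <- sub(">.*", "", path)  # text before the first separator
end   <- pages
```

The separator has to be a string that never occurs inside a page name, otherwise start can no longer be recovered from path.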

ThomasIsCoding answered Oct 20 '25 23:10


Use rollapplyr from the zoo package to build a rolling sequence within each sess_id group. The first and last characters of each sequence then give the Start and End columns, respectively.

df <- structure(list(
  sess_id = c(4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7), 
  Page = c("A", "B", "C", "D", "A", "C", "B", "B", "C", "D", "A", "D")),
  .Names = c("sess_id", "Page"),
  row.names = c(NA, -12L),
  class = "data.frame")


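fun <- function(x, width) {
  # growing windows (widths 1, 2, ..., width - 1) for the first rows of a session
  y1 <- zoo::rollapplyr(x, width = seq(width), paste, collapse = "")[1:(width - 1L)]
  # full-width rolling windows for the remaining rows
  y2 <- zoo::rollapplyr(x, width = width, paste, collapse = "")
  c(y1, y2)
}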

sp <- split(df$Page, df$sess_id)
l <- 5L

df$Path <- unlist(lapply(sp, fun, width = l))
df$Start <- substr(df$Path, 1, 1)
df$End <- substring(df$Path, nchar(df$Path))
df
#>    sess_id Page  Path Start End
#> 1        4    A     A     A   A
#> 2        4    B    AB     A   B
#> 3        4    C   ABC     A   C
#> 4        4    D  ABCD     A   D
#> 5        4    A ABCDA     A   A
#> 6        4    C BCDAC     B   C
#> 7        4    B CDACB     C   B
#> 8        7    B     B     B   B
#> 9        7    C    BC     B   C
#> 10       7    D   BCD     B   D
#> 11       7    A  BCDA     B   A
#> 12       7    D BCDAD     B   D

Created on 2022-11-08 with reprex v2.0.2
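
One caveat with the split/unlist step: split() groups the values by session, so unlist() returns them in group order. That matches the row order of df only when df is already sorted by sess_id, as it is in the sample data. The base R sketch below illustrates the difference on a small, made-up interleaved example; cum_path is a hypothetical stand-in for the rolling function, and ave() is one way to write each group's result back to its original rows.

```r
# Hypothetical interleaved sessions (not from the question)
sess <- c(7, 4, 7, 4)
page <- c("B", "A", "C", "B")

# Stand-in for the rolling function: cumulative path within one session
cum_path <- function(p) Reduce(paste0, p, accumulate = TRUE)

# split() + unlist(): all of session 4 comes first, then session 7
by_split <- unlist(lapply(split(page, sess), cum_path), use.names = FALSE)

# ave(): each group's result goes back to that group's original rows
by_ave <- ave(page, sess, FUN = cum_path)

by_split
by_ave
```

Here by_split no longer lines up with the original rows, while by_ave does.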

Rui Barradas answered Oct 20 '25 23:10