Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split a vector into sub vectors with constant length and constant overlap

Tags:

split

r

I want to split a vector into a set of subvectors such that these conditions hold:

  1. all subvectors must have a constant length

  2. all subvectors must overlap with a constant overlapping length

I need a modification to this:

########Block function######################
blocks <- function(len, ov, n) {

  starts <- unique(sort(c(seq(1, n, len), seq(len-ov+1, n, len))))
  ends <- pmin(starts + len - 1, n)

  # truncate starts and ends to the first num elements
  num <- match(n, ends)
  head(data.frame(starts, ends), num)
}

########Moving block#############
vec = 1:17 # a list or vector
len = 8 # the length of the subvector 
ov = ceiling(len/2) # the length of overlap
b <- blocks(len, ov, length(vec))
with(b, Map(function(i, j) vec[i:j], starts, ends))

that produces this:

[1] 1 2 3 4 5 6 7 8 # subvector 1

[2] 5 6 7 8 9 10 11 12 # subvector 2

[3] 9 10 11 12 13 14 15 16 # subvector 3

[4] 13 14 15 16 17 # subvector 4: overlap needs modification to achieve constant length

What I want

I want the last subvector that doesn't reach the specified length to have an overlap like this:

new_overlap = old_overlap + (old_length - new_length)

  • new_overlap is the length of overlap of the last subvector that its length is less than the set length.

  • old_overlap is the set length of overlap

  • old_length is the set length of subvector

[1] 1 2 3 4 5 6 7 8 # my desired subvector 1

[2] 5 6 7 8 9 10 11 12 # my desired subvector 2

[3] 9 10 11 12 13 14 15 16 # my desired subvector 3

[4] 10 11 12 13 14 15 16 17 # my desired subvector 4

I want a list to be split with the following conditions:

  • It should have the same sublist length.

  • It should overlap with a constant

Attempted Solution with an error message but nice result

blocks <- function(len, ov, n) {

  starts <- unique(sort(c(seq(1, n, len), seq(len-ov+1, n, len))))
  ends <- pmin(starts + len - 1, n)

  # truncate starts and ends to the first num elements
  num <- match(n, ends)
  head(data.frame(starts, ends), num)
}

########Moving block#############
vec = 1:10 # vector
len = 5 #set length
ov = 1#ceiling(len/2) # set overlap
b <- blocks(len, ov, length(vec))
#with(b, Map(function(i, j) vec[i:j], starts, ends))

out <- with(b, Map(function(i, j) vec[i:j], starts, ends))
last_1en <- length(out)
if(length(out[l1]) < len) { # if last length is less than set length
  out[[l1]] <- unlist(out[(ov) + (len - l1)])
}
out

Error in out[[l1]] : subscript out of bounds

[[1]] [1] 1 2 3 4 5

[[2]] [1] 5 6 7 8 9

[[3]] [1] 6 7 8 9 10

Second Edit

I have debugged the error by changing the out[[l1]] to out[l1].

like image 902
Daniel James Avatar asked Dec 31 '22 13:12

Daniel James


1 Answers

A tidyverse strategy, I think should work properly for all kind of input vectors

vec = 1:23
len = 7
ov = 6

library(tidyverse)
anil <- function(vec, len, ov){
  seq_len((length(vec) - ov) %/% (len - ov) +1) %>%
  as.data.frame() %>%
  setNames('id') %>%
  mutate(start = accumulate(id, ~ .x + len - ov),
         end = pmin(start + len - 1, length(vec)),
         start = pmin(start, end - len + 1)) %>%
  filter(!duplicated(paste(start, end, sep = '-'))) %>%
  transmute(desired = map2(start, end, ~ vec[.x:.y])) %>%
  as.list
}

anil(1:23, len = 7, ov = 6)
#> $desired
#> $desired[[1]]
#> [1] 1 2 3 4 5 6 7
#> 
#> $desired[[2]]
#> [1] 2 3 4 5 6 7 8
#> 
#> $desired[[3]]
#> [1] 3 4 5 6 7 8 9
#> 
#> $desired[[4]]
#> [1]  4  5  6  7  8  9 10
#> 
#> $desired[[5]]
#> [1]  5  6  7  8  9 10 11
#> 
#> $desired[[6]]
#> [1]  6  7  8  9 10 11 12
#> 
#> $desired[[7]]
#> [1]  7  8  9 10 11 12 13
#> 
#> $desired[[8]]
#> [1]  8  9 10 11 12 13 14
#> 
#> $desired[[9]]
#> [1]  9 10 11 12 13 14 15
#> 
#> $desired[[10]]
#> [1] 10 11 12 13 14 15 16
#> 
#> $desired[[11]]
#> [1] 11 12 13 14 15 16 17
#> 
#> $desired[[12]]
#> [1] 12 13 14 15 16 17 18
#> 
#> $desired[[13]]
#> [1] 13 14 15 16 17 18 19
#> 
#> $desired[[14]]
#> [1] 14 15 16 17 18 19 20
#> 
#> $desired[[15]]
#> [1] 15 16 17 18 19 20 21
#> 
#> $desired[[16]]
#> [1] 16 17 18 19 20 21 22
#> 
#> $desired[[17]]
#> [1] 17 18 19 20 21 22 23
anil(LETTERS[1:21], 7, 2)
#> $desired
#> $desired[[1]]
#> [1] "A" "B" "C" "D" "E" "F" "G"
#> 
#> $desired[[2]]
#> [1] "F" "G" "H" "I" "J" "K" "L"
#> 
#> $desired[[3]]
#> [1] "K" "L" "M" "N" "O" "P" "Q"
#> 
#> $desired[[4]]
#> [1] "O" "P" "Q" "R" "S" "T" "U"
anil(1:17, 8, 4)
#> $desired
#> $desired[[1]]
#> [1] 1 2 3 4 5 6 7 8
#> 
#> $desired[[2]]
#> [1]  5  6  7  8  9 10 11 12
#> 
#> $desired[[3]]
#> [1]  9 10 11 12 13 14 15 16
#> 
#> $desired[[4]]
#> [1] 10 11 12 13 14 15 16 17

Created on 2021-07-07 by the reprex package (v2.0.0)

like image 199
AnilGoyal Avatar answered May 01 '23 03:05

AnilGoyal