Determine length of sequences between columns or in string - and paste result

Tags:

r

I'm working with data like this:

> df
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  1  0  0  0  1  1  0  0  1   1
2  1  1  1  0  0  0  0  1  0   1
3  1  1  0  0  1  0  0  1  0   1
4  1  0  0  0  0  0  0  1  1   1
5  0  0  0  1  0  0  1  1  1   1
6  0  0  1  1  0  0  1  0  1   0

dput(df) is as follows

df <- structure(list(V1 = c(1, 1, 1, 1, 0, 0), V2 = c(0, 1, 1, 0, 0, 0),
    V3 = c(0, 1, 0, 0, 0, 1), V4 = c(0, 0, 0, 0, 1, 1),
    V5 = c(1, 0, 1, 0, 0, 0), V6 = c(1, 0, 0, 0, 0, 0),
    V7 = c(0, 0, 0, 0, 1, 1), V8 = c(0, 1, 1, 1, 1, 0),
    V9 = c(1, 0, 0, 1, 1, 1), V10 = c(1, 1, 1, 1, 1, 0)),
    row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"),
    spec = structure(list(cols = list(
        V1 = structure(list(), class = c("collector_double", "collector")),
        V2 = structure(list(), class = c("collector_double", "collector")),
        V3 = structure(list(), class = c("collector_double", "collector")),
        V4 = structure(list(), class = c("collector_double", "collector")),
        V5 = structure(list(), class = c("collector_double", "collector")),
        V6 = structure(list(), class = c("collector_double", "collector")),
        V7 = structure(list(), class = c("collector_double", "collector")),
        V8 = structure(list(), class = c("collector_double", "collector")),
        V9 = structure(list(), class = c("collector_double", "collector")),
        V10 = structure(list(), class = c("collector_double", "collector")),
        Sequence = structure(list(), class = c("collector_character", "collector"))),
        default = structure(list(), class = c("collector_guess", "collector")),
        skip = 1L),
        class = "col_spec"))

I need to copy V1:V10 and, in the copy, replace each 0 that lies between 1s with the number of 0s in that gap. The 1s should be set to NA, and so should any 0s at the beginning and the end of a row, because they are not between 1s.

So, for example, row 1 should be transformed from 1 0 0 0 1 1 0 0 1 1 to NA 3 3 3 NA NA 2 2 NA NA, and row 6 from 0 0 1 1 0 0 1 0 1 0 to NA NA NA NA 2 2 NA 1 NA NA.
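As a minimal base R sketch of the rule described above (assuming every row contains only 0s and 1s), a row can be split into runs with rle(). Because the values alternate between 0 and 1, a run of 0s lies between two 1s exactly when it is neither the first nor the last run, so only those runs get their length and everything else becomes NA. The name gap_lengths is hypothetical, chosen only for this illustration.

```r
# Hypothetical helper (not from the question): turn one 0/1 row into gap lengths.
gap_lengths <- function(x) {
  runs <- rle(as.vector(x))              # run values and run lengths
  n <- length(runs$lengths)
  # A run is "between 1s" only if it is neither the first nor the last run
  interior <- seq_len(n) > 1 & seq_len(n) < n
  keep <- runs$values == 0 & interior    # interior runs of 0s
  out <- ifelse(keep, runs$lengths, NA_integer_)  # one value per run
  rep(out, runs$lengths)                 # expand back to one value per column
}

gap_lengths(c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1))  # NA 3 3 3 NA NA 2 2 NA NA
gap_lengths(c(0, 0, 1, 1, 0, 0, 1, 0, 1, 0))  # NA NA NA NA 2 2 NA 1 NA NA

# Row-wise over a matrix (or over df from the question): apply() returns one
# column per input row, so t() flips the result back
m <- rbind(c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1),
           c(0, 0, 1, 1, 0, 0, 1, 0, 1, 0))
t(apply(m, 1, gap_lengths))
```

Working per run rather than per cell keeps the logic to one decision per run, which is also what both answers below do in their own way.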

Is there a way to do this in a loop? Or could I unite V1:V10 into a single cell, match specific patterns, transform them, and then split the cell again?
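The string idea can also be sketched in base R, assuming each cell holds a single character 0 or 1 so that character positions line up with columns. A Perl-style regular expression with a lookbehind and a lookahead, (?<=1)0+(?=1), matches only runs of 0s that have a 1 on both sides, and gregexpr() reports where each match starts and (in its "match.length" attribute) how long it is. The name gap_lengths_regex is hypothetical.

```r
# Hypothetical helper: paste the row into one string, find interior 0-runs,
# and write each run's length back into the matching positions.
gap_lengths_regex <- function(x) {
  s <- paste(x, collapse = "")                       # e.g. "1000110011"
  # Runs of 0s preceded and followed by a 1 (lookarounds need perl = TRUE)
  m <- gregexpr("(?<=1)0+(?=1)", s, perl = TRUE)[[1]]
  out <- rep(NA_integer_, nchar(s))                  # NA unless inside a gap
  if (m[1] != -1) {                                  # -1 means "no match"
    len <- attr(m, "match.length")
    for (k in seq_along(m)) {
      out[m[k] + seq_len(len[k]) - 1L] <- len[k]     # fill the gap's positions
    }
  }
  out
}

gap_lengths_regex(c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1))  # NA 3 3 3 NA NA 2 2 NA NA
```

Because the result is built by position, there is no need to split a transformed string back into cells afterwards.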

I have to admit this is way beyond my skills, but I was assigned this task and I'm grateful for any suggestions!

Thanks!

asked Feb 09 '21 13:02 by Jakob


2 Answers

One dplyr and tidyr option could be:

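library(dplyr)
library(tidyr)
library(tibble)  # rowid_to_column()

df %>%
 rowid_to_column() %>%
 pivot_longer(-rowid) %>%
 group_by(rowid) %>%
 # NA for the 1s, for 0s before the first 1, and for 0s after the last 1;
 # otherwise the length of the run of 0s the cell belongs to
 mutate(value = if_else(value != 0 | cumsum(value) == 0 | rev(cumsum(rev(value))) == 0,
                        NA_integer_,
                        with(rle(value), rep(lengths * (values == 0), lengths)))) %>%
 pivot_wider(names_from = "name",
             values_from = "value")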

  rowid    V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1    NA     3     3     3    NA    NA     2     2    NA    NA
2     2    NA    NA    NA     4     4     4     4    NA     1    NA
3     3    NA    NA     2     2    NA     2     2    NA     1    NA
4     4    NA     6     6     6     6     6     6    NA    NA    NA
5     5    NA    NA    NA    NA     2     2    NA    NA    NA    NA
6     6    NA    NA    NA    NA     2     2    NA     1    NA    NA
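To see what each condition in the if_else() call does, here is the same logic applied to row 6 alone in base R. This is a sketch: the variable names are only for illustration, and the cumsum() checks assume the values are 0 or 1.

```r
value <- c(0, 0, 1, 1, 0, 0, 1, 0, 1, 0)   # row 6 of the question's data

is_one   <- value != 0                     # the 1s themselves
leading  <- cumsum(value) == 0             # 0s before the first 1
trailing <- rev(cumsum(rev(value))) == 0   # 0s after the last 1

# Run length for every 0 and 0 for every 1 (same expression as in the answer)
run_len <- with(rle(value), rep(lengths * (values == 0), lengths))

gaps6 <- ifelse(is_one | leading | trailing, NA_integer_, run_len)
gaps6                                      # NA NA NA NA 2 2 NA 1 NA NA
```

The two cumsum() masks are what rule out the leading and trailing 0s, so rle() only has to supply the run lengths.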
answered Sep 29 '22 06:09 by tmfmnk


Define a function recalc that acts on one row, then apply it to each row. recalc identifies the runs with rleid() from data.table and then processes each run by passing f a complex vector whose real part is the 1/0 value and whose imaginary part is the run number. In f, a run is replaced with NA if it consists of 1s (real part) or if it is the first or the last run (imaginary part); otherwise it is replaced with its length. Finally, recalc takes the real part of the result.

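library(data.table)  # rleid()

recalc <- function(x) {
  r <- rleid(x)
  # Im(z)[1]: every element of a run shares the same run id, and recent
  # R versions signal an error if an operand of || has length > 1
  f <- function(z) if (Re(z)[1] == 1 || Im(z)[1] %in% range(r)) NA else length(z)
  Re(ave(x + r * 1i, r, FUN = f))
}
t(apply(df, 1, recalc))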

giving this matrix:

     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
[1,] NA  3  3  3 NA NA  2  2 NA  NA
[2,] NA NA NA  4  4  4  4 NA  1  NA
[3,] NA NA  2  2 NA  2  2 NA  1  NA
[4,] NA  6  6  6  6  6  6 NA NA  NA
[5,] NA NA NA NA  2  2 NA NA NA  NA
[6,] NA NA NA NA  2  2 NA  1 NA  NA
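To make the complex-number trick concrete, here is recalc's logic traced on row 6 in base R. As an assumption for this sketch, cumsum(c(TRUE, diff(x) != 0)) stands in for data.table::rleid(); for a single numeric vector both produce consecutive run ids starting at 1.

```r
x <- c(0, 0, 1, 1, 0, 0, 1, 0, 1, 0)   # row 6 of the question's data

# Run ids: a new id starts wherever the value changes (stand-in for rleid)
r <- cumsum(c(TRUE, diff(x) != 0))     # 1 1 2 2 3 3 4 5 6 7

# Pack the 0/1 value (real part) and the run id (imaginary part) together,
# so each run handed to f carries both pieces of information
z <- x + r * 1i

# Per run: NA for runs of 1s and for the first/last run, else the run length
f <- function(z) if (Re(z)[1] == 1 || Im(z)[1] %in% range(r)) NA else length(z)

row6 <- Re(ave(z, r, FUN = f))         # NA NA NA NA 2 2 NA 1 NA NA
```

ave() splits z by run id, applies f to each piece, and recycles each single result over the whole run, which is why f only needs to return one value.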
answered Sep 29 '22 07:09 by G. Grothendieck