Determine length of sequences between columns or in string - and paste result

Tags:

r

I'm working with data like this:

> df
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  1  0  0  0  1  1  0  0  1   1
2  1  1  1  0  0  0  0  1  0   1
3  1  1  0  0  1  0  0  1  0   1
4  1  0  0  0  0  0  0  1  1   1
5  0  0  0  1  0  0  1  1  1   1
6  0  0  1  1  0  0  1  0  1   0

dput(df) is as follows

df <- structure(list(V1 = c(1, 1, 1, 1, 0, 0), V2 = c(0, 1, 1, 0, 0, 0),
    V3 = c(0, 1, 0, 0, 0, 1), V4 = c(0, 0, 0, 0, 1, 1),
    V5 = c(1, 0, 1, 0, 0, 0), V6 = c(1, 0, 0, 0, 0, 0),
    V7 = c(0, 0, 0, 0, 1, 1), V8 = c(0, 1, 1, 1, 1, 0),
    V9 = c(1, 0, 0, 1, 1, 1), V10 = c(1, 1, 1, 1, 1, 0)),
    row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"),
    spec = structure(list(cols = list(
        V1 = structure(list(), class = c("collector_double", "collector")),
        V2 = structure(list(), class = c("collector_double", "collector")),
        V3 = structure(list(), class = c("collector_double", "collector")),
        V4 = structure(list(), class = c("collector_double", "collector")),
        V5 = structure(list(), class = c("collector_double", "collector")),
        V6 = structure(list(), class = c("collector_double", "collector")),
        V7 = structure(list(), class = c("collector_double", "collector")),
        V8 = structure(list(), class = c("collector_double", "collector")),
        V9 = structure(list(), class = c("collector_double", "collector")),
        V10 = structure(list(), class = c("collector_double", "collector")),
        Sequence = structure(list(), class = c("collector_character", "collector"))),
        default = structure(list(), class = c("collector_guess", "collector")),
        skip = 1L), class = "col_spec"))

I need to copy V1:V10 and replace each run of 0s that lies between 1s with the length of that run. The 1s should be set to NA, and so should any 0s at the beginning or the end of a row, because they are not between 1s.

So, for example, row 1 should be transformed from 1 0 0 0 1 1 0 0 1 1 to NA 3 3 3 NA NA 2 2 NA NA. Row 6 from 0 0 1 1 0 0 1 0 1 0 to NA NA NA NA 2 2 NA 1 NA NA.

Is there a way to do this in a loop? Or would it work to unite V1:V10 into a single cell, match specific patterns, transform them, and then split the cell again?

I have to admit, this is way beyond my skills. But I was assigned this task and I'm grateful for any suggestions!

Thanks!


asked Feb 09 '21 13:02

Jakob

2 Answers

One dplyr and tidyr option could be:

library(dplyr)
library(tidyr)
library(tibble)

df %>%
 rowid_to_column() %>%
 pivot_longer(-rowid) %>%
 group_by(rowid) %>%
 # NA for 1s, leading 0s and trailing 0s; otherwise the length of the run of 0s
 mutate(value = if_else(value != 0 | cumsum(value) == 0 | rev(cumsum(rev(value))) == 0,
                        NA_integer_,
                        with(rle(value), rep(lengths * (values == 0), lengths)))) %>%
 pivot_wider(names_from = "name",
             values_from = "value")

  rowid    V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1    NA     3     3     3    NA    NA     2     2    NA    NA
2     2    NA    NA    NA     4     4     4     4    NA     1    NA
3     3    NA    NA     2     2    NA     2     2    NA     1    NA
4     4    NA     6     6     6     6     6     6    NA    NA    NA
5     5    NA    NA    NA    NA     2     2    NA    NA    NA    NA
6     6    NA    NA    NA    NA     2     2    NA     1    NA    NA
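
To see why the condition inside if_else works, it may help to evaluate its pieces on a single row. The following is only a base R sketch on row 6 of df; the intermediate names (is_one, leading, trailing, run_len) are made up for illustration and are not part of the answer's code.

```r
x <- c(0, 0, 1, 1, 0, 0, 1, 0, 1, 0)   # row 6 of df

is_one   <- x != 0                     # the 1s themselves
leading  <- cumsum(x) == 0             # 0s before the first 1
trailing <- rev(cumsum(rev(x))) == 0   # 0s after the last 1

runs <- rle(x)
# every position gets the length of its run if it is a run of 0s, else 0
run_len <- rep(runs$lengths * (runs$values == 0), runs$lengths)

result <- ifelse(is_one | leading | trailing, NA_integer_, run_len)
result
```

Only positions that are 0 and have a 1 somewhere on both sides survive the mask, which is what "in between 1s" means in the question.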


answered Sep 29 '22 06:09

tmfmnk

Define a function recalc that acts on one row, then apply it to each row. recalc identifies the runs with rleid and processes each run with a helper f. To give f both pieces of information at once, each element is passed as a complex number whose real part is the 1/0 value and whose imaginary part is the run number. In f, a run is replaced with NA if it consists of 1s (real part) or if it is the first or last run (imaginary part); otherwise it is replaced with its length. Finally, recalc takes the real part of the result.

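library(data.table)

recalc <- function(x) {
  r <- rleid(x)  # run number of each element
  # Im(z)[1]: the run number is constant within a run, and || needs a length-1 condition
  f <- function(z) if (Re(z)[1] == 1 || Im(z)[1] %in% range(r)) NA else length(z)
  Re(ave(x + r * 1i, r, FUN = f))
}
t(apply(df, 1, recalc))  # df is the data frame from the question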

giving this matrix:

     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
[1,] NA  3  3  3 NA NA  2  2 NA  NA
[2,] NA NA NA  4  4  4  4 NA  1  NA
[3,] NA NA  2  2 NA  2  2 NA  1  NA
[4,] NA  6  6  6  6  6  6 NA NA  NA
[5,] NA NA NA NA  2  2 NA NA NA  NA
[6,] NA NA NA NA  2  2 NA  1 NA  NA
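
The same run-based idea can probably be written without data.table, using only rle from base R. This is a sketch under that assumption, not the code above; fill_gaps is a hypothetical helper name, and the small matrix m just holds rows 1 and 6 of the question's data.

```r
# fill_gaps is a hypothetical helper, not part of the answer above
fill_gaps <- function(x) {
  runs <- rle(x)
  n <- length(runs$lengths)
  # runs alternate between 0s and 1s, so a run of 0s that is neither the
  # first nor the last run is necessarily enclosed by 1s
  inner_zero <- runs$values == 0 & seq_len(n) != 1 & seq_len(n) != n
  rep(ifelse(inner_zero, runs$lengths, NA_integer_), runs$lengths)
}

m <- rbind(
  c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1),   # row 1 of df
  c(0, 0, 1, 1, 0, 0, 1, 0, 1, 0)    # row 6 of df
)
res <- t(apply(m, 1, fill_gaps))
res
```

Rows that contain only 0s or only 1s consist of a single run, so they come back as all NA.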

answered Sep 29 '22 07:09

G. Grothendieck
