I'm working with data like this:
> df
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1 0 0 0 1 1 0 0 1 1
2 1 1 1 0 0 0 0 1 0 1
3 1 1 0 0 1 0 0 1 0 1
4 1 0 0 0 0 0 0 1 1 1
5 0 0 0 1 0 0 1 1 1 1
6 0 0 1 1 0 0 1 0 1 0
The output of dput(df) is as follows:
df <- structure(list(V1 = c(1, 1, 1, 1, 0, 0), V2 = c(0, 1, 1, 0, 0,
0), V3 = c(0, 1, 0, 0, 0, 1), V4 = c(0, 0, 0, 0, 1, 1), V5 = c(1,
0, 1, 0, 0, 0), V6 = c(1, 0, 0, 0, 0, 0), V7 = c(0, 0, 0, 0,
1, 1), V8 = c(0, 1, 1, 1, 1, 0), V9 = c(1, 0, 0, 1, 1, 1), V10 = c(1,
1, 1, 1, 1, 0)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"), spec = structure(list(cols = list(V1 = structure(list(), class = c("collector_double",
"collector")), V2 = structure(list(), class = c("collector_double",
"collector")), V3 = structure(list(), class = c("collector_double",
"collector")), V4 = structure(list(), class = c("collector_double",
"collector")), V5 = structure(list(), class = c("collector_double",
"collector")), V6 = structure(list(), class = c("collector_double",
"collector")), V7 = structure(list(), class = c("collector_double",
"collector")), V8 = structure(list(), class = c("collector_double",
"collector")), V9 = structure(list(), class = c("collector_double",
"collector")), V10 = structure(list(), class = c("collector_double",
"collector")), Sequence = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
I need to copy V1:V10 and replace the 0s that lie between 1s with the number of zeros in that gap. The 1s should be set to NA, and so should the 0s at the beginning and the end, because they are not between 1s.
So, for example, row 1 should be transformed from 1 0 0 0 1 1 0 0 1 1 to NA 3 3 3 NA NA 2 2 NA NA. Row 6 should go from 0 0 1 1 0 0 1 0 1 0 to NA NA NA NA 2 2 NA 1 NA NA.
Is there a way to do this in a loop? Or would it work to unite V1:V10 in a single cell, match specific patterns, transform them, and then split the cell again?
I have to admit, this is way beyond my skills. But I was assigned this task and I'm grateful for any suggestions!
Thanks!
One dplyr and tidyr option could be:
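library(dplyr)
library(tidyr)
library(tibble)  # for rowid_to_column()

df %>%
  rowid_to_column() %>%
  pivot_longer(-rowid) %>%
  group_by(rowid) %>%
  # NA for 1s, leading zeros and trailing zeros; otherwise the length of the zero run
  mutate(value = if_else(value != 0 | cumsum(value) == 0 | rev(cumsum(rev(value))) == 0,
                         NA_integer_,
                         with(rle(value), rep(lengths * (values == 0), lengths)))) %>%
  pivot_wider(names_from = "name",
              values_from = "value")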
rowid V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 NA 3 3 3 NA NA 2 2 NA NA
2 2 NA NA NA 4 4 4 4 NA 1 NA
3 3 NA NA 2 2 NA 2 2 NA 1 NA
4 4 NA 6 6 6 6 6 6 NA NA NA
5 5 NA NA NA NA 2 2 NA NA NA NA
6 6 NA NA NA NA 2 2 NA 1 NA NA
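To make the condition inside mutate() easier to follow, here is a minimal base R sketch that applies the same three tests to row 1 of the question's data as a plain vector. The intermediate names (leading, trailing, run_len) are introduced only for illustration, and ifelse() stands in for dplyr's if_else().

```r
# Row 1 of the question's data, as a plain numeric vector.
value <- c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1)

# TRUE while no 1 has been seen yet (leading zeros).
leading <- cumsum(value) == 0
# TRUE where no 1 follows (trailing zeros).
trailing <- rev(cumsum(rev(value))) == 0

# Length of each run, repeated across the run; set to 0 for runs of 1s.
runs <- rle(value)
run_len <- rep(runs$lengths * (runs$values == 0), runs$lengths)

# 1s, leading zeros and trailing zeros become NA; the rest keep the run length.
result <- ifelse(value != 0 | leading | trailing, NA_integer_, run_len)
print(result)
```

For this row, leading and trailing are FALSE everywhere because the row starts and ends with a 1, so only the 1s themselves are turned into NA.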
Define a function recalc that acts on one row, and then apply it to each row. recalc identifies the runs using rleid and then performs a calculation on each run, passing the 1/0 value as the real part and the run number as the imaginary part of a complex vector to f. In f, if the run consists of 1s (real part), or if it is the first or the last run (imaginary part), it is replaced with NA; otherwise it is replaced with its length. Finally, recalc takes the real part.
library(data.table)
recalc <- function(x) {
  r <- rleid(x)
  # Im(z)[1]: every element of a run has the same run id, and `||` needs a length-one condition
  f <- function(z) if (Re(z)[1] == 1 || Im(z)[1] %in% range(r)) NA else length(z)
  Re(ave(x + r * 1i, r, FUN = f))
}
t(apply(df, 1, recalc))
giving this matrix:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
[1,] NA 3 3 3 NA NA 2 2 NA NA
[2,] NA NA NA 4 4 4 4 NA 1 NA
[3,] NA NA 2 2 NA 2 2 NA 1 NA
[4,] NA 6 6 6 6 6 6 NA NA NA
[5,] NA NA NA NA 2 2 NA NA NA NA
[6,] NA NA NA NA 2 2 NA 1 NA NA
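If data.table is not available, the same idea can probably be expressed with base R's rle() alone. The following is only a sketch: the function name gap_lengths is hypothetical, and it assumes each row is non-empty and contains only 0s and 1s. The small matrix m holds rows 1 and 6 of the question's data so the example is self-contained.

```r
# Replace interior runs of 0s by their length; everything else becomes NA.
# Assumes x is non-empty and contains only 0s and 1s.
gap_lengths <- function(x) {
  runs <- rle(x)
  n <- length(runs$lengths)
  # One value per run: the run length for zero runs, NA for runs of 1s.
  out <- ifelse(runs$values == 0, runs$lengths, NA_integer_)
  # The first and last runs are never enclosed by 1s on both sides.
  out[c(1, n)] <- NA_integer_
  # Expand back to one value per original element.
  rep(out, runs$lengths)
}

# Rows 1 and 6 from the question.
m <- matrix(c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1,
              0, 0, 1, 1, 0, 0, 1, 0, 1, 0),
            nrow = 2, byrow = TRUE)

res <- t(apply(m, 1, gap_lengths))
print(res)
```

On the full data this would be called as t(apply(df, 1, gap_lengths)). Working on the run-level summary first and expanding it with rep() avoids the complex-number encoding, at the cost of handling the first and last run explicitly.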