Suppose I have this data:
x = c(14,14, 6, 7 ,14 , 0 ,0 ,0 , 0, 0, 0 , 0 , 0, 0 , 0 , 0 , 0, 9 ,1 , 3 ,8 ,9 ,15, 9 , 8, 13, 8, 4 , 6 , 7 ,10 ,13, 3,
0 , 0 , 0 , 0 , 0 , 0, 0, 0 , 0 , 0 , 0, 0, 0, 0, 0 ,0, 0 , 0 , 0, 0, 0, 0, 0 , 0, 0, 4 , 7 ,4, 5 ,16 , 5 ,5 , 9 , 4 ,4, 9 , 8, 2, 0 ,0 ,0 ,0 ,0, 0, 0, 0 ,0 , 0, 0, 0, 0, 0, 0, 0, 0,0)
x
[1] 14 14 6 7 14 0 0 0 0 0 0 0 0 0 0 0 0 9 1 3 8 9 15 9 8
[26] 13 8 4 6 7 10 13 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[51] 0 0 0 0 0 0 0 0 4 7 4 5 16 5 5 9 4 4 9 8 2 0 0 0 0
[76] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I want to recover the start and end indices of every run of more than 3 zeroes in a row: the run begins at its first zero and terminates with the last 0 before a nonzero value.
For example, I would get 6, 17 for the first run of zeroes, etc.
If x happens to be a column of a data.table, you can do:
library(data.table)
dt <- data.table(x = x)
dt[, if (.N > 3 && all(x == 0)) .(starts = first(.I), ends = last(.I)),
   by = rleid(x)]
#    rleid starts ends
# 1:     5      6   17
# 2:    22     34   58
# 3:    34     72   89
Explanation:
rleid(x) gives an integer ID for each element of x indicating which "run" the element belongs to, where a "run" is a sequence of adjacent equal values.
dt[, <code>, by = rleid(x)] partitions dt according to rleid(x) and computes <code> for each subset of dt's rows. The results are stacked together in a single data.table.
.N is the number of elements in the given subset.
.I is the vector of row numbers corresponding to the subset.
first and last give the first and last element of a vector.
.(<stuff>) is the same as list(<stuff>).
The rleid function, the by grouping within the brackets, the .N and .I symbols, and the first and last functions are all part of the data.table package.
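If data.table is not available, the same run-based idea can be sketched in base R with rle(), which returns the length and value of each run. This is only a minimal sketch, not part of the answer above: the function name zero_runs, its min_len argument, and the small vector y are hypothetical names made up for the illustration.

```r
# Base R sketch: find runs of more than `min_len` zeros using rle().
# `zero_runs`, `min_len` and `y` are hypothetical names chosen for this example.
zero_runs <- function(x, min_len = 3) {
  r <- rle(x == 0)                  # runs of TRUE (zero) / FALSE (nonzero)
  ends <- cumsum(r$lengths)         # last index of every run
  starts <- ends - r$lengths + 1    # first index of every run
  keep <- r$values & r$lengths > min_len   # zero runs longer than min_len
  data.frame(starts = starts[keep], ends = ends[keep])
}

y <- c(5, 0, 0, 0, 0, 2, 0, 0, 7, 0, 0, 0, 0, 0)
zero_runs(y)
#   starts ends
# 1      2    5
# 2     10   14
```

The run of two zeroes at positions 7 and 8 is dropped because it is not longer than min_len. Calling zero_runs(x) on the data from the question should give the same start and end positions as the data.table version.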
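Using dplyr: take the diff of the zero/nonzero indicator (x == 0). Wherever the diff is not equal to 0, the element does not belong to the same group as the previous one, so applying cumsum to those change points gives the group ID.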
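library(dplyr)
df <- data.frame(x = x, rownumber = seq_along(x))
# a new group starts wherever x switches between zero and nonzero
df$Groupid <- cumsum(c(0, diff(df$x == 0)) != 0)
df %>%
  group_by(Groupid) %>%
  summarize(start = first(rownumber), end = last(rownumber), number = first(x), size = n()) %>%
  filter(number == 0 & size > 3)  # keep only runs of more than 3 zeroes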
# A tibble: 3 x 5
  Groupid start   end number  size
    <int> <int> <int>  <dbl> <int>
1       1     6    17      0    12
2       3    34    58      0    25
3       5    72    89      0    18
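To see why the diff/cumsum combination yields a group ID, it may help to look at the intermediate vectors on a short input. The following is an illustrative sketch only; the vector y and the variable names is_zero, change and group_id are hypothetical and do not appear in the answer above.

```r
# Step-by-step look at the diff/cumsum grouping on a small vector.
# `y` is a hypothetical example vector, not the data from the question.
y <- c(3, 0, 0, 0, 0, 5, 5, 0)

is_zero <- y == 0                # FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
change <- c(0, diff(is_zero))    # 0 1 0 0 0 -1 0 1 (nonzero where the state flips)
group_id <- cumsum(change != 0)  # counts the flips seen so far
group_id
# [1] 0 1 1 1 1 2 2 3
```

Each flip between zero and nonzero increments the counter, so every element of one run shares an ID. Note that adjacent nonzero values (the two 5s here) land in the same group even when they differ, which is fine because only the zero groups are kept afterwards.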