I am trying to solve a very basic example: I want to extract certain rows from the following data:
count SN data.stamp
1 00601 2018-07-26 13:38:39
0 00601 2018-11-05 23:00:09
0 00601 2018-11-05 23:00:16
4 00601 2018-11-12 23:00:05
0 00601 2018-12-12 23:00:05
5 00601 2018-11-12 23:00:05
0 00601 2018-12-12 23:00:05
0 00601 2018-11-12 23:00:05
0 00601 2018-12-12 23:00:05
Expected output:
count SN data.stamp
1 00601 2018-07-26 13:38:39
0 00601 2018-11-05 23:00:09
4 00601 2018-11-12 23:00:05
0 00601 2018-12-12 23:00:05
5 00601 2018-11-12 23:00:05
0 00601 2018-12-12 23:00:05
I would like to keep only a single row for each run of 0 counts. If there are several consecutive rows with a count of 0, only the first should be kept and the rest of the 0 counts ignored.
Basically, within each run of zeros I am looking for only the first zero, with the non-zero rows kept as they are.
I tried using rle, but I would like to extract rows from the data.frame, and rle only gives me the values and lengths of the runs. I could write a for loop to check, but I am looking for a quick and short way.
In base R, you can subset your data.frame to keep only the rows where count is non-zero, or where count is 0 but the count in the previous row was non-zero (df1 here is the data.frame from the question):
df1[df1$count!=0 | (df1$count==0 & c(TRUE, head(df1$count, -1)!=0)), ]
# (or: subset(df1, count!=0 | (count==0 & c(TRUE, head(count, -1)!=0))))
# count SN data.stamp
#1 1 601 2018-07-26 13:38:39
#2 0 601 2018-11-05 23:00:09
#4 4 601 2018-11-12 23:00:05
#5 0 601 2018-12-12 23:00:05
#6 5 601 2018-11-12 23:00:05
#7 0 601 2018-12-12 23:00:05
We can use rleid from data.table to give each run of identical adjacent counts an id, and then use duplicated to build a logical vector for filtering the rows:
library(dplyr)
df1 %>%
  filter(!duplicated(cbind(data.table::rleid(count), SN)))
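To target the zeros specifically, rleid can be applied to the logical vector count == 0. Note that this also puts adjacent non-zero rows into a single run, so only the first of them would be kept; the sample data has no adjacent non-zero rows, so the result is the same here.
df1 %>%
  filter(!duplicated(cbind(data.table::rleid(count == 0), SN)))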
rleid compares each element with the one before it and increases the id by 1 whenever the value changes, i.e.
library(data.table)
v1 <- c(1, 0, 0, 5, 4, 5, 5)
rleid(v1)
#[1] 1 2 2 3 4 5 5
Now, all adjacent duplicate elements are given the same id. If we only want to distinguish zeros from non-zeros:
rleid(v1 == 0)
#[1] 1 2 2 3 3 3 3
Here the input has only two possible values, TRUE and FALSE:
v1 == 0
#[1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE
Wrapping the ids in duplicated returns a logical vector that flags every repeated id; negating it gives the logical index used to filter the rows.
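To make the duplicated step concrete, here is a minimal base R sketch on the same v1. The helper run_id is a hypothetical stand-in for data.table::rleid (it assumes x has no NA values), and the final keep line is an addition, not part of the answer above: it retains every non-zero element, which matters when non-zero values are adjacent.

```r
# Hypothetical base R stand-in for data.table::rleid (assumes no NA in x)
run_id <- function(x) with(rle(x), rep(seq_along(values), lengths))

v1 <- c(1, 0, 0, 5, 4, 5, 5)

# Run ids of the zero / non-zero indicator
id <- run_id(v1 == 0)
id
#[1] 1 2 2 3 3 3 3

# TRUE for the first element of each run
first_of_run <- !duplicated(id)
first_of_run
#[1]  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE

# Keep every non-zero element, plus the first zero of each zero run
keep <- v1 != 0 | first_of_run
v1[keep]
#[1] 1 0 5 4 5 5
```

Using first_of_run on its own would keep only 1, 0, 5 here, because the four trailing non-zero values share one run id.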
If we want a base R solution, this can be done with rle. Build the run ids by repeating the sequence of run numbers according to the run lengths, then get the logical vector by wrapping with duplicated as before:
i1 <- with(rle(!df1$count), rep(seq_along(values), lengths))
i2 <- !duplicated(cbind(i1, df1$SN))
df1[i2, ]
# count SN data.stamp
#1 1 601 2018-07-26 13:38:39
#2 0 601 2018-11-05 23:00:09
#4 4 601 2018-11-12 23:00:05
#5 0 601 2018-12-12 23:00:05
#6 5 601 2018-11-12 23:00:05
#7 0 601 2018-12-12 23:00:05
Data:
df1 <- structure(list(count = c(1L, 0L, 0L, 4L, 0L, 5L, 0L, 0L, 0L),
SN = c(601L, 601L, 601L, 601L, 601L, 601L, 601L, 601L, 601L
), data.stamp = c("2018-07-26 13:38:39", "2018-11-05 23:00:09",
"2018-11-05 23:00:16", "2018-11-12 23:00:05", "2018-12-12 23:00:05",
"2018-11-12 23:00:05", "2018-12-12 23:00:05", "2018-11-12 23:00:05",
"2018-12-12 23:00:05")), class = "data.frame", row.names = c(NA,
-9L))
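For data with several serial numbers or with adjacent non-zero counts, a variant of the previous-row check can be applied within each SN. The following is only a sketch under assumptions: keep_first_zero is a hypothetical helper, the toy data is made up for illustration, and treating each SN separately is an assumed requirement that the question does not state.

```r
# Hypothetical helper: TRUE for non-zero rows and for the first row of each
# run of zeros, where runs are evaluated separately within each group.
keep_first_zero <- function(count, group) {
  # Previous count within the group; the first row of a group gets a
  # non-zero sentinel (1) so that a leading zero is kept.
  prev <- ave(count, group, FUN = function(x) c(1, head(x, -1)))
  count != 0 | prev != 0
}

# Made-up data: two serial numbers, adjacent non-zero counts, and a run of
# zeros that spans the boundary between the two serial numbers.
toy <- data.frame(
  count = c(1, 0, 0, 5, 4, 0, 0, 0, 3),
  SN    = c(601, 601, 601, 601, 601, 601, 602, 602, 602)
)

kept <- toy[keep_first_zero(toy$count, toy$SN), ]
kept$count
#[1] 1 0 5 4 0 0 3
```

Rows 3 and 8 are dropped as repeated zeros; row 7 is kept because it is the first row for SN 602.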