Delete certain rows in a group of rows in R

Question

Suppose I have this dataset

Id Name Price sales Profit Month Category Mode Supplier
1    A     2     0      0     1        X    K     John
1    A     2     0      0     2        X    K     John
1    A     2     5      8     3        X    K     John
1    A     2     5      8     4        X    L      Sam
2    B     2     3      4     1        X    L      Sam
2    B     2     0      0     2        X    L      Sam
2    B     2     0      0     3        X    M     John
2    B     2     0      0     4        X    L     John
3    C     2     0      0     1        X    K     John
3    C     2     8     10     2        Y    M     John
3    C     2     8     10     3        Y    K     John
3    C     2     0      0     4        Y    K     John
5    E     2     0      0     1        Y    M      Sam
5    E     2     5      5     2        Y    L      Sam
5    E     2     5      9     3        Y    M      Sam
5    E     2     0      0     4        Z    M     Kyle
5    E     2     5      8     5        Z    L     Kyle
5    E     2     5      8     6        Z    M     Kyle

I want to delete rows with zeroes for Sales and Profit column by Id group So for a certain Id if two or more consecutive rows have zero values for sales and profit those rows will get delete. So this dataset will become like this.

Id Name Price sales Profit Month Category Mode Supplier
1    A     2     5      8     3        X    K     John
1    A     2     5      8     4        X    L      Sam
2    B     2     3      4     1        X    L      Sam
3    C     2     0      0     1        X    K     John
3    C     2     8     10     2        Y    M     John
3    C     2     8     10     3        Y    K     John
3    C     2     0      0     4        Y    K     John
5    E     2     0      0     1        Y    M      Sam
5    E     2     5      5     2        Y    L      Sam
5    E     2     5      9     3        Y    M      Sam
5    E     2     0      0     4        Z    M     Kyle
5    E     2     5      8     5        Z    L     Kyle
5    E     2     5      8     6        Z    M     Kyle

I can remove all rows if they have zero values for Sales and Profit with

df1 = df[!(df$sales==0 & test$Profit==0),]

But how to delete rows only in certain group in this case by Id

P.S The idea is to delete entries for those products if they started selling after few months or got abandoned after few months in a year cycle.

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

Here's an approach using rleid from "data.table":

library(data.table)
as.data.table(mydf)[, N := .N, by = .(Id, rleid(sales == 0 & Profit == 0))][
    !(sales == 0 & Profit == 0 & N >= 2)]
##     Id Name Price sales Profit Month Category Mode Supplier N
##  1:  1    A     2     5      8     3        X    K     John 2
##  2:  1    A     2     5      8     4        X    L      Sam 2
##  3:  2    B     2     3      4     1        X    L      Sam 1
##  4:  3    C     2     0      0     1        X    K     John 1
##  5:  3    C     2     8     10     2        Y    M     John 2
##  6:  3    C     2     8     10     3        Y    K     John 2
##  7:  3    C     2     0      0     4        Y    K     John 1
##  8:  5    E     2     0      0     1        Y    M      Sam 1
##  9:  5    E     2     5      5     2        Y    L      Sam 2
## 10:  5    E     2     5      9     3        Y    M      Sam 2
## 11:  5    E     2     0      0     4        Z    M     Kyle 1
## 12:  5    E     2     5      8     5        Z    L     Kyle 2
## 13:  5    E     2     5      8     6        Z    M     Kyle 2

Pierre Lapointe · Answer

Here's how to do it with dplyr. Basically, I'm only keeping lines that are not zero OR that the previous/following lines is not zero.

table1 %>%
group_by(Id) %>%
mutate(Lag=lag(sales),Lead=lead(sales)) %>%
rowwise() %>%
mutate(Min=min(Lag,Lead,na.rm=TRUE)) %>%
filter(sales>0|Min>0)  %>%
select(-Lead,-Lag,-Min)

      Id  Name Price sales Profit Month Category  Mode Supplier
   (int) (chr) (int) (int)  (int) (int)    (chr) (chr)    (chr)
1      1     A     2     5      8     3        X     K     John
2      1     A     2     5      8     4        X     L      Sam
3      2     B     2     3      4     1        X     L      Sam
4      3     C     2     0      0     1        X     K     John
5      3     C     2     8     10     2        Y     M     John
6      3     C     2     8     10     3        Y     K     John
7      3     C     2     0      0     4        Y     K     John
8      5     E     2     0      0     1        Y     M      Sam
9      5     E     2     5      5     2        Y     L      Sam
10     5     E     2     5      9     3        Y     M      Sam
11     5     E     2     0      0     4        Z     M     Kyle
12     5     E     2     5      8     5        Z     L     Kyle
13     5     E     2     5      8     6        Z     M     Kyle

Data

table1 <-read.table(text="
Id,Name,Price,sales,Profit,Month,Category,Mode,Supplier
1,A,2,0,0,1,X,K,John
1,A,2,0,0,2,X,K,John
1,A,2,5,8,3,X,K,John
1,A,2,5,8,4,X,L,Sam
2,B,2,3,4,1,X,L,Sam
2,B,2,0,0,2,X,L,Sam
2,B,2,0,0,3,X,M,John
2,B,2,0,0,4,X,L,John
3,C,2,0,0,1,X,K,John
3,C,2,8,10,2,Y,M,John
3,C,2,8,10,3,Y,K,John
3,C,2,0,0,4,Y,K,John
5,E,2,0,0,1,Y,M,Sam
5,E,2,5,5,2,Y,L,Sam
5,E,2,5,9,3,Y,M,Sam
5,E,2,0,0,4,Z,M,Kyle
5,E,2,5,8,5,Z,L,Kyle
5,E,2,5,8,6,Z,M,Kyle
",sep=",",stringsAsFactors =FALSE, header=TRUE)

UPDATE To filter on more than one column with these criteria, here's how to do it. In the present case, the result is the same because when sales are 0, profits are also 0.

library(dplyr)
table1 %>%
group_by(Id) %>%
mutate(LagS=lag(sales),LeadS=lead(sales),LagP=lag(Profit),LeadP=lead(Profit)) %>%
rowwise() %>%
mutate(MinS=min(LagS,LeadS,na.rm=TRUE),MinP=min(LagP,LeadP,na.rm=TRUE)) %>%
filter(sales>0|MinS>0|Profit>0|MinP>0)  %>%         # "|" means OR
select(-LeadS,-LagS,-MinS,-LeadP,-LagP,-MinP)

Delete certain rows in a group of rows in R

Tags:

r

Jay khan

2 Answers

A5C1D2H2I1M1N2O1R2T1

Pierre Lapointe

Recent Activity

Donate For Us

Delete certain rows in a group of rows in R

Tags:

r

Jay khan

2 Answers

A5C1D2H2I1M1N2O1R2T1

Pierre Lapointe

Related questions

Recent Activity

Donate For Us