Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete certain rows in a group of rows in R

Tags:

r

Suppose I have this dataset

Id Name Price sales Profit Month Category Mode Supplier
1    A     2     0      0     1        X    K     John
1    A     2     0      0     2        X    K     John
1    A     2     5      8     3        X    K     John
1    A     2     5      8     4        X    L      Sam
2    B     2     3      4     1        X    L      Sam
2    B     2     0      0     2        X    L      Sam
2    B     2     0      0     3        X    M     John
2    B     2     0      0     4        X    L     John
3    C     2     0      0     1        X    K     John
3    C     2     8     10     2        Y    M     John
3    C     2     8     10     3        Y    K     John
3    C     2     0      0     4        Y    K     John
5    E     2     0      0     1        Y    M      Sam
5    E     2     5      5     2        Y    L      Sam
5    E     2     5      9     3        Y    M      Sam
5    E     2     0      0     4        Z    M     Kyle
5    E     2     5      8     5        Z    L     Kyle
5    E     2     5      8     6        Z    M     Kyle

I want to delete rows with zeroes for Sales and Profit column by Id group So for a certain Id if two or more consecutive rows have zero values for sales and profit those rows will get delete. So this dataset will become like this.

Id Name Price sales Profit Month Category Mode Supplier
1    A     2     5      8     3        X    K     John
1    A     2     5      8     4        X    L      Sam
2    B     2     3      4     1        X    L      Sam
3    C     2     0      0     1        X    K     John
3    C     2     8     10     2        Y    M     John
3    C     2     8     10     3        Y    K     John
3    C     2     0      0     4        Y    K     John
5    E     2     0      0     1        Y    M      Sam
5    E     2     5      5     2        Y    L      Sam
5    E     2     5      9     3        Y    M      Sam
5    E     2     0      0     4        Z    M     Kyle
5    E     2     5      8     5        Z    L     Kyle
5    E     2     5      8     6        Z    M     Kyle

I can remove all rows if they have zero values for Sales and Profit with

df1 = df[!(df$sales==0 & test$Profit==0),]

But how to delete rows only in certain group in this case by Id

P.S The idea is to delete entries for those products if they started selling after few months or got abandoned after few months in a year cycle.

like image 230
Jay khan Avatar asked Dec 15 '15 16:12

Jay khan


2 Answers

Here's an approach using rleid from "data.table":

library(data.table)
as.data.table(mydf)[, N := .N, by = .(Id, rleid(sales == 0 & Profit == 0))][
    !(sales == 0 & Profit == 0 & N >= 2)]
##     Id Name Price sales Profit Month Category Mode Supplier N
##  1:  1    A     2     5      8     3        X    K     John 2
##  2:  1    A     2     5      8     4        X    L      Sam 2
##  3:  2    B     2     3      4     1        X    L      Sam 1
##  4:  3    C     2     0      0     1        X    K     John 1
##  5:  3    C     2     8     10     2        Y    M     John 2
##  6:  3    C     2     8     10     3        Y    K     John 2
##  7:  3    C     2     0      0     4        Y    K     John 1
##  8:  5    E     2     0      0     1        Y    M      Sam 1
##  9:  5    E     2     5      5     2        Y    L      Sam 2
## 10:  5    E     2     5      9     3        Y    M      Sam 2
## 11:  5    E     2     0      0     4        Z    M     Kyle 1
## 12:  5    E     2     5      8     5        Z    L     Kyle 2
## 13:  5    E     2     5      8     6        Z    M     Kyle 2
like image 172
A5C1D2H2I1M1N2O1R2T1 Avatar answered Sep 20 '22 02:09

A5C1D2H2I1M1N2O1R2T1


Here's how to do it with dplyr. Basically, I'm only keeping lines that are not zero OR that the previous/following lines is not zero.

table1 %>%
group_by(Id) %>%
mutate(Lag=lag(sales),Lead=lead(sales)) %>%
rowwise() %>%
mutate(Min=min(Lag,Lead,na.rm=TRUE)) %>%
filter(sales>0|Min>0)  %>%
select(-Lead,-Lag,-Min)

      Id  Name Price sales Profit Month Category  Mode Supplier
   (int) (chr) (int) (int)  (int) (int)    (chr) (chr)    (chr)
1      1     A     2     5      8     3        X     K     John
2      1     A     2     5      8     4        X     L      Sam
3      2     B     2     3      4     1        X     L      Sam
4      3     C     2     0      0     1        X     K     John
5      3     C     2     8     10     2        Y     M     John
6      3     C     2     8     10     3        Y     K     John
7      3     C     2     0      0     4        Y     K     John
8      5     E     2     0      0     1        Y     M      Sam
9      5     E     2     5      5     2        Y     L      Sam
10     5     E     2     5      9     3        Y     M      Sam
11     5     E     2     0      0     4        Z     M     Kyle
12     5     E     2     5      8     5        Z     L     Kyle
13     5     E     2     5      8     6        Z     M     Kyle

Data

table1 <-read.table(text="
Id,Name,Price,sales,Profit,Month,Category,Mode,Supplier
1,A,2,0,0,1,X,K,John
1,A,2,0,0,2,X,K,John
1,A,2,5,8,3,X,K,John
1,A,2,5,8,4,X,L,Sam
2,B,2,3,4,1,X,L,Sam
2,B,2,0,0,2,X,L,Sam
2,B,2,0,0,3,X,M,John
2,B,2,0,0,4,X,L,John
3,C,2,0,0,1,X,K,John
3,C,2,8,10,2,Y,M,John
3,C,2,8,10,3,Y,K,John
3,C,2,0,0,4,Y,K,John
5,E,2,0,0,1,Y,M,Sam
5,E,2,5,5,2,Y,L,Sam
5,E,2,5,9,3,Y,M,Sam
5,E,2,0,0,4,Z,M,Kyle
5,E,2,5,8,5,Z,L,Kyle
5,E,2,5,8,6,Z,M,Kyle
",sep=",",stringsAsFactors =FALSE, header=TRUE)

UPDATE To filter on more than one column with these criteria, here's how to do it. In the present case, the result is the same because when sales are 0, profits are also 0.

library(dplyr)
table1 %>%
group_by(Id) %>%
mutate(LagS=lag(sales),LeadS=lead(sales),LagP=lag(Profit),LeadP=lead(Profit)) %>%
rowwise() %>%
mutate(MinS=min(LagS,LeadS,na.rm=TRUE),MinP=min(LagP,LeadP,na.rm=TRUE)) %>%
filter(sales>0|MinS>0|Profit>0|MinP>0)  %>%         # "|" means OR
select(-LeadS,-LagS,-MinS,-LeadP,-LagP,-MinP)
like image 30
Pierre Lapointe Avatar answered Sep 20 '22 02:09

Pierre Lapointe