Suppose I have this dataset
Id Name Price sales Profit Month Category Mode Supplier
1 A 2 0 0 1 X K John
1 A 2 0 0 2 X K John
1 A 2 5 8 3 X K John
1 A 2 5 8 4 X L Sam
2 B 2 3 4 1 X L Sam
2 B 2 0 0 2 X L Sam
2 B 2 0 0 3 X M John
2 B 2 0 0 4 X L John
3 C 2 0 0 1 X K John
3 C 2 8 10 2 Y M John
3 C 2 8 10 3 Y K John
3 C 2 0 0 4 Y K John
5 E 2 0 0 1 Y M Sam
5 E 2 5 5 2 Y L Sam
5 E 2 5 9 3 Y M Sam
5 E 2 0 0 4 Z M Kyle
5 E 2 5 8 5 Z L Kyle
5 E 2 5 8 6 Z M Kyle
I want to delete rows with zeroes for Sales
and Profit
column by Id
group
So for a certain Id
if two or more consecutive rows have zero values for sales
and profit
those rows will get delete. So this dataset will become like this.
Id Name Price sales Profit Month Category Mode Supplier
1 A 2 5 8 3 X K John
1 A 2 5 8 4 X L Sam
2 B 2 3 4 1 X L Sam
3 C 2 0 0 1 X K John
3 C 2 8 10 2 Y M John
3 C 2 8 10 3 Y K John
3 C 2 0 0 4 Y K John
5 E 2 0 0 1 Y M Sam
5 E 2 5 5 2 Y L Sam
5 E 2 5 9 3 Y M Sam
5 E 2 0 0 4 Z M Kyle
5 E 2 5 8 5 Z L Kyle
5 E 2 5 8 6 Z M Kyle
I can remove all rows if they have zero values for Sales
and Profit
with
df1 = df[!(df$sales==0 & test$Profit==0),]
But how to delete rows only in certain group in this case by Id
P.S The idea is to delete entries for those products if they started selling after few months or got abandoned after few months in a year cycle.
Here's an approach using rleid
from "data.table":
library(data.table)
as.data.table(mydf)[, N := .N, by = .(Id, rleid(sales == 0 & Profit == 0))][
!(sales == 0 & Profit == 0 & N >= 2)]
## Id Name Price sales Profit Month Category Mode Supplier N
## 1: 1 A 2 5 8 3 X K John 2
## 2: 1 A 2 5 8 4 X L Sam 2
## 3: 2 B 2 3 4 1 X L Sam 1
## 4: 3 C 2 0 0 1 X K John 1
## 5: 3 C 2 8 10 2 Y M John 2
## 6: 3 C 2 8 10 3 Y K John 2
## 7: 3 C 2 0 0 4 Y K John 1
## 8: 5 E 2 0 0 1 Y M Sam 1
## 9: 5 E 2 5 5 2 Y L Sam 2
## 10: 5 E 2 5 9 3 Y M Sam 2
## 11: 5 E 2 0 0 4 Z M Kyle 1
## 12: 5 E 2 5 8 5 Z L Kyle 2
## 13: 5 E 2 5 8 6 Z M Kyle 2
Here's how to do it with dplyr
. Basically, I'm only keeping lines that are not zero OR that the previous/following lines is not zero.
table1 %>%
group_by(Id) %>%
mutate(Lag=lag(sales),Lead=lead(sales)) %>%
rowwise() %>%
mutate(Min=min(Lag,Lead,na.rm=TRUE)) %>%
filter(sales>0|Min>0) %>%
select(-Lead,-Lag,-Min)
Id Name Price sales Profit Month Category Mode Supplier
(int) (chr) (int) (int) (int) (int) (chr) (chr) (chr)
1 1 A 2 5 8 3 X K John
2 1 A 2 5 8 4 X L Sam
3 2 B 2 3 4 1 X L Sam
4 3 C 2 0 0 1 X K John
5 3 C 2 8 10 2 Y M John
6 3 C 2 8 10 3 Y K John
7 3 C 2 0 0 4 Y K John
8 5 E 2 0 0 1 Y M Sam
9 5 E 2 5 5 2 Y L Sam
10 5 E 2 5 9 3 Y M Sam
11 5 E 2 0 0 4 Z M Kyle
12 5 E 2 5 8 5 Z L Kyle
13 5 E 2 5 8 6 Z M Kyle
Data
table1 <-read.table(text="
Id,Name,Price,sales,Profit,Month,Category,Mode,Supplier
1,A,2,0,0,1,X,K,John
1,A,2,0,0,2,X,K,John
1,A,2,5,8,3,X,K,John
1,A,2,5,8,4,X,L,Sam
2,B,2,3,4,1,X,L,Sam
2,B,2,0,0,2,X,L,Sam
2,B,2,0,0,3,X,M,John
2,B,2,0,0,4,X,L,John
3,C,2,0,0,1,X,K,John
3,C,2,8,10,2,Y,M,John
3,C,2,8,10,3,Y,K,John
3,C,2,0,0,4,Y,K,John
5,E,2,0,0,1,Y,M,Sam
5,E,2,5,5,2,Y,L,Sam
5,E,2,5,9,3,Y,M,Sam
5,E,2,0,0,4,Z,M,Kyle
5,E,2,5,8,5,Z,L,Kyle
5,E,2,5,8,6,Z,M,Kyle
",sep=",",stringsAsFactors =FALSE, header=TRUE)
UPDATE To filter on more than one column with these criteria, here's how to do it. In the present case, the result is the same because when sales are 0, profits are also 0.
library(dplyr)
table1 %>%
group_by(Id) %>%
mutate(LagS=lag(sales),LeadS=lead(sales),LagP=lag(Profit),LeadP=lead(Profit)) %>%
rowwise() %>%
mutate(MinS=min(LagS,LeadS,na.rm=TRUE),MinP=min(LagP,LeadP,na.rm=TRUE)) %>%
filter(sales>0|MinS>0|Profit>0|MinP>0) %>% # "|" means OR
select(-LeadS,-LagS,-MinS,-LeadP,-LagP,-MinP)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With