I have a data frame, please see below. How do I compress/delete rows with some conditions? I would like to choose the highest Volume only where there's a continuous 1
in V1
eg: The V1
of df[2:5,]
all equals to 1
, the highest Volume out of these is df[4,]
so delete df[c(2,3,5),]
, for df[9:10,]
, highest Volume of these two is df[10,]
, delete df[9,]
.....so on....and the result will be like Example Picture where V1
will be like 0101010101....
How to achieve this without using loops, how do I achieve this the vectorized way so calculation speed is faster(when dealing with millions of rows)?
Update:
I apply the same loop with V2
but only delete when Volume is less , that's why row 13 got deleted in the Example Picture because the Volume of Row 13 is less than the Volume of Row 14.
Update(another question):
I have tried akrun's method, where I choose the maximum Volume
from the group, but when there're two or more maximum, there will still be a continuous 1
just like below row 9
. How do I remove the duplicate row?
I could use duplicated(df$Volume)
, is there another way? Thanks
Update:
Follow up with akrun's attempt, the code will delete the rows where V1==0
which is not what I'm after, I would like to delete only when V1==1
and the Volume is smaller than the maximum Volume of that group. I know this will have some continuous 0s in V1
, and to delete the duplicated 0s in V1
will depend on the Weight
so that's why I need to delete V1==1
only and cannot delete V1==0
# Volume Weight V1 V2
#1 0.5367 0.5367 0 1
#4 1.1457 1.1413 1 0
#6 0.5694 0.5633 0 1
#7 1.2368 1.2343 1 0
#8 0.9662 0.9593 0 1
#9 1.4102 1.3923 1 0
#10 1.4102 1.3995 1 0
#11 1.1132 1.1069 0 1
#12 1.4535 1.3923 1 0
#14 1.1475 1.1447 0 1
#15 1.1790 1.1748 1 0
#18 1.1557 1.1552 0 1
Data-frame df
Volume Weight V1 V2
1: 0.5367 0.5367 0 1
2: 0.8645 0.8508 1 0
3: 0.8590 0.8585 1 0
4: 1.1457 1.1413 1 0
5: 0.8573 0.8568 1 0
6: 0.5694 0.5633 0 1
7: 1.2368 1.2343 1 0
8: 0.9662 0.9593 0 1
9: 1.3550 1.3412 1 0
10: 1.4102 1.3995 1 0
11: 1.1132 1.1069 0 1
12: 1.4535 1.3923 1 0
13: 1.0437 1.0344 0 1
14: 1.1475 1.1447 0 1
15: 1.1790 1.1748 1 0
16: 1.1749 1.1735 1 0
17: 1.1749 1.1731 1 0
18: 1.1557 1.1552 0 1
Example Picture
Volume Weight V1 V2
1: 0.5367 0.5367 0 1
4: 1.1457 1.1413 1 0
6: 0.5694 0.5633 0 1
7: 1.2368 1.2343 1 0
8: 0.9662 0.9593 0 1
10: 1.4102 1.3995 1 0
11: 1.1132 1.1069 0 1
12: 1.4535 1.3923 1 0
14: 1.1475 1.1447 0 1
15: 1.1790 1.1748 1 0
18: 1.1557 1.1552 0 1
You could use library(data.table)
:
setDT(df)[, .SD[(Volume == max(Volume) & V1 == 1) | V1 != 0], by = rleid(df$V1)][]
Edit:
Concerning the column dropping problem, the adapted trick from akrun:
setDT(df)[df[, .I[(Volume == max(Volume) & V1 == 1) | V1 == 0], rleid(V1)]$V1][]
We can use rle
from base R
grp <- inverse.rle(within.list(rle(df$V1), values <- seq_along(values)))
df[with(df, ave(Volume, grp, FUN = max)==Volume),]
# Volume Weight V1 V2
#1 0.5367 0.5367 0 1
#4 1.1457 1.1413 1 0
#6 0.5694 0.5633 0 1
#7 1.2368 1.2343 1 0
#8 0.9662 0.9593 0 1
#10 1.4102 1.3995 1 0
#11 1.1132 1.1069 0 1
#12 1.4535 1.3923 1 0
#14 1.1475 1.1447 0 1
#15 1.1790 1.1748 1 0
#18 1.1557 1.1552 0 1
NOTE: We used a data.frame
and not data.table
as input data
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With