I have a dataframe which I want to subset depending on whether a value in a certain column is among the top 5 values of all the columns in that row. This is a simplified version of my dataframe:
> my.df <- data.frame(a = rnorm(10,5), b= rnorm(10,5), c=rnorm(10,5), d=rnorm(10,5), e=rnorm(10,5))
> my.df
a b c d e
1 6.401462 5.318849 5.373496 5.101140 3.710973
2 6.715845 4.786936 3.521965 4.264029 4.525138
3 6.076211 5.356114 5.605134 5.443002 5.296778
4 7.009623 5.275595 4.801874 4.355892 6.752737
5 5.002059 6.163398 6.063694 2.409702 6.172111
6 6.298305 3.291884 5.737053 4.701320 4.752406
7 4.856246 4.674743 5.550828 7.501786 5.466611
8 5.037990 4.129333 4.797334 5.143915 5.558161
9 4.903592 3.135622 5.879798 5.639893 4.368915
10 5.500374 4.400130 3.980433 6.203259 4.498614
Now I want only the rows where the values of column a or column b are among the top 2 values in their row. So in this example deleting row 7-9, giving:
a b c d e
1 6.401462 5.318849 5.373496 5.101140 3.710973
2 6.715845 4.786936 3.521965 4.264029 4.525138
3 6.076211 5.356114 5.605134 5.443002 5.296778
4 7.009623 5.275595 4.801874 4.355892 6.752737
5 5.002059 6.163398 6.063694 2.409702 6.172111
6 6.298305 3.291884 5.737053 4.701320 4.752406
10 5.500374 4.400130 3.980433 6.203259 4.498614
Any ideas?
We can loop through the rows with apply
(from base R
) check whether any
of the elements in 'a' or 'b' are %in%
the sort
ed group to create a logical index and subset the rows based on that
i1 <- apply(my.df, 1, function(x) any(x[1:2] %in% sort(x, decreasing = TRUE)[1:2]))
my.df[i1,]
# a b c d e
#1 6.401462 5.318849 5.373496 5.101140 3.710973
#2 6.715845 4.786936 3.521965 4.264029 4.525138
#3 6.076211 5.356114 5.605134 5.443002 5.296778
#4 7.009623 5.275595 4.801874 4.355892 6.752737
#5 5.002059 6.163398 6.063694 2.409702 6.172111
#6 6.298305 3.291884 5.737053 4.701320 4.752406
#10 5.500374 4.400130 3.980433 6.203259 4.498614
Or use max.col
from base R
to create the logical index and that would be much faster and avoid any transformation
i1 <- max.col(my.df, "first")
i2 <- max.col(replace(my.df, cbind(seq_len(nrow(my.df)), i1), -Inf), "first")
my.df[(i1 %in% 1:2) | (i2 %in% 1:2), ]
my.df <- structure(list(a = c(6.401462, 6.715845, 6.076211, 7.009623,
5.002059, 6.298305, 4.856246, 5.03799, 4.903592, 5.500374), b = c(5.318849,
4.786936, 5.356114, 5.275595, 6.163398, 3.291884, 4.674743, 4.129333,
3.135622, 4.40013), c = c(5.373496, 3.521965, 5.605134, 4.801874,
6.063694, 5.737053, 5.550828, 4.797334, 5.879798, 3.980433),
d = c(5.10114, 4.264029, 5.443002, 4.355892, 2.409702, 4.70132,
7.501786, 5.143915, 5.639893, 6.203259), e = c(3.710973,
4.525138, 5.296778, 6.752737, 6.172111, 4.752406, 5.466611,
5.558161, 4.368915, 4.498614)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With