I'm quite new to R; I use it mainly for visualising statistics with the ggplot2 library. Now I have run into a problem with data preparation.
I need to write a function that removes a given number of rows (2, 5 or 10) with the highest and lowest values in a specified column from a data frame and puts them into another data frame, and that does this for each combination of two factors (in my case, for each day and server).
Up to this point, I have done the following steps (MWE using the esoph example dataset).
I have sorted the frame by the desired parameter (ncontrols in the example):
esoph <- esoph[with(esoph, order(-ncontrols)), ]
I can display first/last records for each factor value (in this example for each age range):
by(data = esoph, INDICES = esoph$agegp, FUN = head, 3)
by(data = esoph, INDICES = esoph$agegp, FUN = tail, 3)
So basically, I can see the highest and lowest values, but I don't know how to extract them into another data frame and how to remove them from the main one.
Also, in the above example I can see the top/bottom records for each value of one factor (age range), but in reality I need the highest and lowest records for each combination of two factors. In this example they could be agegp and alcgp.
I am not even sure if the steps above are OK. Perhaps using plyr would work better? I'd appreciate any hints.
Yes, you can use plyr as follows:
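library(plyr)

# Toy data standing in for the real data (note: this masks the built-in esoph)
esoph <- data.frame(agegp = sample(letters[1:2], 20, replace = TRUE),
                    alcgp = sample(LETTERS[1:2], 20, replace = TRUE),
                    ncontrols = runif(20))
# Extract the rows with the lowest and highest ncontrols in each agegp/alcgp group
ddply(esoph, c("agegp", "alcgp"),
      function(x) {
        idx <- c(which.min(x$ncontrols),
                 which.max(x$ncontrols))
        x[idx, , drop = FALSE]
      })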
#   agegp alcgp  ncontrols
# 1     a     A 0.03091483
# 2     a     A 0.88529790
# 3     a     B 0.51265447
# 4     a     B 0.86111649
# 5     b     A 0.28372232
# 6     b     A 0.61698401
# 7     b     B 0.05618841
# 8     b     B 0.89346943
# Drop those same rows, keeping everything else in each group
ddply(esoph, c("agegp", "alcgp"),
      function(x) {
        idx <- c(which.min(x$ncontrols),
                 which.max(x$ncontrols))
        x[-idx, , drop = FALSE]
      })
#    agegp alcgp ncontrols
# 1      a     A 0.3745029
# 2      a     B 0.7621474
# 3      a     B 0.6319013
# 4      b     A 0.3055078
# 5      b     A 0.5146028
# 6      b     B 0.3735615
# 7      b     B 0.2528612
# 8      b     B 0.4415205
# 9      b     B 0.6868219
# 10     b     B 0.3750102
# 11     b     B 0.2279462
# 12     b     B 0.1891052
There are probably many alternatives, e.g. using head and tail if your data is already sorted, but this should work.
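For the case in the question, where more than one row (2, 5 or 10) has to be removed at each end, the head/tail idea could be sketched roughly like this in base R. The helper name split_extremes and its arguments are made up for illustration, and the sketch assumes the column has no NA values (order() sorts those last, so they would count as highest). It uses datasets::esoph explicitly because the toy example above masks esoph.

```r
# Hypothetical helper (not part of plyr or base R): within every combination
# of the `by` columns, split off the n lowest and n highest rows of `col`.
split_extremes <- function(df, col, by, n = 1) {
  # One data frame per non-empty combination of the grouping columns
  groups <- split(df, df[by], drop = TRUE)

  parts <- lapply(groups, function(g) {
    g <- g[order(g[[col]]), , drop = FALSE]   # ascending sort within the group
    k <- nrow(g)
    # Positions of the n lowest and n highest rows; unique() avoids
    # duplicates when the group has fewer than 2 * n rows
    idx <- unique(c(head(seq_len(k), n), tail(seq_len(k), n)))
    list(extremes = g[idx, , drop = FALSE],
         rest     = g[setdiff(seq_len(k), idx), , drop = FALSE])
  })

  # Stack the per-group pieces back into two data frames
  list(extremes = do.call(rbind, lapply(parts, `[[`, "extremes")),
       rest     = do.call(rbind, lapply(parts, `[[`, "rest")))
}

res <- split_extremes(datasets::esoph, col = "ncontrols",
                      by = c("agegp", "alcgp"), n = 1)
```

Here res$extremes holds the removed rows and res$rest the remaining ones, so together they contain every input row exactly once. Ties are broken by the original row order, which may or may not be what you want.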