I should say up front that I'm new to R and still learning the fundamentals. I'm currently working on a large data frame (called "ppl") from which I need to filter some rows. Each row belongs to a group (grp) and is characterized by an intensity value (into) and a sample value.
mz rt into sample tracker sn grp
100.0153 126 2.762664 3 11908 7.522655 0
100.0171 127 2.972048 2 5308 7.718521 0
100.0788 272 30.217969 2 5309 19.024807 1
100.0796 272 17.277916 3 11910 7.297716 1
101.0042 128 37.557324 3 11916 27.991320 2
101.0043 128 39.676014 2 5316 28.234918 2
Well, the first question is: "How can I select from each group the sample with the highest intensity?" I tried a for loop:
for (i in ppl$grp) {
temp<-ppl[ppl$grp == i,]
sel<-rbind(sel,temp[max(temp$into),])
}
It works for ppl$grp == 0, but the following iterations return rows of NAs. In addition, the filtered data frame (called "sel") should also store the sample values of the removed rows. It should look like this:
mz rt into sample tracker sn grp
100.0171 127 2.972048 c(2,3) 5308 7.718521 0
100.0788 272 30.217969 c(2,3) 5309 19.024807 1
101.0043 128 39.676014 c(2,3) 5316 28.234918 2
To get this, I would use the following approach:
lev<-factor(ppl$grp)
samp<-ppl$sample
samp2<-split(samp,lev)
sel$sample<-samp2
Any hints? I can't test this yet because I haven't solved the first problem.
Thanks a lot.
Not sure if I follow your question. But maybe this will get you started.
library(dplyr)
ppl %>% group_by(grp) %>% filter(into == max(into))
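As for why the loop in the question returns NA rows: max(temp$into) returns the largest intensity value, not its position, so temp[max(temp$into),] uses something like 30.2 as a row index. For group 0 the value 2.97 happens to truncate to row 2, which is why only that case appears to work. which.max() returns the position instead. The loop also iterates over every element of ppl$grp rather than over the unique groups, so each group is visited more than once. Below is a minimal base R sketch of both steps, assuming the data looks like the six rows shown in the question (the data frame is rebuilt by hand here; by_grp is just an illustrative name):

```r
# Sample data copied from the question
ppl <- data.frame(
  mz      = c(100.0153, 100.0171, 100.0788, 100.0796, 101.0042, 101.0043),
  rt      = c(126, 127, 272, 272, 128, 128),
  into    = c(2.762664, 2.972048, 30.217969, 17.277916, 37.557324, 39.676014),
  sample  = c(3, 2, 2, 3, 3, 2),
  tracker = c(11908, 5308, 5309, 11910, 11916, 5316),
  sn      = c(7.522655, 7.718521, 19.024807, 7.297716, 27.991320, 28.234918),
  grp     = c(0, 0, 1, 1, 2, 2)
)

# One data frame per group (split() visits each group exactly once)
by_grp <- split(ppl, ppl$grp)

# Keep, for each group, the row whose 'into' is largest.
# which.max() gives the row position; max() would give the value itself.
sel <- do.call(rbind, lapply(by_grp, function(g) g[which.max(g$into), ]))

# Replace 'sample' with a list column holding every sample seen in the group
sel$sample <- lapply(by_grp, function(g) sort(g$sample))

print(sel)
```

Because sel and the sample list are both built from by_grp, their rows and entries are in the same group order. If two rows in a group tie for the highest intensity, which.max() keeps only the first one, whereas the filter(into == max(into)) version above keeps all tied rows.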