I assume this is an easy thing, but I am unable to solve my problem.
I have a data frame with 9 columns, and I want to get the 3 highest values of the 4th column (LumenLength), sorted, for each group given in the first column (SampleId).
I would like to be able to: a) find the 10 rows with the highest values in column 4, separately for each SampleId (first column), and b) average those 10 values for each SampleId.
My current code a) first sorts the values by SampleId and LumenLength, and b) extracts the highest, second-highest and third-highest LumenLength value per SampleId.
library(dplyr)  # for arrange()
sorted.v = arrange(sorted.v, desc(SampleId), LumenLength)  # LumenLength ascending within each SampleId
maxlength1 = aggregate(sorted.v$LumenLength, by = list(sorted.v$SampleId), FUN = tail, n = 1)  # highest value
maxlength2 = aggregate(sorted.v$LumenLength, by = list(sorted.v$SampleId), FUN = tail, n = 2)  # two highest values (tail returns the last n)
maxlength3 = aggregate(sorted.v$LumenLength, by = list(sorted.v$SampleId), FUN = tail, n = 3)  # three highest values
As you can see, I haven't really reached my goal yet. I am also pretty sure there is a better way of doing it, but I am stuck right now.
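If dplyr is not an option, the aggregate() idea from the question can reach both goals directly by sorting inside each group instead of relying on a prior arrange(). This is a minimal base R sketch: the toy data stands in for sorted.v and is hypothetical, as is the helper mean_top_n(). It assumes LumenLength is numeric and that a group with fewer than 10 rows should simply average all of its rows.

```r
# Hypothetical toy data standing in for sorted.v: two samples, 12 rows each
sorted.v <- data.frame(
  SampleId    = rep(c("A", "B"), each = 12),
  LumenLength = c(1:12, 101:112)
)

# Hypothetical helper: mean of the n largest values of a numeric vector
# (head() just returns everything if the group has fewer than n values)
mean_top_n <- function(x, n = 10) mean(head(sort(x, decreasing = TRUE), n))

# One row per SampleId: mean of its 10 largest LumenLength values
top10_means <- aggregate(LumenLength ~ SampleId, data = sorted.v, FUN = mean_top_n)
print(top10_means)

# The three largest LumenLength values per SampleId, largest first
top3 <- tapply(sorted.v$LumenLength, sorted.v$SampleId,
               function(x) head(sort(x, decreasing = TRUE), 3))
print(top3)
```

Sorting within each group makes the result independent of the row order of sorted.v, so the separate arrange() step is no longer needed.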
We can use top_n from dplyr, even without arranging the dataset first.
library(dplyr)

sorted.v %>%
  group_by(SampleId) %>%
  top_n(10, LumenLength) %>%  # the 10 largest LumenLength rows per SampleId (ties can add rows)
  summarise(MeanLumenLength = mean(LumenLength))
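Note that top_n() keeps ties, so a group can end up with more than 10 rows when several values are tied at the cut-off, and since dplyr 1.0.0 top_n() is superseded in favour of slice_max(). A sketch of the same pipeline with slice_max(), assuming dplyr >= 1.0.0 and using hypothetical toy data in place of sorted.v:

```r
library(dplyr)

# Hypothetical toy data standing in for sorted.v: two samples, 12 rows each
sorted.v <- data.frame(
  SampleId    = rep(c("A", "B"), each = 12),
  LumenLength = c(1:12, 101:112)
)

top10_means <- sorted.v %>%
  group_by(SampleId) %>%
  # keep exactly the 10 largest LumenLength rows per SampleId;
  # with_ties = FALSE drops extra tied rows instead of keeping them
  slice_max(LumenLength, n = 10, with_ties = FALSE) %>%
  # one row per SampleId with the mean of those 10 values
  summarise(MeanLumenLength = mean(LumenLength))

print(top10_means)
```

With with_ties = FALSE each group contributes at most 10 rows (all of its rows if it has fewer); drop that argument to keep ties the way top_n() does.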