 

How to get the 10 highest values for each group in a data frame?

I assume this is an easy thing, but I am unable to solve my problem.

I have a data frame with 9 columns, and for each group given in the first column (SampleId) I want to get the 3 highest values of the 4th column (LumenLength), sorted.

I would like to: a) find, for each SampleId (first column), the 10 rows with the highest values in column 4, and b) average those 10 values for each SampleId.

[Image: screenshot of the data frame]

My current code a) first sorts the rows by SampleId and LumenLength, and b) tries to pull out the highest, second-highest and third-highest LumenLength value per SampleId.

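library(dplyr)  # provides arrange() and desc()
# Sort by SampleId (descending), then by LumenLength (ascending) within each sample
sorted.v <- arrange(sorted.v, desc(SampleId), LumenLength)
# With LumenLength ascending inside each group, tail(n = k) returns that group's k largest values
maxlength1 <- aggregate(sorted.v$LumenLength, by = list(sorted.v$SampleId), FUN = tail, n = 1)  # highest value
maxlength2 <- aggregate(sorted.v$LumenLength, by = list(sorted.v$SampleId), FUN = tail, n = 2)  # two highest values
maxlength3 <- aggregate(sorted.v$LumenLength, by = list(sorted.v$SampleId), FUN = tail, n = 3)  # three highest values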

As you can see, I haven't really reached my goal yet. I am also fairly sure there is a better way of doing this, but I am stuck right now.

Carola asked Mar 13 '23 08:03


1 Answer

We can use top_n from dplyr, without arranging the dataset first.

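library(dplyr)

sorted.v %>%
    group_by(SampleId) %>%
    top_n(10, LumenLength) %>%                       # keep the 10 largest LumenLength rows per SampleId (ties can add rows)
    summarise(MeanLumenLength = mean(LumenLength))   # average them per SampleId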
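A minimal sketch of the same pipeline, assuming dplyr 1.0.0 or later, where top_n() is superseded by slice_max(). The data frame below is invented for illustration; only the column names SampleId and LumenLength come from the question. One difference worth knowing: top_n() keeps every row tied with the 10th value, so a group can contribute more than 10 rows, while slice_max(..., with_ties = FALSE) keeps exactly 10.

```r
library(dplyr)

# Hypothetical example data: 3 samples with 15 lumen lengths each
sorted.v <- data.frame(
  SampleId = rep(c("A", "B", "C"), each = 15),
  LumenLength = c(1:15, seq(2, 30, by = 2), rep(5, 15))
)

result <- sorted.v %>%
  group_by(SampleId) %>%
  slice_max(LumenLength, n = 10, with_ties = FALSE) %>%  # exactly 10 largest rows per SampleId
  summarise(MeanLumenLength = mean(LumenLength))         # one mean per SampleId

print(result)
```

For this toy data the means are 10.5 (A), 21 (B) and 5 (C). Sample C has all-equal values, which is where with_ties changes the number of rows kept, though not the mean.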
akrun answered May 01 '23 04:05
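Without dplyr, a similar result can be sketched in base R with aggregate() and a small anonymous function, which stays close to the question's original aggregate() approach. This is an illustrative alternative on invented data, not part of the answer above. sort(x, decreasing = TRUE) followed by head(..., 10) takes the 10 largest values of each group (exactly 10; ties are not expanded, and a group with fewer than 10 values uses all of them), and indexing with [2] gives the single second-highest value that the question's tail(n = 2) call was aiming for.

```r
# Hypothetical example data (same shape as in the question: SampleId + LumenLength)
df <- data.frame(
  SampleId = rep(c("A", "B", "C"), each = 15),
  LumenLength = c(1:15, seq(2, 30, by = 2), rep(5, 15))
)

# Mean of the 10 largest LumenLength values per SampleId
top10_mean <- aggregate(
  LumenLength ~ SampleId, data = df,
  FUN = function(x) mean(head(sort(x, decreasing = TRUE), 10))
)

# Second-highest LumenLength per SampleId (NA if a sample has only one value)
second_highest <- aggregate(
  LumenLength ~ SampleId, data = df,
  FUN = function(x) sort(x, decreasing = TRUE)[2]
)

print(top10_mean)
print(second_highest)
```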
