How to find top n% of records in a column of a dataframe using R

Tags:

dataframe

r

I have a dataset showing the exchange rate of the Australian dollar against the US dollar once a day over a period of about 20 years. The data is in a data frame: the first column is the date and the second column is the exchange rate. Here's a sample:

> data
           V1     V2
1  12/12/1983 0.9175
2  13/12/1983 0.9010
3  14/12/1983 0.9000
4  15/12/1983 0.8978
5  16/12/1983 0.8928
6  19/12/1983 0.8770
7  20/12/1983 0.8795
8  21/12/1983 0.8905
9  22/12/1983 0.9005
10 23/12/1983 0.9005
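To experiment with the answer, the sample rows above can be rebuilt as a data frame. This is a minimal sketch, assuming V1 holds day/month/year date strings and V2 holds numeric rates, as the printout suggests (V1 and V2 are the default column names read.table() gives when the file has no header). Converting V1 to Date is optional; filtering on V2 works either way.

```r
# Rebuild the ten sample rows shown in the question (values copied from the printout)
data <- data.frame(
  V1 = c("12/12/1983", "13/12/1983", "14/12/1983", "15/12/1983", "16/12/1983",
         "19/12/1983", "20/12/1983", "21/12/1983", "22/12/1983", "23/12/1983"),
  V2 = c(0.9175, 0.9010, 0.9000, 0.8978, 0.8928,
         0.8770, 0.8795, 0.8905, 0.9005, 0.9005),
  stringsAsFactors = FALSE
)

# Optional: parse the day/month/year strings into proper Date values
data$V1 <- as.Date(data$V1, format = "%d/%m/%Y")

str(data)
```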

How would I go about displaying the top n% of these records? For example, suppose I want to see the days and exchange rates for which the exchange rate falls in the top 5% of all exchange rates in the dataset.

Bryce Thomas asked Oct 14 '09 02:10




1 Answer

For the top 5%:

n <- 5  # percentage of the highest rates to keep
data[data$V2 > quantile(data$V2, probs = 1 - n/100), ]
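The same idea can be wrapped in a small helper. This is a sketch only: top_percent() is a hypothetical name, not part of base R or of the answer above, and it additionally drops NA values and sorts the result so the highest rates come first. Like the answer, it keeps rows strictly above the (1 - n/100) quantile, so with few rows or tied values the number of rows returned may not be exactly n% of the data.

```r
# Hypothetical helper: rows whose value in `col` lies strictly above
# the (1 - pct/100) sample quantile, sorted from highest to lowest
top_percent <- function(df, col, pct) {
  cutoff <- quantile(df[[col]], probs = 1 - pct / 100, na.rm = TRUE)
  keep <- !is.na(df[[col]]) & df[[col]] > cutoff
  top <- df[keep, , drop = FALSE]
  top[order(top[[col]], decreasing = TRUE), , drop = FALSE]
}

# The ten sample rows from the question
data <- data.frame(
  V1 = c("12/12/1983", "13/12/1983", "14/12/1983", "15/12/1983", "16/12/1983",
         "19/12/1983", "20/12/1983", "21/12/1983", "22/12/1983", "23/12/1983"),
  V2 = c(0.9175, 0.9010, 0.9000, 0.8978, 0.8928,
         0.8770, 0.8795, 0.8905, 0.9005, 0.9005),
  stringsAsFactors = FALSE
)

top_percent(data, "V2", 5)   # top 5% of rates
top_percent(data, "V2", 20)  # top 20% of rates
```

On only ten rows, the 5% cut keeps just the 0.9175 day; on the full 20-year series it would return a few hundred days.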
Rob Hyndman answered Sep 21 '22 15:09