Calculating percentile of dataset column

Tags: r, statistics, percentile

A quick one for you, dearest R gurus:

I'm doing an assignment and I've been asked, in this exercise, to get basic statistics out of the infert dataset (it's in-built), and specifically one of its columns, infert$age.

For anyone not familiar with the dataset:

> table_ages     # Which is just subset(infert, select=c("age"));
    age
1    26
2    42
3    39
4    34
5    35
6    36
7    23
8    32
9    21
10   28
11   29
...
246  35
247  29
248  23

I've had to find the column's median, variance, skewness and standard deviation, which were all fine, until I was asked to find the column's "percentiles".

I haven't been able to find anything so far, and maybe I've translated the term incorrectly from Greek, the language of the assignment. The original word was "ποσοστημόρια", which Google Translate renders in English as "percentiles".

Any tutorials or ideas on finding those "percentiles" of infert$age?


asked Jan 19 '14 16:01

Dimitris Sfounis

3 Answers

If you sort a vector x and take the value that is halfway through it, you have found the median, or 50th percentile. The same logic applies to any other percentage. Here are two examples.

x <- rnorm(100)
quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1)) # quartiles
quantile(x, probs = seq(0, 1, by = 0.1))      # deciles
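Applied to the column from the question, the same call might look like the sketch below. It assumes only base R and the built-in infert data; the names age and pct are illustrative, and quantile() is left at its default interpolation (type = 7), which may differ from the definition your course uses.

```r
# Built-in dataset; `age` and `pct` are illustrative names
age <- infert$age

# All percentiles from 0% to 100% in steps of 1%
pct <- quantile(age, probs = seq(0, 1, by = 0.01))

# The 50th percentile is the median
pct["50%"]
median(age)

# Only a few selected percentiles
quantile(age, probs = c(0.05, 0.25, 0.75, 0.95))
```

If your textbook defines percentiles differently, the type argument of quantile() selects among the nine documented definitions.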

answered Oct 19 '22 20:10

Roman Luštrik

The quantile() function will do much of what you probably want, but since the question was ambiguous, I will provide an alternate answer that does something slightly different from quantile().

ecdf(infert$age)(infert$age)

will generate a vector of the same length as infert$age, giving for each observation the proportion of infert$age that is at or below it. You can read the ecdf documentation, but the basic idea is that ecdf() gives you a function that returns the empirical cumulative distribution. Thus ecdf(X)(Y) is the value of the cumulative distribution of X at the points in Y. If you wanted to know just the proportion at or below 30 (that is, which percentile 30 is in the sample), you could say

ecdf(infert$age)(30)

The main difference between this approach and using the quantile() function is that quantile() takes probabilities and returns levels, whereas ecdf() takes levels and returns probabilities.
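The two directions can be put side by side. This is a minimal sketch assuming base R only; age, Fn and p30 are illustrative names.

```r
age <- infert$age

# ecdf() returns a function: level in, cumulative proportion out
Fn <- ecdf(age)
p30 <- Fn(30)      # proportion of ages at or below 30
mean(age <= 30)    # the same proportion computed directly

# quantile() goes the other way: probability in, level out
quantile(age, probs = 0.5)
```

Multiplying p30 by 100 gives the percentile rank of age 30 within the sample.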

answered Oct 19 '22 22:10

randy

Using {dplyr}:

library(dplyr)

# percentiles
infert %>% 
  mutate(PCT = ntile(age, 100))

# quartiles
infert %>% 
  mutate(PCT = ntile(age, 4))

# deciles
infert %>% 
  mutate(PCT = ntile(age, 10))
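Note that ntile() returns a bucket number for each row rather than the percentile values themselves. If the cut points are what you need, one option is to compute them with quantile() and assign groups with cut(). The sketch below is a base-R illustration of that different technique, not what ntile() does internally. It assumes the quartile cut points of infert$age are distinct (cut() stops with an error otherwise), and breaks and grp are illustrative names.

```r
age <- infert$age

# Quartile cut points (the values, not bucket numbers)
breaks <- quantile(age, probs = seq(0, 1, by = 0.25))

# Assign each row to the interval between consecutive cut points
grp <- cut(age, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Group sizes can be unequal when many rows share the same age
table(grp)
```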

answered Oct 19 '22 20:10

Gorka
