Relative frequencies / proportions with dplyr

Try this:

mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

#   am gear  n      freq
# 1  0    3 15 0.7894737
# 2  0    4  4 0.2105263
# 3  1    4  8 0.6153846
# 4  1    5  5 0.3846154

From the dplyr vignette:

When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.

Thus, after the summarise, the last grouping variable specified in group_by, 'gear', is peeled off. In the mutate step, the data is grouped by the remaining grouping variable(s), here 'am', so sum(n) is the total within each level of 'am'. You can check the grouping at each step with groups().

The outcome of the peeling of course depends on the order of the grouping variables in the group_by call. You may wish to add a subsequent group_by(am) to make your code more explicit.
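To see the peeling described above, here is a minimal sketch that inspects the grouping before and after summarise. It uses group_vars(), which returns the grouping variables as a character vector; the intermediate names by_am_gear, counts and freqs are made up for illustration.

```r
library(dplyr)

by_am_gear <- mtcars %>% group_by(am, gear)
counts <- by_am_gear %>% summarise(n = n())

group_vars(by_am_gear)  # "am" "gear"
group_vars(counts)      # "am" (the last variable, gear, was peeled off)

# Regrouping explicitly gives the same proportions but documents the intent;
# ungroup() at the end leaves a plain table for later steps.
freqs <- counts %>%
  group_by(am) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()
```

Recent dplyr versions print a message about the resulting grouping after summarise; it is informational only.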

For rounding and prettification, please refer to the nice answer by @Tyler Rinker.

You can use the count() function, but its behaviour depends on the version of dplyr:

dplyr 0.7.1: returns an ungrouped table, so you need to group again by am
dplyr < 0.7.1: returns a grouped table, so there is no need to group again, although you might want to ungroup() for later manipulations

dplyr 0.7.1

mtcars %>%
  count(am, gear) %>%
  group_by(am) %>%
  mutate(freq = n / sum(n))

dplyr < 0.7.1

mtcars %>%
  count(am, gear) %>%
  mutate(freq = n / sum(n))

This results in a grouped table; if you want to use it for further analysis, it may be useful to remove the grouping with ungroup().
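If you do not want to depend on which behaviour your dplyr version has, one option is to always regroup explicitly and ungroup at the end. This is only a sketch: group_by() replaces any existing grouping by default, so the extra call should be harmless either way. The name freqs is made up for illustration.

```r
library(dplyr)

freqs <- mtcars %>%
  count(am, gear) %>%
  group_by(am) %>%               # explicit, whatever count() returned
  mutate(freq = n / sum(n)) %>%
  ungroup()                      # plain table for further analysis

# Sanity check: the proportions should sum to 1 within each level of am.
freqs %>%
  group_by(am) %>%
  summarise(total = sum(freq))
```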

@Henrik's answer is better for usability, because the approach below turns the column into character instead of numeric, but it matches what you asked for:

mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(rel.freq = paste0(round(100 * n / sum(n), 0), "%"))

##   am gear  n rel.freq
## 1  0    3 15      79%
## 2  0    4  4      21%
## 3  1    4  8      62%
## 4  1    5  5      38%

EDIT Because Spacedman asked for it :-)

# Tag a data frame so that one numeric column is printed as a percentage.
as.rel_freq <- function(x, rel_freq_col = "rel.freq", ...) {
    class(x) <- c("rel_freq", class(x))
    attr(x, "rel_freq_col") <- rel_freq_col
    x
}

# Print method: format a copy for display; the stored column stays numeric.
print.rel_freq <- function(x, ...) {
    freq_col <- attr(x, "rel_freq_col")
    out <- x
    out[[freq_col]] <- paste0(round(100 * out[[freq_col]], 0), "%")
    class(out) <- setdiff(class(out), "rel_freq")
    print(out, ...)
    invisible(x)
}

mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(rel.freq = n / sum(n)) %>%
  as.rel_freq()

## Source: local data frame [4 x 4]
## Groups: am
## 
##   am gear  n rel.freq
## 1  0    3 15      79%
## 2  0    4  4      21%
## 3  1    4  8      62%
## 4  1    5  5      38%

I wrote a small function for this repetitive task:

count_pct <- function(df) {
  df %>%
    tally() %>%
    mutate(n_pct = 100 * n / sum(n))
}

I can then use it like this:

mtcars %>%
  group_by(cyl) %>%
  count_pct()

It returns:

# A tibble: 3 x 3
    cyl     n n_pct
  <dbl> <int> <dbl>
1     4    11  34.4
2     6     7  21.9
3     8    14  43.8
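Note that with more than one grouping variable, tally() peels off the last one just as summarise does, so count_pct() then reports percentages within the remaining groups. If you want percentages of the overall total instead, a sketch of a variant could look like this; count_pct_overall is a hypothetical name, not part of dplyr.

```r
library(dplyr)

# Hypothetical variant of count_pct(): percentages of the overall total.
count_pct_overall <- function(df) {
  df %>%
    tally() %>%
    ungroup() %>%                       # drop any grouping left after tally()
    mutate(n_pct = 100 * n / sum(n))
}

res <- mtcars %>%
  group_by(am, gear) %>%
  count_pct_overall()

sum(res$n_pct)  # 100
```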

Despite the many answers, here is one more approach, which uses prop.table() in combination with dplyr or data.table.

library("dplyr")
mtcars %>%
    group_by(am, gear) %>%
    summarise(n = n()) %>%
    mutate(freq = prop.table(n))

library("data.table")
cars_dt <- as.data.table(mtcars)
cars_dt[, .(n = .N), keyby = .(am, gear)][, freq := prop.table(n), by = "am"]
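For comparison, the same within-am proportions can be sketched in base R alone by applying prop.table() to a contingency table. The names counts, row_props and overall_props are made up for illustration.

```r
# Cross-tabulate transmission (am) against number of gears.
counts <- table(am = mtcars$am, gear = mtcars$gear)

# margin = 1 divides each cell by its row total, i.e. proportions within am.
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)

# Without margin, each cell is divided by the grand total instead.
overall_props <- prop.table(counts)
```

Unlike the dplyr and data.table results, the table also contains the empty combinations (for example am = 0 with gear = 5) with a proportion of 0.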
