
dplyr: apply function table() to each column of a data.frame

Tags:

r

dplyr

plyr

Apply function table() to each column of a data.frame using dplyr

I often apply the table() function to each column of a data frame using plyr, like this:

library(plyr)
ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) )  )

Is it possible to do this in dplyr also?

My attempts fail:

mtcars %>%  do( table %>% data.frame() )
melt( mtcars ) %>%  do( table %>% data.frame() )
Rasmus Larsen asked Dec 26 '14 17:12



3 Answers

You can try the following, which does not rely on the tidyr package.

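library(dplyr)

mtcars %>%
   lapply(table) %>%                        # one frequency table per column
   lapply(as.data.frame) %>%                # convert each table to a data frame
   Map(cbind, var = names(mtcars), .) %>%   # tag each table with its column name
   bind_rows() %>%                          # replaces rbind_all(), since removed from dplyr
   group_by(var) %>%
   mutate(pct = Freq / sum(Freq))           # proportion within each column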
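
If depending on tidyr is acceptable, roughly the same long frequency table can be sketched by reshaping first and counting afterwards. This is an illustrative alternative rather than part of the answer above. The names freq_long, var, value and Freq are chosen here for the example, and it assumes tidyr 1.0.0 or later (for pivot_longer()) and that all columns share a type, as they do in mtcars.

```r
library(dplyr)
library(tidyr)

freq_long <- mtcars %>%
  # stack all columns into (var, value) pairs; works here because every column is numeric
  pivot_longer(everything(), names_to = "var", values_to = "value") %>%
  # count each distinct value within each original column
  count(var, value, name = "Freq") %>%
  group_by(var) %>%
  mutate(pct = Freq / sum(Freq)) %>%
  ungroup()
```

With columns of mixed types, pivot_longer() would need the values converted to a common type first, for example via its values_transform argument.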
Caner answered Oct 05 '22 15:10


In general, you probably do not want to run table() on every column of a data frame: if any variable is unique per row (an id field, for example), its table has one entry per row and the output becomes very long. However, you can use group_by() and tally() to obtain frequency tables in a dplyr chain, or count(), which does the group_by() for you.

> mtcars %>% 
    group_by(cyl) %>% 
    tally()
> # mtcars %>% count(cyl)

Source: local data frame [3 x 2]

  cyl  n
1   4 11
2   6  7
3   8 14

If you want to do a two-way frequency table, group by more than one variable.

> mtcars %>% 
    group_by(gear, cyl) %>% 
    tally()
> # mtcars %>% count(gear, cyl)

You can use spread() from the tidyr package (superseded by pivot_wider() since tidyr 1.0.0) to reshape that long two-way output into the wide layout that table() produces when given two variables.
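
As a rough sketch of that reshaping step, the following uses pivot_wider() rather than spread(). The name two_way and the "cyl_" column prefix are chosen here for illustration, and passing a plain 0 to values_fill assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

two_way <- mtcars %>%
  count(gear, cyl) %>%            # long format: one row per observed (gear, cyl) pair
  pivot_wider(
    names_from   = cyl,           # one column per cyl value
    values_from  = n,
    names_prefix = "cyl_",        # gives syntactic column names such as cyl_4
    values_fill  = 0              # combinations that never occur become 0 instead of NA
  )
```

The result has one row per gear value and one column per cyl value, which is the same information that table(mtcars$gear, mtcars$cyl) shows, but held in a data frame.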

josiekre answered Oct 05 '22 14:10


Using tidyverse (dplyr and purrr):

library(tidyverse)

mtcars %>%
    map( function(x) table(x) )

Or:

mtcars %>%
    map(~ table(.x) )

Or simply:

library(tidyverse)

mtcars %>%
    map( table )
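
Each of these returns a list of table objects, one per column. To get a single data frame with proportions, closer to what the ldply() call in the question produces, one possible sketch is shown below. The helper freq_one and the names freq_all, var and value are hypothetical, introduced only for this example.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: frequency table of one vector as a data frame.
# Naming the argument in table() labels the first column "value".
freq_one <- function(x) {
  as.data.frame(table(value = x), stringsAsFactors = FALSE)
}

freq_all <- mtcars %>%
  map(freq_one) %>%               # named list of data frames, one per column
  bind_rows(.id = "var") %>%      # stack them; list names go into the "var" column
  group_by(var) %>%
  mutate(pct = Freq / sum(Freq)) %>%
  ungroup()
```

Because table() stores the distinct values as names, the value column comes back as character here and may need converting with as.numeric() before further numeric work.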
Rasmus Larsen answered Oct 05 '22 13:10