I would like to create a table that has the frequency of several columns in my data frame. I am copying part of my data frame below.
The table is supposed to have the frequency (both n and %) of "Red" in Color and "F" in Gender.
I think that the dplyr package could do this but I cannot figure it out.
Thank you.
       RespondentID Color Gender
    1          1503   Red      F
    2          1653    NA      M
    3          1982   Red      F
    4          4862   Red     NA
    15         4880  Blue      M
To create a frequency table in R, we can use the table() function, but it returns a table object that prints horizontally, with the categories across the top. To work with the counts as a data frame (one row per category), convert the result with as.data.frame().
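A minimal base R sketch of that approach, assuming a data frame shaped like the sample in the question. The object names (df, color_tab, color_freq) and the pct column are only illustrative, and useNA = "ifany" is my addition so that missing values are counted instead of silently dropped.

```r
# Illustrative data frame mirroring the sample rows in the question
df <- data.frame(
  RespondentID = c(1503, 1653, 1982, 4862, 4880),
  Color  = c("Red", NA, "Red", "Red", "Blue"),
  Gender = c("F", "M", "F", NA, "M"),
  stringsAsFactors = FALSE
)

# table() drops NA by default; useNA = "ifany" keeps it as its own category
color_tab <- table(df$Color, useNA = "ifany")

# as.data.frame() reshapes the table into one row per category
color_freq <- as.data.frame(color_tab, stringsAsFactors = FALSE)
names(color_freq) <- c("Color", "n")

# Percentage of all rows, missing values included in the denominator
color_freq$pct <- 100 * color_freq$n / sum(color_freq$n)
color_freq
```

Whether NA belongs in the denominator depends on what you want to report; drop the useNA argument to compute percentages over non-missing values only.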
To add a frequency column for a categorical variable to an R data frame, we can use transform() together with ave(), passing length as the function so that each row receives the size of its group. The frequencies are duplicated in the output because every row that shares a category value gets the same count.
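Here is a rough sketch of the transform() plus ave() idea on the sample data. The names color_lbl, Color_n and the "(missing)" label are my own choices, not part of the original post. ave() does not treat NA as a group, so I relabel missing values first; that step is an assumption about how you want NA handled.

```r
# Illustrative data frame mirroring the sample rows in the question
df <- data.frame(
  RespondentID = c(1503, 1653, 1982, 4862, 4880),
  Color  = c("Red", NA, "Red", "Red", "Blue"),
  Gender = c("F", "M", "F", NA, "M"),
  stringsAsFactors = FALSE
)

# ave() does not form a group for NA, so give missing values an explicit label
color_lbl <- ifelse(is.na(df$Color), "(missing)", df$Color)

# For every row, ave() returns length() of the group that row belongs to
df_freq <- transform(
  df,
  Color_n = ave(seq_along(color_lbl), color_lbl, FUN = length)
)
df_freq
```

Each of the three "Red" rows gets Color_n = 3, which is the duplication mentioned above. Use this form when you need the count next to every observation rather than a summary table.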
    library(dplyr)

    df %>%
      count(Color, Gender) %>%
      group_by(Color) %>%   # now required with changes to dplyr::count()
      mutate(prop = prop.table(n))

    # Source: local data frame [4 x 4]
    # Groups: Color [3]
    #
    #    Color Gender     n      prop
    #   (fctr) (fctr) (int)     (dbl)
    # 1   Blue      M     1 1.0000000
    # 2    Red      F     2 0.6666667
    # 3    Red     NA     1 0.3333333
    # 4     NA      M     1 1.0000000
Updating per comment: if you want to look at each variable separately, you will need to rearrange the data frame first. You can accomplish this with tidyr:
    library(tidyr)
    library(dplyr)

    gather(df, "var", "value", -RespondentID) %>%
      count(var, value) %>%
      group_by(var) %>%   # now required with changes to dplyr::count()
      mutate(prop = prop.table(n))

    # Source: local data frame [6 x 4]
    # Groups: var [2]
    #
    #      var value     n  prop
    #   (fctr) (chr) (int) (dbl)
    # 1  Color  Blue     1   0.2
    # 2  Color   Red     3   0.6
    # 3  Color    NA     1   0.2
    # 4 Gender     F     2   0.4
    # 5 Gender     M     2   0.4
    # 6 Gender    NA     1   0.2
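To get only the two rows the question asks about ("Red" in Color and "F" in Gender, with both n and %), one option is to filter the long table afterwards. This is a sketch, not the answer's own code: it assumes tidyr 1.0.0 or later so that pivot_longer() can stand in for gather(), it uses character columns, and the names freq, asked and pct are illustrative.

```r
library(dplyr)
library(tidyr)

# Illustrative data frame mirroring the sample rows in the question
df <- data.frame(
  RespondentID = c(1503, 1653, 1982, 4862, 4880),
  Color  = c("Red", NA, "Red", "Red", "Blue"),
  Gender = c("F", "M", "F", NA, "M"),
  stringsAsFactors = FALSE
)

# Long format: one row per respondent and variable, then count per variable
freq <- df %>%
  pivot_longer(c(Color, Gender), names_to = "var", values_to = "value") %>%
  count(var, value) %>%
  group_by(var) %>%
  mutate(pct = 100 * n / sum(n)) %>%   # NA rows stay in the denominator
  ungroup()

# Keep only the two categories asked about in the question
asked <- filter(
  freq,
  (var == "Color" & value == "Red") | (var == "Gender" & value == "F")
)
asked
```

The percentages here are out of all five rows, matching the 0.6 and 0.4 in the output above. Filter out the NA rows before computing pct if you want them excluded from the denominator.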