frequency table with several variables in R

Tags: r, aggregate, frequency

I am trying to replicate a table often used in official statistics, but have had no success so far. Given a data frame like this one:

d1 <- data.frame( StudentID = c("x1", "x10", "x2", 
                          "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
             StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
             ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
             Exam          = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
             participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),  
             passed      = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
             stringsAsFactors = FALSE)

I would like to create a table showing, per year, the number of all students (All), how many of them are female, how many participated, and how many passed. Please note that "ofwhich" below refers to all students.

The table I have in mind would look like this:

cbind(All = table(d1$ExamenYear),
  participated      = table(d1$ExamenYear, d1$participated)[,2],
  ofwhichFemale     = table(d1$ExamenYear, d1$StudentGender)[,1],
  ofwhichpassed     = table(d1$ExamenYear, d1$passed)[,2])

I am sure there is a better way to do this kind of thing in R.

Note: I have seen LaTeX solutions, but I am not sure they will work for me, as I need to export the table to Excel.

Thanks in advance

asked Aug 07 '12 19:08

user1043144

4 Answers

Using plyr:

require(plyr)
ddply(d1, .(ExamenYear), summarize,
      All=length(ExamenYear),
      participated=sum(participated=="yes"),
      ofwhichFemale=sum(StudentGender=="F"),
      ofWhichPassed=sum(passed=="yes"))

Which gives:

  ExamenYear All participated ofwhichFemale ofWhichPassed
1       2007   3            2             2             2
2       2008   4            3             2             3
3       2009   3            3             0             2

answered Sep 29 '22 12:09

Andy

The plyr package is great for this sort of thing. First load the package:

library(plyr)

Then we use the ddply function:

ddply(d1, "ExamenYear", summarise, 
      All = length(passed),  # any column works here; we are only counting rows
      participated = sum(participated=="yes"),
      ofwhichFemale = sum(StudentGender=="F"),
      ofwhichpassed = sum(passed=="yes"))

Basically, ddply expects a data frame as input and returns a data frame. It splits the input data frame by ExamenYear, and on each sub-table we calculate a few summary statistics. Notice that inside ddply we don't have to use the $ notation when referring to columns.
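To make the split/apply/combine idea concrete, here is a rough base R sketch of the same steps. It is only an illustration of the pattern, not how plyr is implemented internally; the names pieces, summaries and res are made up for this example.

```r
# Same toy data as in the question (only the columns needed here).
d1 <- data.frame(
  ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
  StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
  participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),
  passed        = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
  stringsAsFactors = FALSE)

# 1. Split: one sub-table per year.
pieces <- split(d1, d1$ExamenYear)

# 2. Apply: compute the summary statistics on each sub-table.
summaries <- lapply(pieces, function(sub) {
  data.frame(ExamenYear    = sub$ExamenYear[1],
             All           = nrow(sub),
             participated  = sum(sub$participated == "yes"),
             ofwhichFemale = sum(sub$StudentGender == "F"),
             ofwhichpassed = sum(sub$passed == "yes"))
})

# 3. Combine: stack the one-row summaries into a single data frame.
res <- do.call(rbind, summaries)
res
```

ddply wraps these three steps in a single call, which is why the summary expressions can refer to the columns directly.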

answered Sep 29 '22 12:09

csgillespie

A couple of modifications to your own code would make it easier to read and a worthy competitor to the ddply solutions: use with() to reduce the number of d1$ references, and use character indices to make the code more self-documenting:

with( d1, cbind(All = table(ExamenYear),
  participated      = table(ExamenYear, participated)[,"yes"],
  ofwhichFemale     = table(ExamenYear, StudentGender)[,"F"],
  ofwhichpassed     = table(ExamenYear, passed)[,"yes"])
     )

     All participated ofwhichFemale ofwhichpassed
2007   3            2             2             2
2008   4            3             2             3
2009   3            3             0             2

I would expect this to be much faster than the ddply solution, although that will only be apparent if you are working on larger datasets.
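Since the question mentions Excel: the result above is a matrix with the years as row names, so one option is to turn it into a data frame and write a CSV file, which Excel can open. This is a minimal sketch, assuming a CSV file is acceptable; the names tab, out and csv_path, and the file name exam_summary.csv, are only placeholders for this example.

```r
# Same toy data as in the question (only the columns needed here).
d1 <- data.frame(
  ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
  StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
  participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),
  passed        = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
  stringsAsFactors = FALSE)

# Build the summary matrix exactly as above.
tab <- with(d1, cbind(All = table(ExamenYear),
                      participated  = table(ExamenYear, participated)[, "yes"],
                      ofwhichFemale = table(ExamenYear, StudentGender)[, "F"],
                      ofwhichpassed = table(ExamenYear, passed)[, "yes"]))

# Move the row names (the years) into a proper column.
out <- data.frame(ExamenYear = rownames(tab), tab)

# Placeholder path: written to the session's temporary directory.
csv_path <- file.path(tempdir(), "exam_summary.csv")
write.csv(out, csv_path, row.names = FALSE)
```

A native .xlsx file would need an add-on package, which is not covered here.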

answered Sep 29 '22 13:09

IRTFM

You may also want to take a look at plyr's successor, dplyr.

It uses a ggplot-like chaining syntax and is fast because key pieces are written in C++.

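library(dplyr)
d1 %>%   # %>% replaces the old %.% operator, which is no longer part of dplyr
  group_by(ExamenYear) %>%
  summarise(ALL = n(),   # number of rows in each group
            participated = sum(participated == "yes"),
            ofwhichFemale = sum(StudentGender == "F"),
            ofWhichPassed = sum(passed == "yes"))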

answered Sep 29 '22 12:09

Randy Lai
