How do I count the occurrences of a factor in several columns, grouping by one column?

Tags: r, aggregate

I have a seemingly simple question, but I cannot figure out how to get exactly what I want.

My data looks like this:

      Job     C/C++     Java     Python
  Student     FALSE     TRUE      FALSE
Developer      TRUE     TRUE       TRUE
Developer      TRUE     TRUE      FALSE
 Sysadmin      TRUE    FALSE      FALSE
  Student     FALSE     TRUE       TRUE

I would like to group by the "Job" column and count the number of TRUEs in each column. My desired output would look like this:

      Job     C/C++     Java     Python
  Student         0        2          1
Developer         2        2          1 
 Sysadmin         1        0          0

Any help would be greatly appreciated.


asked Mar 07 '13 19:03

user2145843

2 Answers

Assuming your data.frame is called "temp", just use aggregate:

aggregate(. ~ Job, temp, sum)
#         Job C.C.. Java Python
# 1 Developer     2    2      1
# 2   Student     0    2      1
# 3  Sysadmin     1    0      0

This works because TRUE and FALSE are coerced to the numeric values 1 and 0 in arithmetic, so sum counts the TRUEs in each group.
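As a quick check of that coercion, here is a minimal base R sketch. It rebuilds the sample data under the name temp used in this answer, and it spells the grouping out with the by = argument of aggregate rather than the formula form; that choice is only for illustration, not a claim that it is the preferred style.

```r
# Sample data from the question; the column is named C.C.. as in the answer.
temp <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
  C.C..  = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  Java   = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  Python = c(FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Logical values are coerced in arithmetic: TRUE -> 1, FALSE -> 0.
as.integer(c(TRUE, FALSE))   # 1 0
sum(temp$Java)               # 4 (four of the five rows are TRUE)

# The same grouping, written with an explicit `by` list instead of a formula.
counts <- aggregate(temp[-1], by = list(Job = temp$Job), FUN = sum)
counts
```

Because sum is applied to every non-grouping column, any extra logical column added to temp would be counted the same way.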

And, to add the "tidyverse" solution for completeness:

library(tidyverse)
temp %>% 
  group_by(Job) %>% 
  summarise_all(sum)
# # A tibble: 3 x 4
#   Job       C.C..  Java Python
#   <chr>     <int> <int>  <int>
# 1 Developer     2     2      1
# 2 Student       0     2      1
# 3 Sysadmin      1     0      0
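In more recent dplyr releases the scoped verbs such as summarise_all are superseded by across(). A sketch of the same summary in that style follows; it assumes dplyr 1.0.0 or later is installed and rebuilds temp so that it runs on its own.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides across()

temp <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
  C.C..  = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  Java   = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  Python = c(FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# across(everything(), sum) applies sum to every column except the grouping one.
res <- temp %>%
  group_by(Job) %>%
  summarise(across(everything(), sum))
res
```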

Here's your data in a format that is easy to copy-and-paste. This was obtained by using dput(your-actual-data-frame-name) and is what you should use in the future when posting R questions to Stack Overflow.

temp <- structure(list(Job = c("Student", "Developer", "Developer", "Sysadmin", 
          "Student"), C.C.. = c(FALSE, TRUE, TRUE, TRUE, FALSE), Java = c(TRUE, 
          TRUE, TRUE, FALSE, TRUE), Python = c(FALSE, TRUE, FALSE, FALSE, TRUE)),
          .Names = c("Job", "C.C..", "Java", "Python"), class = "data.frame", 
          row.names = c(NA, -5L))
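To illustrate why dput output is convenient for sharing, the sketch below uses a deliberately tiny data frame (the name small is hypothetical) and checks that evaluating the deparsed text reconstructs an equal object. deparse is used here only to capture the same text that dput prints.

```r
# A deliberately small example data frame (hypothetical name: small).
small <- data.frame(
  Job  = c("Student", "Developer"),
  Java = c(TRUE, TRUE),
  stringsAsFactors = FALSE
)

# dput() prints a structure(...) call that can be pasted into a question.
dput(small)

# Round trip: parse and evaluate that text to rebuild the object.
rebuilt <- eval(parse(text = deparse(small)))
all.equal(rebuilt, small)
```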


answered Nov 29 '22 22:11

A5C1D2H2I1M1N2O1R2T1

Alternative plyr and data.table solutions:

data.table:

require(data.table)
tmp.dt <- data.table(temp, key="Job")
tmp.dt[, lapply(.SD, sum), by=Job]

#          Job C.C.. Java Python
# 1: Developer     2    2      1
# 2:   Student     0    2      1
# 3:  Sysadmin     1    0      0

plyr:

require(plyr)
ddply(temp, .(Job), function(x) colSums(x[-1]))

#         Job C.C.. Java Python
# 1 Developer     2    2      1
# 2   Student     0    2      1
# 3  Sysadmin     1    0      0

Edit: If, instead of TRUE/FALSE, the columns hold labels and you have to count the occurrences of one of them (say, "Newbie"), then:

With data.table:

require(data.table)
tmp.dt <- data.table(temp, key="Job")
tmp.dt[, lapply(.SD, function(x) sum(x == "Newbie")), by=Job]

With plyr:

require(plyr)
ddply(temp, .(Job), function(x) colSums(x[-1] == "Newbie"))
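The same idea can be sketched in base R without extra packages. The data frame skill and the "Expert" label below are hypothetical, made up only to give the "Newbie" comparison something to count; aggregate is swapped in for data.table and plyr here.

```r
# Hypothetical data: skill levels stored as labels rather than TRUE/FALSE.
skill <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
  Java   = c("Newbie", "Expert", "Newbie", "Expert", "Newbie"),
  Python = c("Expert", "Newbie", "Newbie", "Expert", "Newbie"),
  stringsAsFactors = FALSE
)

# x == "Newbie" yields a logical vector, so sum() counts the matches per group.
newbies <- aggregate(
  skill[-1],
  by  = list(Job = skill$Job),
  FUN = function(x) sum(x == "Newbie")
)
newbies
```

The comparison also works when the columns are factors, since == compares against the level labels.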

answered Nov 29 '22 22:11

Arun
