Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I count the occurrences of a factor in several columns, grouping by one column?

Tags:

r

aggregate

I have a seemingly simple question, but I cannot figure out how to get exactly what I want.

My data looks like this:

      Job     C/C++     Java     Python
  Student     FALSE     TRUE      FALSE
Developer      TRUE     TRUE       TRUE
Developer      TRUE     TRUE      FALSE
 Sysadmin      TRUE    FALSE      FALSE
  Student     FALSE     TRUE       TRUE

I would like to group by the "Job" column and count the number of TRUEs in each column. My desired output would look like this:

      Job     C/C++     Java     Python
  Student         0        2          1
Developer         2        2          1 
 Sysadmin         1        0          0

Any help would be greatly appreciated.

like image 471
user2145843 Avatar asked Mar 07 '13 19:03

user2145843


People also ask

How to count the number of occurrences in a column?

How To Count The Number Of Occurrences In A Column The process of counting the number of occurrences is similar to the count function in Excel. You give it a range to check and it gives the number of occurrences. In this case, it is a data frame for that range.

How to count number of occurrences and salary in Excel?

Customer Name and City are columns of text values and Salary for numbers values. This relation and data set are for practice purposes only. 1. Using COUNTIF function 2. Using SUM-EXACT functions 3. Using COUNT-IF functions 4. SUM-IF functions to count Number of Occurrences 5. Pivot Table

How do you count occurrences in a pivot table?

Pivot Table You can use the Pivot Table for counting the number of occurrences for each value within the column. Before using the Pivot Table you need to make your table is Format as Table. To do so, select all the entire table and explore the Home tab, you will find Format as Table option.

What is the significance of counting the occurrence of distinct values?

When counting the occurrence of distinct values, it gives you new information about the data set. Furthermore, when you count occurances among multiple columns it can show relationships between columns that you would not see simply by looking at the raw numbers. Finding these relationships can have a big impact on how you view information.


2 Answers

Assuming your data.frame is called "temp", just use aggregate:

aggregate(. ~ Job, temp, sum)
#         Job C.C.. Java Python
# 1 Developer     2    2      1
# 2   Student     0    2      1
# 3  Sysadmin     1    0      0

The logic is that TRUE and FALSE equate to numeric values of "1" and "0", so you can simply use sum when aggregating.


And, to add the "tidyverse" solution for completeness:

library(tidyverse)
temp %>% 
  group_by(Job) %>% 
  summarise_all(sum)
# # A tibble: 3 x 4
#   Job       C.C..  Java Python
#   <chr>     <int> <int>  <int>
# 1 Developer     2     2      1
# 2 Student       0     2      1
# 3 Sysadmin      1     0      0

Here's your data in a format that is easy to copy-and-paste. This was obtained by using dput(your-actual-data-frame-name) and is what you should use in the future when posting R questions to Stack Overflow.

temp <- structure(list(Job = c("Student", "Developer", "Developer", "Sysadmin", 
          "Student"), C.C.. = c(FALSE, TRUE, TRUE, TRUE, FALSE), Java = c(TRUE, 
          TRUE, TRUE, FALSE, TRUE), Python = c(FALSE, TRUE, FALSE, FALSE, TRUE)),
          .Names = c("Job", "C.C..", "Java", "Python"), class = "data.frame", 
          row.names = c(NA, -5L))
like image 106
A5C1D2H2I1M1N2O1R2T1 Avatar answered Nov 29 '22 22:11

A5C1D2H2I1M1N2O1R2T1


Alternative plyr and data.table solutions:

data.table:

require(data.table)
tmp.dt <- data.table(temp, key="Job")
tmp.dt[, lapply(.SD, sum), by=Job]

#         Job C.C.. Java Python
# 1: Developer     2    2      1
# 2:   Student     0    2      1
# 3:  Sysadmin     1    0      0

plyr:

require(plyr)
ddply(temp, .(Job), function(x) colSums(x[-1]))

#         Job C.C.. Java Python
# 1 Developer     2    2      1
# 2   Student     0    2      1
# 3  Sysadmin     1    0      0

Edit: If instead of TRUE/FALSE, you've to count the number of Newbie's, then:

With data.table:

require(data.table)
tmp.dt <- data.table(temp, key="Job")
tmp.dt[, lapply(.SD, function(x) sum(x == "Newbie")), by=Job]

With plyr:

require(plyr)
ddply(temp, .(Job), function(x) colSums(x[-1] == "Newbie"))
like image 23
Arun Avatar answered Nov 29 '22 22:11

Arun