I have a seemingly simple question, but I cannot figure out how to get exactly what I want.
My data looks like this:
Job C/C++ Java Python
Student FALSE TRUE FALSE
Developer TRUE TRUE TRUE
Developer TRUE TRUE FALSE
Sysadmin TRUE FALSE FALSE
Student FALSE TRUE TRUE
I would like to group by the "Job" column and count the number of TRUE
s in each column. My desired output would look like this:
Job C/C++ Java Python
Student 0 2 1
Developer 2 2 1
Sysadmin 1 0 0
Any help would be greatly appreciated.
How To Count The Number Of Occurrences In A Column The process of counting the number of occurrences is similar to the count function in Excel. You give it a range to check and it gives the number of occurrences. In this case, it is a data frame for that range.
Customer Name and City are columns of text values and Salary for numbers values. This relation and data set are for practice purposes only. 1. Using COUNTIF function 2. Using SUM-EXACT functions 3. Using COUNT-IF functions 4. SUM-IF functions to count Number of Occurrences 5. Pivot Table
Pivot Table You can use the Pivot Table for counting the number of occurrences for each value within the column. Before using the Pivot Table you need to make your table is Format as Table. To do so, select all the entire table and explore the Home tab, you will find Format as Table option.
When counting the occurrence of distinct values, it gives you new information about the data set. Furthermore, when you count occurances among multiple columns it can show relationships between columns that you would not see simply by looking at the raw numbers. Finding these relationships can have a big impact on how you view information.
Assuming your data.frame is called "temp", just use aggregate
:
aggregate(. ~ Job, temp, sum)
# Job C.C.. Java Python
# 1 Developer 2 2 1
# 2 Student 0 2 1
# 3 Sysadmin 1 0 0
The logic is that TRUE
and FALSE
equate to numeric values of "1" and "0", so you can simply use sum
when aggregating.
And, to add the "tidyverse" solution for completeness:
library(tidyverse)
temp %>%
group_by(Job) %>%
summarise_all(sum)
# # A tibble: 3 x 4
# Job C.C.. Java Python
# <chr> <int> <int> <int>
# 1 Developer 2 2 1
# 2 Student 0 2 1
# 3 Sysadmin 1 0 0
Here's your data in a format that is easy to copy-and-paste. This was obtained by using dput(your-actual-data-frame-name)
and is what you should use in the future when posting R questions to Stack Overflow.
temp <- structure(list(Job = c("Student", "Developer", "Developer", "Sysadmin",
"Student"), C.C.. = c(FALSE, TRUE, TRUE, TRUE, FALSE), Java = c(TRUE,
TRUE, TRUE, FALSE, TRUE), Python = c(FALSE, TRUE, FALSE, FALSE, TRUE)),
.Names = c("Job", "C.C..", "Java", "Python"), class = "data.frame",
row.names = c(NA, -5L))
Alternative plyr
and data.table
solutions:
data.table:
require(data.table)
tmp.dt <- data.table(temp, key="Job")
tmp.dt[, lapply(.SD, sum), by=Job]
# Job C.C.. Java Python
# 1: Developer 2 2 1
# 2: Student 0 2 1
# 3: Sysadmin 1 0 0
plyr:
require(plyr)
ddply(temp, .(Job), function(x) colSums(x[-1]))
# Job C.C.. Java Python
# 1 Developer 2 2 1
# 2 Student 0 2 1
# 3 Sysadmin 1 0 0
Edit: If instead of TRUE/FALSE, you've to count the number of Newbie
's, then:
With data.table:
require(data.table)
tmp.dt <- data.table(temp, key="Job")
tmp.dt[, lapply(.SD, function(x) sum(x == "Newbie")), by=Job]
With plyr:
require(plyr)
ddply(temp, .(Job), function(x) colSums(x[-1] == "Newbie"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With