I want to count the number of occurrences of a factor in a data frame. For example, to count the number of events of a given type in the code below:
library(plyr)
events <- data.frame(type = c('A', 'A', 'B'),
                     quantity = c(1, 2, 1))
ddply(events, .(type), summarise, quantity = sum(quantity))
The output is the following:
  type quantity
1    A        3
2    B        1
However, what if I know that there are three types of events, A, B and C, and I also want to see the count for C, which is 0? In other words, I want the output to be:
  type quantity
1    A        3
2    B        1
3    C        0
How do I do this? It feels like there should be a function defined to do this somewhere.
The following are my two not-so-good ideas about how to go about this.
Idea #1: I know I could do this with a for loop, but it is widely said that if you are using a for loop in R, you are doing something wrong and there must be a better way to do it.
Idea #2: Add dummy entries to the original data frame. This solution works but it feels like there should be a more elegant solution.
events <- data.frame(type = c('A', 'A', 'B'),
                     quantity = c(1, 2, 1))
events <- rbind(events, data.frame(type = 'C', quantity = 0))
ddply(events, .(type), summarise, quantity = sum(quantity))
Method 1: Using the summary() method. Called on a factor column of a data frame, summary() returns the frequency of each level of that factor. The result is printed in tabular form, with one count per level.
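A minimal sketch of the summary() approach, assuming the type column is stored as a factor with all three levels declared (the data frame mirrors the question's example):

```r
# Declare all three levels up front so the unused level 'C' is kept
events <- data.frame(type = factor(c('A', 'A', 'B'), levels = c('A', 'B', 'C')),
                     quantity = c(1, 2, 1))

# summary() on a factor returns a named vector of per-level counts,
# including levels that never occur in the data
type_counts <- summary(events$type)
print(type_counts)
```

Note that this counts rows per level; it does not sum the quantity column.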
The tabulate() function in R counts how often each value occurs in a vector. It takes a vector of positive integers (a factor is counted through its integer codes) and returns the number of times each integer occurs. By default, the result has as many entries as the largest value present in the vector.
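As a rough illustration of that default length, and of why it matters for an unused trailing level, here is a small sketch. The variable names (type, codes, type_counts) are only for illustration; the nbins argument is used to reserve a slot for every level:

```r
type <- factor(c('A', 'A', 'B'), levels = c('A', 'B', 'C'))
codes <- as.integer(type)   # integer codes: 1 1 2

# With the default nbins, the result stops at the largest code present (2),
# so the unused level 'C' (code 3) gets no entry
default_counts <- tabulate(codes)

# Passing nbins = nlevels(type) gives one entry per declared level
type_counts <- tabulate(codes, nbins = nlevels(type))
names(type_counts) <- levels(type)
print(type_counts)
```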
Counting conditionally in R: you can use base R to build a condition and count the entries of a column that satisfy it. If you are an Excel user, this is similar to the COUNTIF function.
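One common way to do this in base R, sketched here on the question's data, is to sum a logical vector, since TRUE counts as 1 and FALSE as 0. The names n_a, n_c and n_a_big are illustrative only:

```r
events <- data.frame(type = c('A', 'A', 'B'),
                     quantity = c(1, 2, 1))

# A comparison yields a logical vector; sum() counts the TRUE values
n_a <- sum(events$type == 'A')
n_c <- sum(events$type == 'C')   # no matching rows, so the count is 0

# Conditions can be combined with & (and) or | (or)
n_a_big <- sum(events$type == 'A' & events$quantity > 1)
```

This handles one condition at a time, so it suits spot checks better than producing a full per-type table.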
You get this for free if you define your events variable correctly as a factor with the desired three levels:
R> events <- data.frame(type = factor(c('A', 'A', 'B'), c('A','B','C')),
+                      quantity = c(1, 2, 1))
R> events
  type quantity
1    A        1
2    A        2
3    B        1
R> table(events$type)
A B C
2 1 0
R>
Simply calling table() on the factor already does the right thing, and ddply() can too if you tell it not to drop:
R> ddply(events, .(type), summarise, quantity = sum(quantity), .drop=FALSE)
  type quantity
1    A        3
2    B        1
3    C        0
R>
> xtabs(quantity~type, events)
type
A B C
3 1 0
Using the dplyr library:
library(dplyr)
data <- data.frame(level = c('A', 'A', 'B', 'B', 'B', 'C'),
                   value = c(1:6))
data %>%
  group_by(level) %>%
  summarize(count = n()) %>%
  View
If you also want other summaries per group, such as the maximum and minimum, try this:
data %>%
  group_by(level) %>%
  summarise(count = n(), Max_val = max(value), Min_val = min(value)) %>%
  View
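The dplyr pipelines above only report levels that occur in the data. For the zero-count case in the question, a possible sketch is shown below. It assumes a dplyr version that supports the .drop argument of group_by() (0.8.0 or later) and that type is a factor with all three levels declared:

```r
library(dplyr)

events <- data.frame(type = factor(c('A', 'A', 'B'), levels = c('A', 'B', 'C')),
                     quantity = c(1, 2, 1))

# .drop = FALSE keeps groups for factor levels that have no rows,
# so 'C' appears with a count of 0 and a quantity sum of 0
result <- events %>%
  group_by(type, .drop = FALSE) %>%
  summarise(count = n(), quantity = sum(quantity))
print(result)
```

Summaries such as max() or min() would warn on the empty group, so this sketch sticks to n() and sum().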