I have a dataset containing 100,000 rows of data. I tried to do some COUNTIF operations in Excel, but it was prohibitively slow, so I am wondering whether this kind of operation can be done in R. Basically, I want to do a count based on multiple conditions. For example, I can count on both occupation and sex:
row  sex  occupation
1    M    Student
2    F    Analyst
3    M    Analyst
Easy peasy. Your data frame will look like this:
df <- data.frame(sex=c('M','F','M'),
occupation=c('Student','Analyst','Analyst'))
You can then do the equivalent of a COUNTIF by first specifying the IF part, like so:
df$sex == 'M'
This will give you a boolean vector, i.e. a vector of TRUE and FALSE. What you want is to count the observations for which the condition is TRUE. Since in R TRUE and FALSE double as 1 and 0, you can simply sum() over the boolean vector. The equivalent of COUNTIF(sex='M') is therefore
sum(df$sex == 'M')
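The question asks about counting on more than one condition. The same idea should extend to that case by combining the logical vectors with the element-wise `&` operator before summing. A minimal sketch on the toy data frame (the variable names `both` and `n_male_analysts` are made up for illustration):

```r
# Toy data frame from the answer
df <- data.frame(sex = c('M', 'F', 'M'),
                 occupation = c('Student', 'Analyst', 'Analyst'))

# Element-wise AND: TRUE only in rows where both conditions hold
both <- df$sex == 'M' & df$occupation == 'Analyst'

# Summing the logical vector counts the TRUE entries
n_male_analysts <- sum(both)
print(n_male_analysts)
```

This is roughly what COUNTIFS does in Excel. Use `|` instead of `&` if either condition should be enough.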
Should there be rows in which the sex is not specified (i.e. it is NA), the above will give back NA. In that case, if you just want to ignore the missing observations, use
sum(df$sex == 'M', na.rm=TRUE)
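If you need the count for every combination of sex and occupation rather than one combination at a time, base R's `table()` is probably the more convenient route. A short sketch, assuming the same toy data frame; `useNA = "ifany"` is an optional choice here that makes missing values show up as their own category instead of being dropped silently:

```r
# Toy data frame from the answer
df <- data.frame(sex = c('M', 'F', 'M'),
                 occupation = c('Student', 'Analyst', 'Analyst'))

# Cross-tabulate: one cell per (sex, occupation) combination
counts <- table(sex = df$sex, occupation = df$occupation, useNA = "ifany")
print(counts)

# Look up a single combination by its labels
male_analysts <- counts['M', 'Analyst']

# Long format: one row per combination, with the count in column Freq
counts_long <- as.data.frame(counts)
print(counts_long)
```

Because `table()` computes all cells in one pass, it avoids writing a separate `sum()` for each combination.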