I have a data.frame mydf
with about 2500 rows. These rows correspond to 69 classes of objects in colum 1 mydf$V1
, and I want to count how many rows per object class I have.
I can get a factor of these classes with:
objectclasses = unique(factor(mydf$V1, exclude="1"));
What's the terse R way to count the rows per object class? If this were any other language I'd be traversing an array with a loop and keeping count but I'm new to R programming and am trying to take advantage of R's vectorised operations.
Method 1 : Using summary() method The summary() function produces an output of the frequencies of the values per level of the given factor column of the data frame in R. A summary statistics for each of the variables of this column is result in a tabular format, as an output.
A factor must have at least two levels. If a factor only had one level then the effect of the factor could not be assessed.
nlevels() function in R Language is used to get the number of levels of a factor.
count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()) .
Or using the dplyr
library:
library(dplyr)
set.seed(1)
dat <- data.frame(ID = sample(letters,100,rep=TRUE))
dat %>%
group_by(ID) %>%
summarise(no_rows = length(ID))
Note the use of %>%
, which is similar to the use of pipes in bash. Effectively, the code above pipes dat
into group_by
, and the result of that operation is piped into summarise
.
The result is:
Source: local data frame [26 x 2]
ID no_rows
1 a 2
2 b 3
3 c 3
4 d 3
5 e 2
6 f 4
7 g 6
8 h 1
9 i 6
10 j 5
11 k 6
12 l 4
13 m 7
14 n 2
15 o 2
16 p 2
17 q 5
18 r 4
19 s 5
20 t 3
21 u 8
22 v 4
23 w 5
24 x 4
25 y 3
26 z 1
See the dplyr
introduction for some more context, and the documentation for details regarding the individual functions.
Here 2 ways to do it:
set.seed(1)
tt <- sample(letters,100,rep=TRUE)
## using table
table(tt)
tt
a b c d e f g h i j k l m n o p q r s t u v w x y z
2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1
## using tapply
tapply(tt,tt,length)
a b c d e f g h i j k l m n o p q r s t u v w x y z
2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1
Using plyr
package:
library(plyr)
count(mydf$V1)
It will return you a frequency of each value.
Using data.table
library(data.table)
setDT(dat)[, .N, keyby=ID] #(Using @Paul Hiemstra's `dat`)
Or using dplyr 0.3
res <- count(dat, ID)
head(res)
#Source: local data frame [6 x 2]
# ID n
#1 a 2
#2 b 3
#3 c 3
#4 d 3
#5 e 2
#6 f 4
Or
dat %>%
group_by(ID) %>%
tally()
Or
dat %>%
group_by(ID) %>%
summarise(n=n())
We can use summary
on factor column:
summary(myDF$factorColumn)
One more approach would be to apply n() function which is counting the number of observations
library(dplyr)
library(magrittr)
data %>%
group_by(columnName) %>%
summarise(Count = n())
In case I just want to know how many unique factor levels exist in the data, I use:
length(unique(df$factorcolumn))
Use the package plyr with lapply to get frequencies for every value (level) and every variable (factor) in your data frame.
library(plyr)
lapply(df, count)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With