 

How to count how many values per level in a given factor?

Tags: r, count, frequency

I have a data.frame mydf with about 2500 rows. These rows correspond to 69 classes of objects in column 1 (mydf$V1), and I want to count how many rows I have per object class. I can get a factor of these classes with:

objectclasses <- unique(factor(mydf$V1, exclude = "1"))

What's the terse R way to count the rows per object class? If this were any other language I'd be traversing an array with a loop and keeping count but I'm new to R programming and am trying to take advantage of R's vectorised operations.

Escher asked Sep 30 '14 06:09


People also ask

How do you count values per level in a factor in R?

Use summary(): applied to a factor column of a data frame, it returns the frequency of each level of that factor as a named vector.
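A small sketch of this, using a hypothetical data frame df with a factor column fruit (neither comes from the question). One caveat: summary() on a factor lists at most maxsum = 100 levels by default and folds the remaining, least frequent ones into an "(Other)" entry, so table() is the safer choice for factors with more levels than that.

```r
# Hypothetical data frame with a factor column
df <- data.frame(fruit = factor(c("apple", "pear", "apple", "fig", "apple")))

# summary() on a factor returns a named integer vector: one count per level
counts <- summary(df$fruit)
print(counts)   # apple = 3, fig = 1, pear = 1 (levels sort alphabetically)
```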

How many levels are in a factor?

In R, a factor can have any number of levels, including just one. The "at least two levels" rule comes from experimental design: if a factor had only one level, its effect could not be assessed.
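To illustrate that R itself accepts single-level (and even empty) factors, here is a short sketch with made-up values:

```r
# A factor in which every value has the same level is valid in R
f <- factor(c("control", "control", "control"))
n_single <- nlevels(f)   # 1
counts <- table(f)       # control: 3
print(counts)

# Even a zero-length factor with no levels can be created
empty <- factor(character(0))
nlevels(empty)           # 0
```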

Which function gives the count of levels in a factor?

The nlevels() function returns the number of levels of a factor.
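A minimal sketch with a hypothetical factor sizes: nlevels() is equivalent to length(levels(f)), so it counts every declared level, including levels that never occur in the data; droplevels() removes the unused ones first.

```r
# Hypothetical factor with an explicitly declared level ("medium") that never occurs
sizes <- factor(c("small", "large", "small"),
                levels = c("small", "medium", "large"))

nlevels(sizes)                # 3: every declared level is counted
length(levels(sizes))         # 3: nlevels() is equivalent to this
nlevels(droplevels(sizes))    # 2: only levels that actually occur
```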

How do I count the number of observations in a group in R?

dplyr's count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()).
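A sketch checking that equivalence on a hypothetical two-column data frame. It assumes dplyr 1.0 or later, where summarise() accepts .groups = "drop" so the long form is no longer grouped afterwards:

```r
library(dplyr)

# Hypothetical data with two grouping columns
df <- data.frame(a = c("x", "x", "y", "y", "y"),
                 b = c("p", "q", "p", "p", "q"))

# One row per (a, b) combination, with its count in column n
via_count <- df %>% count(a, b)

# The long form: group, then count rows per group
via_summarise <- df %>%
  group_by(a, b) %>%
  summarise(n = n(), .groups = "drop")

print(via_count)
```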


8 Answers

Using the dplyr package:

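library(dplyr)
set.seed(1)
# Example data: 100 letters drawn with replacement
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))
dat %>%
  group_by(ID) %>%                  # one group per distinct ID
  summarise(no_rows = length(ID))   # number of rows in each group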

Note the use of %>%, which is similar to the use of pipes in bash. Effectively, the code above pipes dat into group_by, and the result of that operation is piped into summarise.

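The result is shown below. It was produced with older versions of R and dplyr: since R 3.6.0, sample() uses a different default algorithm, so set.seed(1) may now give different counts, and current dplyr prints a tibble header instead of "Source: local data frame".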

Source: local data frame [26 x 2]

   ID no_rows
1   a       2
2   b       3
3   c       3
4   d       3
5   e       2
6   f       4
7   g       6
8   h       1
9   i       6
10  j       5
11  k       6
12  l       4
13  m       7
14  n       2
15  o       2
16  p       2
17  q       5
18  r       4
19  s       5
20  t       3
21  u       8
22  v       4
23  w       5
24  x       4
25  y       3
26  z       1

See the dplyr introduction for some more context, and the documentation for details regarding the individual functions.
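To make the pipe explicit, here is the same computation written with and without %>%: each step's result becomes the first argument of the next call. This sketch rebuilds the answer's dat, with replace = TRUE spelled out.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))

# Piped form, as in the answer
piped <- dat %>%
  group_by(ID) %>%
  summarise(no_rows = length(ID))

# Nested form: group_by(dat, ID) is evaluated first, then passed to summarise()
nested <- summarise(group_by(dat, ID), no_rows = length(ID))

identical(piped, nested)   # TRUE
```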

Paul Hiemstra answered Oct 04 '22 07:10


Here are two ways to do it:

set.seed(1)
tt <- sample(letters, 100, replace = TRUE)

## using table
table(tt)
# tt
# a b c d e f g h i j k l m n o p q r s t u v w x y z 
# 2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1 

## using tapply
tapply(tt, tt, length)
# a b c d e f g h i j k l m n o p q r s t u v w x y z 
# 2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1 
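To get the same counts as a data frame (handy for sorting or merging), table() can be wrapped in as.data.frame(). A sketch with a hypothetical stand-in for the question's mydf:

```r
# Hypothetical stand-in for the question's mydf
mydf <- data.frame(V1 = c("cat", "dog", "cat", "bird", "cat", "dog"))

# table() counts each value; as.data.frame() turns it into columns Var1 and Freq
freq <- as.data.frame(table(mydf$V1))

# Sort so the most common class comes first
freq <- freq[order(-freq$Freq), ]
print(freq)
```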
agstudy answered Oct 04 '22 05:10


Using the plyr package:

library(plyr)

count(mydf$V1)

It returns a data frame with the frequency of each value.
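A sketch of what that returns, with a hypothetical stand-in for mydf: plyr::count() turns a vector into a data frame with a column x (the value) and freq (its count). Calling it as plyr::count() avoids accidentally getting dplyr::count(), which has the same name and expects a data frame, when both packages are attached. The example sets stringsAsFactors = FALSE so V1 is a plain character vector on any R version.

```r
library(plyr)

# Hypothetical stand-in for the question's mydf
mydf <- data.frame(V1 = c("cat", "dog", "cat", "bird", "cat"),
                   stringsAsFactors = FALSE)

# plyr::count() on a vector returns a data frame: x = value, freq = count
res <- plyr::count(mydf$V1)
print(res)
```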

Andriy T. answered Oct 04 '22 06:10


Using data.table

 library(data.table)
 # setDT() converts dat to a data.table by reference; .N is the row count per ID
 setDT(dat)[, .N, keyby = ID]  # (using @Paul Hiemstra's `dat`)
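The data.table call relies on .N, the number of rows in each group, and keyby, which sorts the result by the grouping column and sets it as the key. A small sketch with a hypothetical table dt contrasts keyby with by, which keeps groups in order of first appearance:

```r
library(data.table)

# Hypothetical table; setDT() would convert an existing data.frame the same way
dt <- data.table(ID = c("b", "a", "b", "c", "b", "a"))

# by: groups appear in order of first appearance (b, a, c)
by_first <- dt[, .N, by = ID]

# keyby: same counts, but sorted by ID and keyed on it (a, b, c)
by_key <- dt[, .N, keyby = ID]

# Chain an order() to list the most frequent ID first
by_freq <- dt[, .N, by = ID][order(-N)]
print(by_freq)
```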

Or using dplyr 0.3

 res <- count(dat, ID)
 head(res)
 #Source: local data frame [6 x 2]

 #  ID n
 #1  a 2
 #2  b 3
 #3  c 3
 #4  d 3
 #5  e 2
 #6  f 4

Or

  dat %>% 
      group_by(ID) %>% 
      tally()

Or

  dat %>% 
      group_by(ID) %>%
      summarise(n=n())
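A sketch of two options these verbs share, assuming a recent dplyr (1.0 or later): count() accepts sort = TRUE to list the largest groups first and name = to rename the count column, and tally() takes the same sort argument.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))

# count() with sort = TRUE: largest groups first
by_count <- count(dat, ID, sort = TRUE)

# The group_by() + tally() form accepts the same sort argument
by_tally <- dat %>% group_by(ID) %>% tally(sort = TRUE)

# name = sets the count column's name instead of the default "n"
renamed <- count(dat, ID, name = "no_rows")
head(by_count)
```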
akrun answered Oct 04 '22 05:10


We can use summary() on a factor column:

summary(myDF$factorColumn)
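One pitfall, sketched with a hypothetical myDF: summary() only counts levels when the column really is a factor. A character column gives just its length, class and mode, which matters because data.frame() no longer converts strings to factors by default from R 4.0.0 onwards. The example sets stringsAsFactors = FALSE so it behaves the same on older R.

```r
# Hypothetical data frame with the same values stored two ways
myDF <- data.frame(charColumn   = c("x", "y", "x"),
                   factorColumn = factor(c("x", "y", "x")),
                   stringsAsFactors = FALSE)   # the default since R 4.0.0

counts    <- summary(myDF$factorColumn)         # x = 2, y = 1
no_counts <- summary(myDF$charColumn)           # Length, Class, Mode only
converted <- summary(factor(myDF$charColumn))   # convert first: x = 2, y = 1
print(no_counts)
```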
Spariant answered Oct 04 '22 06:10


Another approach is dplyr's n() function, which counts the number of observations in the current group:

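library(dplyr)
library(magrittr)   # optional: dplyr already re-exports %>%
# `data` and `columnName` are placeholders for your data frame and grouping column
data %>% 
  group_by(columnName) %>%
  summarise(Count = n())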
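n() counts every row in a group, whether or not other columns are missing. A sketch with a hypothetical scores data frame shows the difference from counting only non-missing values (the .groups argument assumes dplyr 1.0 or later):

```r
library(dplyr)

# Hypothetical data: score is missing for one row in group "b"
scores <- data.frame(group = c("a", "a", "b", "b", "b"),
                     score = c(1, 2, NA, 4, 5))

res <- scores %>%
  group_by(group) %>%
  summarise(rows        = n(),                 # every row in the group
            non_missing = sum(!is.na(score)),  # rows where score is present
            .groups = "drop")
print(res)   # a: rows 2, non_missing 2; b: rows 3, non_missing 2
```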
iamigham answered Oct 04 '22 06:10


When I just want to know how many unique factor levels occur in the data, I use:

length(unique(df$factorcolumn))
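Note that length(unique(...)) counts the distinct values actually present, and NA counts as one of them, whereas nlevels() counts the declared levels. A sketch with a made-up factor:

```r
# Made-up factor with one missing value
f <- factor(c("a", "b", "a", NA))

length(unique(f))            # 3: "a", "b" and NA
nlevels(f)                   # 2: NA is not a level
length(unique(na.omit(f)))   # 2: drop NA before counting
```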
Peter answered Oct 04 '22 06:10


Use the package plyr with lapply to get frequencies for every value (level) and every variable (factor) in your data frame.

library(plyr)
lapply(df, count)
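If you would rather not load plyr, a base-R variant swaps count() for table(); this is a substitution, not what the answer uses. A sketch with a hypothetical two-column df:

```r
# Hypothetical data frame with two categorical columns
df <- data.frame(colour = c("red", "blue", "red"),
                 size   = c("S", "M", "M"))

# table() is applied to each column; the result is a named list of tables
freqs <- lapply(df, table)
freqs$colour   # blue = 1, red = 2
freqs$size     # M = 2, S = 1
```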
Christian Savemark answered Oct 04 '22 06:10