set.seed(1)
dat <- data.frame(ID = sample(letters,50,rep=TRUE))
dat %>%
group_by(ID) %>%
summarise(no_rows = length(ID))
I have the above code which creates a random sample of letters. However can I make the summarised output show all count levels even when there is zero.
When I run the above code sometimes I get 20 rows and sometimes I get 25, etc. I want this to return 26 rows every time.
count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()) .
Method 1 : Using summary() method The summary() function produces an output of the frequencies of the values per level of the given factor column of the data frame in R. A summary statistics for each of the variables of this column is result in a tabular format, as an output.
Get the Number of Levels of a Factor in R Programming – nlevels() Function. nlevels() function in R Language is used to get the number of levels of a factor.
count() function in dplyr can be used to count observations by multiple groups.
In the accepted answer by akrun, table()
works, but the tidyverse
answer gives inaccurate counts (see below). Instead use the .drop = FALSE
option:
library(tidyverse)
set.seed(1)
dat <- data.frame(ID = sample(letters,50,rep=TRUE))
dat %>%
mutate(ID = factor(ID, levels = letters)) %>%
count(ID, name = "no_rows", .drop = F) %>%
print.data.frame()
#> ID no_rows
#> 1 a 3
#> 2 b 2
#> 3 c 1
#> 4 d 1
#> 5 e 3
#> 6 f 3
#> 7 g 2
#> 8 h 1
#> 9 i 2
#> 10 j 5
#> 11 k 1
#> 12 l 3
#> 13 m 0
#> 14 n 3
#> 15 o 3
#> 16 p 0
#> 17 q 0
#> 18 r 1
#> 19 s 1
#> 20 t 3
#> 21 u 3
#> 22 v 1
#> 23 w 2
#> 24 x 0
#> 25 y 5
#> 26 z 1
Created on 2019-11-22 by the reprex package (v0.3.0)
Note that we expect nonzero counts for all letters but m, p, q, and x:
set.seed(1)
dat <- data.frame(ID = sample(letters,50,rep=TRUE))
levels(dat$ID)
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "n" "o" "r" "s" "t"
#> [18] "u" "v" "w" "y" "z"
But if we use complete()
we get ones instead:
set.seed(1)
dat <- data.frame(ID = sample(letters,50,rep=TRUE))
dat %>%
mutate(ID=factor(ID, levels = letters)) %>%
complete(ID) %>%
group_by(ID) %>%
summarise(no_rows = n()) %>%
print.data.frame()
#> ID no_rows
# ...
#> 12 l 3
#> 13 m 1 # should be 0
#> 14 n 3
#> 15 o 3
#> 16 p 1 # should be 0
#> 17 q 1 # should be 0
#> 18 r 1
#> 19 s 1
#> 20 t 3
#> 21 u 3
#> 22 v 1
#> 23 w 2
#> 24 x 1 # should be 0
#> 25 y 5
#> 26 z 1
That's because complete()
actually adds a single m, p, q, and x to ID
so it contains at least one of each letter.
We can convert 'ID' to factor
with levels
specified and just use table
table(factor(dat$ID, levels = letters))
Or using the same with tidyverse
library(tidyverse)
dat %>%
mutate(ID=factor(ID, levels = letters)) %>%
complete(ID) %>%
group_by(ID) %>%
summarise(no_rows = n())
This will also work:
set.seed(1)
dat <- data.frame(ID = sample(letters,50,rep=TRUE))
levels(dat$ID) <- letters
as.data.frame(dat %>% xtabs(formula = ~ ID))
# ID Freq
#1 a 2
#2 b 1
#3 c 1
#4 d 1
#5 e 2
#6 f 3
#7 g 2
#8 h 1
#9 i 4
#10 j 2
#11 k 4
#12 l 1
#13 m 2
#14 n 1
#15 o 3
#16 p 3
#17 q 3
#18 r 1
#19 s 5
#20 t 2
#21 u 1
#22 v 2
#23 w 2
#24 x 1
#25 y 0
#26 z 0
Using the accepted solution by akrun I get a wrong result. I expect a frequency table with all letters with no_rows = 0 except "a" with no_rows = 1.
library(tidyverse)
set.seed(1)
dat <- data.frame(ID = "a")
dat %>%
dplyr::mutate(ID=factor(ID, levels = letters)) %>%
tidyr::complete(ID) %>%
dplyr::group_by(ID) %>%
dplyr::summarise(no_rows = n())
Instead I get a frequency table with no_rows = 1 for all letters:
ID no_rows
<fct> <int>
1 a 1
2 b 1
3 c 1
4 d 1
5 e 1
6 f 1
7 g 1
8 h 1
9 i 1
10 j 1
# ... with 16 more rows
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With