I use the following command:

table(factor(x, levels = 1:n))

where x is a vector, for example a <- c(1, 3, 4, 4, 3), and levels = 1:5, so that the values 2 and 5, which never occur in a, are also taken into consideration.
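For reference, with these inputs the call indeed reports zero counts for the unused levels:

a <- c(1, 3, 4, 4, 3)
table(factor(a, levels = 1:5))
# 1 2 3 4 5
# 1 0 2 2 0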
For really big datasets, this seems to be very inefficient. Does anyone know of a library or a code snippet that makes it faster?
Here is one more option: summarytools (example data from Martin Gal, many thanks):
library(summarytools)
set.seed(8192)
df <- data.frame(X1 = sample(1:10, 100, replace = TRUE))
summarytools::freq(df$X1, cumul=FALSE)
Output:

         Freq   % Valid   % Total
------ ------ --------- ---------
     1      9      9.00      9.00
     2      6      6.00      6.00
     3     15     15.00     15.00
     4     13     13.00     13.00
     5     11     11.00     11.00
     6      9      9.00      9.00
     7      7      7.00      7.00
     8      9      9.00      9.00
     9     11     11.00     11.00
    10     10     10.00     10.00
  <NA>      0                0.00
 Total    100    100.00    100.00
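Since the <NA> row carries no information here, it can be suppressed; freq() has a report.nas argument for this (a sketch, assuming a reasonably current summarytools version):

summarytools::freq(df$X1, cumul = FALSE, report.nas = FALSE)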
We could use fnobs from collapse, which is efficient:
library(collapse)
fnobs(df, g = df$X1)
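As a side note, newer versions of collapse (>= 1.8) also provide qtab(), advertised as a faster replacement for table(); a minimal sketch, treating qtab()'s handling of empty factor levels as an assumption to verify:

library(collapse)
a <- c(1, 3, 4, 4, 3)
# qtab() tabulates like table(); with an explicit factor, the unused
# levels 2 and 5 should still show up with count 0
qtab(factor(a, levels = 1:5))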
In base R, tabulate is more efficient than table:
tabulate(df$X1)
[1] 9 6 15 13 11 9 7 9 11 10
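Note that tabulate() only counts values from 1 to nbins and returns an unnamed integer vector; its nbins argument also covers the original requirement of counting values that never occur:

a <- c(1, 3, 4, 4, 3)
# force 5 bins so the absent values 2 and 5 are reported as 0
tabulate(a, nbins = 5)
# [1] 1 0 2 2 0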
We could also use janitor::tabyl:
library(janitor)
df %>%
tabyl(X1) %>%
adorn_totals()
    X1   n percent
     1   9    0.09
     2   6    0.06
     3  15    0.15
     4  13    0.13
     5  11    0.11
     6   9    0.09
     7   7    0.07
     8   9    0.09
     9  11    0.11
    10  10    0.10
 Total 100    1.00
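If formatted percentages are preferred over raw proportions, janitor's adorn_pct_formatting() can be appended to the chain; a minimal sketch using its default settings:

library(janitor)
df %>%
  tabyl(X1) %>%
  adorn_totals() %>%
  adorn_pct_formatting()  # e.g. 0.09 is displayed as "9.0%" (one digit by default)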