
Is there an efficient alternative to table()?

I use the following command:

table(factor(a, levels = 1:n))

For example, with a = c(1,3,4,4,3) and levels = 1:5, so that the values 2 and 5, which do not occur in a, are also counted (as zero). For really big datasets, this code seems to be very inefficient.

Does anyone know a lesser-known library or a code snippet that would make it faster?

elmoBlue asked Jul 31 '21 17:07


3 Answers

Here is one more option: freq from summarytools.

The sample data is from Martin Gal, many thanks:

library(summarytools)

set.seed(8192)
df <- data.frame(X1 = sample(1:10, 100, replace = TRUE))

summarytools::freq(df$X1, cumul=FALSE)

Output:

              Freq   % Valid   % Total
----------- ------ --------- ---------
          1      9      9.00      9.00
          2      6      6.00      6.00
          3     15     15.00     15.00
          4     13     13.00     13.00
          5     11     11.00     11.00
          6      9      9.00      9.00
          7      7      7.00      7.00
          8      9      9.00      9.00
          9     11     11.00     11.00
         10     10     10.00     10.00
       <NA>      0                0.00
      Total    100    100.00    100.00
TarJae answered Oct 04 '22 21:10


We could use fnobs from collapse, which should be efficient:

library(collapse)
fnobs(df, g = df$X1)

In base R, tabulate is more efficient than table:

tabulate(df$X1)
[1]  9  6 15 13 11  9  7  9 11 10
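
To connect this back to the question, which also needs zero counts for values that never occur: a minimal sketch, assuming the data are positive integers in 1..n as in the question's example. The names counts_table and counts_fast are only illustrative. The nbins argument of tabulate fixes the number of bins, so empty ones such as 2 and 5 are kept.

```r
# Example vector from the question; n is the largest value to count.
a <- c(1, 3, 4, 4, 3)
n <- 5

# Original approach: build a factor first, then count its levels.
counts_table <- table(factor(a, levels = 1:n))

# tabulate() counts positive integers directly; nbins = n keeps empty bins.
counts_fast <- tabulate(a, nbins = n)

print(counts_fast)
# [1] 1 0 2 2 0

# Both approaches agree once the names are dropped from the table.
stopifnot(all(as.vector(counts_table) == counts_fast))
```

Unlike table, tabulate returns an unnamed integer vector and silently ignores NAs and values outside 1..nbins, so it is only a drop-in replacement when the data really are positive integers.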
akrun answered Oct 04 '22 20:10


We could also use janitor::tabyl:

library(janitor)

df %>%
  tabyl(X1) %>%
  adorn_totals()

    X1   n percent
     1   9    0.09
     2   6    0.06
     3  15    0.15
     4  13    0.13
     5  11    0.11
     6   9    0.09
     7   7    0.07
     8   9    0.09
     9  11    0.11
    10  10    0.10
 Total 100    1.00
Anoushiravan R answered Oct 04 '22 19:10