Count number of occurrences of vector in list

Tags:

r

I have a list of vectors of variable length, for example:

q <- list(c(1,3,5), c(2,4), c(1,3,5), c(2,5), c(7), c(2,5))

I need to count the number of occurrences for each of the vectors in the list, for example (any other suitable datastructure acceptable):

list(list(c(1,3,5), 2), list(c(2,4), 1), list(c(2,5), 2), list(c(7), 1))

Is there an efficient way to do this? The actual list has tens of thousands of items so quadratic behaviour is not feasible.

747

asked Sep 07 '16 14:09

2 Answers

match and unique accept and handle "list"s too (?match warns for being slow on "list"s). So, with:

match(q, unique(q))
#[1] 1 2 1 3 4 3

each element is mapped to a single integer. Then:

tabulate(match(q, unique(q)))
#[1] 2 1 2 1

And find a structure to present the results:

as.data.frame(cbind(vec = unique(q), n = tabulate(match(q, unique(q)))))
#      vec n
#1 1, 3, 5 2
#2    2, 4 1
#3    2, 5 2
#4       7 1

Alternatively to match(x, unique(x)) approach, we could map each element to a single value with deparseing:

table(sapply(q, deparse))
#
#         7 c(1, 3, 5)    c(2, 4)    c(2, 5) 
#         1          2          1          2

Also, since this is a case with unique integers, and assuming in a small range, we could map each element to a single integer after transforming each element to a binary representation:

n = max(unlist(q))
pow2 = 2 ^ (0:(n - 1))
sapply(q, function(x) tabulate(x, nbins = n))  # 'binary' form
sapply(q, function(x) sum(tabulate(x, nbins = n) * pow2))
#[1] 21 10 21 18 64 18

and then tabulate as before.

And just to compare the above alternatives:

f1 = function(x) 
{
    ux = unique(x)
    i = match(x, ux)
    cbind(vec = ux, n = tabulate(i))
}   

f2 = function(x)
{
    xc = sapply(x, deparse)
    i = match(xc, unique(xc))
    cbind(vec = x[!duplicated(i)], n = tabulate(i))
}  

f3 = function(x)
{
    n = max(unlist(x))
    pow2 = 2 ^ (0:(n - 1))
    v = sapply(x, function(X) sum(tabulate(X, nbins = n) * pow2))
    i = match(v, unique(v))
    cbind(vec = x[!duplicated(v)], n = tabulate(i))
}

q2 = rep_len(q, 1e3)

all.equal(f1(q2), f2(q2))
#[1] TRUE
all.equal(f2(q2), f3(q2))
#[1] TRUE

microbenchmark::microbenchmark(f1(q2), f2(q2), f3(q2))
#Unit: milliseconds
#   expr       min        lq      mean    median        uq       max neval cld
# f1(q2)  7.980041  8.161524 10.525946  8.291678  8.848133 178.96333   100  b 
# f2(q2) 24.407143 24.964991 27.311056 25.514834 27.538643  45.25388   100   c
# f3(q2)  3.951567  4.127482  4.688778  4.261985  4.518463  10.25980   100 a

Another interesting alternative is based on ordering. R > 3.3.0 has a grouping function, built off data.table, which, along with the ordering, provides some attributes for further manipulation:

Make all elements of equal length and "transpose" (probably the most slow operation in this case, though I'm not sure how else to feed grouping):

n = max(lengths(q))
qq = .mapply(c, lapply(q, "[", seq_len(n)), NULL)

Use ordering to group similar elements mapped to integers:

gr = do.call(grouping, qq)
e = attr(gr, "ends") 
i = rep(seq_along(e), c(e[1], diff(e)))[order(gr)]
i
#[1] 1 2 1 3 4 3

then, tabulate as before. To continue the comparisons:

f4 = function(x)
{
    n = max(lengths(x))
    x2 = .mapply(c, lapply(x, "[", seq_len(n)), NULL)
    gr = do.call(grouping, x2)
    e = attr(gr, "ends") 
    i = rep(seq_along(e), c(e[1], diff(e)))[order(gr)]
    cbind(vec = x[!duplicated(i)], n = tabulate(i))
}

all.equal(f3(q2), f4(q2))
#[1] TRUE

microbenchmark::microbenchmark(f1(q2), f2(q2), f3(q2), f4(q2))
#Unit: milliseconds
#   expr       min        lq      mean    median        uq        max neval cld
# f1(q2)  7.956377  8.048250  8.792181  8.131771  8.270101  21.944331   100  b 
# f2(q2) 24.228966 24.618728 28.043548 25.031807 26.188219 195.456203   100   c
# f3(q2)  3.963746  4.103295  4.801138  4.179508  4.360991  35.105431   100 a  
# f4(q2)  2.874151  2.985512  3.219568  3.066248  3.186657   7.763236   100 a

In this comparison q's elements are of small length to accomodate for f3, but f3 (because of large exponentiation) and f4 (because of mapply) will suffer, in performance, if "list"s of larger elements are used.

answered Nov 04 '22 22:11

alexis_laz

One way is to paste each vector , unlist and tabulate, i.e.

table(unlist(lapply(q, paste, collapse = ',')))

#1,3,5   2,4   2,5     7 
#    2     1     2     1

answered Nov 04 '22 22:11

Sotos

Related questions
                            
                                Extract integers from ranges
                            
                                How to plot a violin scatter boxplot (in R)?
                            
                                R pch plotting in illustator as "q"
                            
                                Detect conflicts between packages in R [duplicate]
                            
                                How can I create a vector containing the days of the week?
                            
                                Conditional Output Shiny UI
                            
                                how to read huge csv file into R by row condition?
                            
                                Keeping only certain rows of a data frame based on a set of values
                            
                                R create ID within a group [duplicate]
                            
                                How to plot logit and probit in ggplot2
                            
                                Reverse Scoring Items
                            
                                Removing characters from string in R
                            
                                Scale relative to a value in each group (via dplyr)
                            
                                How to install stringi library from archive and install the local icu52l.zip
                            
                                Circular shift of vector by distance n [duplicate]
                            
                                Shiny App Error: /v1/applications/ 400 - Validation Error Execution halted
                            
                                Shiny App checkboxInput and conditionalPanel
                            
                                paste grid -- expand.grid for string concatenation
                            
                                Skip tests on CRAN, but run locally
                            
                                R: apply-like function that returns a data frame?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Count number of occurrences of vector in list

Tags:

r

Jan Šimbera

People also ask

2 Answers

alexis_laz

Sotos

Recent Activity

Donate For Us