 

Fastest way to add an element to a vector if it doesn't already exist in R

Tags: r, unique, vector

I am looking for the fastest way in R to add an element (of type character) to a vector if it doesn't already exist. Right now I simply use

vect <- c("a", "b", "c")
vect <- unique(c(vect, "b"))
vect <- unique(c(vect, "d"))

and so on, but I presume there must be better ways of doing this. Any thoughts? (My vector holds about 2 million strings, all web URLs.)

cheers, Tom

Tom Wenseleers asked Oct 18 '25 13:10


2 Answers

The %chin% operator from data.table is specially written to be fast for character vectors. Here is an example:

# Your data; we would like to add the elements of `add`
# that are not already in `vect`
vect <- c("a", "b", "c")
add <- c("a", "d", "e", "b")

# Load the package
library(data.table)

# %chin% is the same as %in%, but faster and optimised for character vectors
c(vect, add[!add %chin% vect])
# [1] "a" "b" "c" "d" "e"
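
One thing to watch with this idiom: it only filters `add` against `vect`, so a value that occurs twice in `add` would be appended twice. A minimal sketch of a wrapper that guards against that is below. The name add_missing is hypothetical (it is not part of data.table or of the answer above), and the fallback to base %in% when data.table is not installed is an assumption made for portability.

```r
# Hypothetical helper; the name add_missing is made up for illustration.
add_missing <- function(vect, add) {
  # Use data.table's %chin% when the package is installed,
  # otherwise fall back to base R's %in%.
  is_present <- if (requireNamespace("data.table", quietly = TRUE)) {
    data.table::`%chin%`(add, vect)
  } else {
    add %in% vect
  }
  # unique() drops duplicates that occur inside `add` itself.
  c(vect, unique(add[!is_present]))
}

vect <- c("a", "b", "c")
vect <- add_missing(vect, c("a", "d", "e", "b", "d"))
print(vect)
# [1] "a" "b" "c" "d" "e"
```

Adding many candidates in one call like this should be cheaper than calling it once per element, since each call has to look through `vect` again.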
Simon O'Hanlon answered Oct 21 '25 04:10


Apparently, you want the union of two vectors:

vect <- c("a","b","c")
add <- c( "a" , "d" , "e" , "b" )

union(vect, add)
#[1] "a" "b" "c" "d" "e"

As Simon points out, this is the same as your solution: for character vectors like these, union(vect, add) returns the same result as unique(c(vect, add)).
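
A quick way to convince yourself of that equivalence, as a small sketch using only base R and the same example data:

```r
vect <- c("a", "b", "c")
add  <- c("a", "d", "e", "b")

# Compare union() with the unique(c(...)) idiom from the question.
same <- identical(union(vect, add), unique(c(vect, add)))
print(same)
# [1] TRUE
```

So switching to union() mainly makes the intent clearer; it should not be expected to change the timing much.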

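Here are some benchmarks, first on the three-element vector and then with one million strings in vect; the %chin% approach has the lower median time in both: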

library(data.table)
library(microbenchmark)
microbenchmark(union(vect, add),c( vect , add[ ! add %chin% vect ] ),times=10)
# Unit: microseconds
#                           expr    min     lq  median     uq    max neval
#               union(vect, add) 12.628 13.243 13.3980 15.092 65.599    10
# c(vect, add[!add %chin% vect])  2.773  3.080  3.3885  4.620 51.740    10


vect <- as.character(seq_len(1e6))
microbenchmark(union(vect, add),c( vect , add[ ! add %chin% vect ] ),times=10)
#Unit: milliseconds
#                          expr       min        lq    median        uq      max neval
#              union(vect, add) 176.34441 188.82082 261.09802 339.96974 493.7810    10
#c(vect, add[!add %chin% vect])  35.37661  37.14743  47.06862  70.46896 203.7034    10
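
Both timings above are for a single call, and each call has to look through all of `vect`. If the elements really do arrive one at a time, a different technique (not benchmarked here, and not one of the two approaches compared above) is to keep the strings as keys of a hashed environment, so that each membership check is a hash lookup. The sketch below uses only base R; the names seen and add_url are hypothetical. Two caveats: insertion order is lost, and the empty string cannot be used as a key.

```r
# Hashed environment used as a set of strings (hypothetical name: seen).
seen <- new.env(hash = TRUE, parent = emptyenv())

# Hypothetical helper: record `url` only if it is not stored yet.
add_url <- function(url) {
  if (!exists(url, envir = seen, inherits = FALSE)) {
    assign(url, TRUE, envir = seen)
  }
  invisible(NULL)
}

for (u in c("a", "b", "c", "b", "d")) add_url(u)

# Recover the stored strings as a character vector (sorted, not in insertion order).
stored <- ls(seen, all.names = TRUE)
print(length(stored))
# [1] 4
```

Whether this beats batching the new elements and calling %chin% once depends on the workload, so it is worth timing on your own data.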
Roland answered Oct 21 '25 04:10


