Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Increase performance by moving away from a for loop

Tags:

r

data.table

The gist of the argument is the following:

A function that I wrote, takes into consideration one argument, an alphanumeric string, and should output a string where the values of each element of this alphanumeric string are switched for some 'mapping'. MRE as follows:

#This is the original and switches value map
map = data.table(mapped = c(0:35), original = c(0:9,LETTERS))
#the function that I'm using:
as_numbers <- function(string) {
  #split string unlisted
  vector_unlisted <- unlist(strsplit(string,""))
  #match the string in vector
  for (i in 1:length(vector_unlisted)) {

    vector_unlisted[i] <- subset(map, map$original==vector_unlisted[i])[[1]][1]

  }
  vector_unlisted <- paste0(vector_unlisted, collapse = "")

  return(vector_unlisted)
}

I am trying to move away from the for loop for something that increases performance, as the function works, but it is pretty slow for the amount of elements I have supplied in this form:

unlist(lapply(dat$alphanum, function(x) as_numbers(x)))

An example of the input strings would be:549300JV8KEETQJYUG13. This should result in a string like 5493001931820141429261934301613

Supplying just one string in this case:

> as_numbers("549300JV8KEETQJYUG13")
[1] "5493001931820141429261934301613"
like image 514
erasmortg Avatar asked Feb 23 '16 14:02

erasmortg


3 Answers

Using Reduce and gsub, you could define the following function

replacer <- function(x) Reduce(function(x,r) gsub(map$original[r],
             map$mapped[r], x, fixed=T), seq_len(nrow(map)),x)


# Let's test it
replacer("549300JV8KEETQJYUG13")
#[1] "5493001931820141429261934301613"
like image 21
mtoto Avatar answered Nov 16 '22 02:11

mtoto


We can use base conversion:

#input and expected output
x <- "549300JV8KEETQJYUG13"
# "5493001931820141429261934301613"

#output
res <- paste0(strtoi(unlist(strsplit(x, "")), base = 36), collapse = "")

#test output
as_numbers(x) == res
# [1] TRUE

Performance

Since this post is about performance, here is benchmarking* for 3 solutions:

#input set up
map = data.table(mapped = c(0:35), original = c(0:9,LETTERS))
x <- rep(c("549300JV8KEETQJYUG13", "5493V8KE300J"), 1000)

#define functions
base_f <- function(string) {
  sapply(string, function(x) {
    paste0(strtoi(unlist(strsplit(x, "")), base = 36), collapse = "")
    })
  }

match_f <- function(string) {
  mapped <- map$mapped
  original <- map$original
  sapply(strsplit(string, ""), function(y) {
    paste0(mapped[match(y, original)], collapse= "")})
  }

reduce_f <- function(string) {
  Reduce(function(string,r) 
    gsub(map$original[r],
         map$mapped[r], string, fixed = TRUE),
    seq_len(nrow(map)), string)
  }

#test if all return same output
all(base_f(x) == match_f(x))
# [1] TRUE
all(base_f(x) == reduce_f(x))
# [1] TRUE

library(rbenchmark)
benchmark(replications = 1000,
          base_f(x),
          match_f(x),
          reduce_f(x))
#          test replications elapsed relative user.self sys.self user.child sys.child
# 1   base_f(x)         1000   22.15    4.683     22.12        0         NA        NA
# 2  match_f(x)         1000   19.18    4.055     19.11        0         NA        NA
# 3 reduce_f(x)         1000    4.73    1.000      4.72        0         NA        NA

*Note: microbenchmark() keeps throwing warnings, hence used rbenchmark() instead. Feel free to test with other libraries and update this post.

like image 76
zx8754 Avatar answered Nov 16 '22 01:11

zx8754


Seems like a merge:

map[as.data.table(unlist(strsplit(string, ""))),
    .(mapped), on = c(original = "V1")][ , paste0(mapped, collapse = "")]

Note that both "D1" and "1V" will be mapped to "131"...

On your example output is: "5493001931820141429261934301613"

You can use sep = "." if you actually want this to be a reversible mapping...

like image 20
MichaelChirico Avatar answered Nov 16 '22 00:11

MichaelChirico