How to replace string with number based on position in a data frame?


I have a vector of strings, in the following format:

strings <- c("UUDBK", "KUVEB", "YVCYE")

I also have a data frame like this:

replacewith <- c(8, 4, 2)
searchhere <- c("UUDBK, YVCYE, KUYVE, IHVYV, IYVEK", "KUVEB, UGEVB", "KUEBN, IHBEJ, KHUDN")
dataframe <- data.frame(replacewith, searchhere)

I want each element of the strings vector to be replaced with the value from the "replacewith" column of the row whose "searchhere" entry contains that string. Currently I am doing it like this:

final <- sapply(as.character(strings), function(x)
as.numeric(dataframe[grep(x, dataframe$searchhere), 1]))

However, this is far too computationally heavy for a character vector of length 10^9.

What is a better way to do this?

Thanks!


asked Nov 09 '17 21:11

Keshav M

1 Answer

Similar to @AntoniosK's idea, but this uses the hashmap package to map the strings to their values. hashmap is implemented with Rcpp internally, so lookups are very fast:

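library(hashmap)
library(tidyr)

# Split the comma-separated searchhere column into one key per row
search_replace = separate_rows(dataframe, searchhere)

# Build a hash map from each key to its replacement value
search_hash = hashmap(search_replace$searchhere, search_replace$replacewith)

# Vectorised lookup: one value per element of strings, in order
search_hash[[strings]]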

Results:

> search_hash
## (character) => (numeric)  
##     [KHUDN] => [+2.000000]
##     [KUEBN] => [+2.000000]
##     [UGEVB] => [+4.000000]
##     [KUVEB] => [+4.000000]
##     [IYVEK] => [+8.000000]
##     [IHVYV] => [+8.000000]
##       [...] => [...] 

> search_hash[[strings]]
[1] 8 4 8
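
If installing hashmap is not an option, the same key-to-value idea can be sketched in base R with a named vector. This is only an illustration under the assumption that the keys in searchhere are separated by a comma plus optional whitespace; the names keys and lookup are made up for this example and do not come from the answer above:

```r
# Sample data from the question
strings <- c("UUDBK", "KUVEB", "YVCYE")
replacewith <- c(8, 4, 2)
searchhere <- c("UUDBK, YVCYE, KUYVE, IHVYV, IYVEK", "KUVEB, UGEVB", "KUEBN, IHBEJ, KHUDN")
dataframe <- data.frame(replacewith, searchhere, stringsAsFactors = FALSE)

# Split each comma-separated cell into its individual keys (assumed separator)
keys <- strsplit(dataframe$searchhere, ",\\s*")

# Repeat each replacement value once per key in its row and name it by that key
lookup <- setNames(rep(dataframe$replacewith, lengths(keys)), unlist(keys))

# Look up every element of strings by name in one vectorised call
final <- unname(lookup[strings])
final
```

The lookup table is built once, so the per-element grep over the whole data frame is avoided. How this compares in speed with hashmap on 10^9 elements is not measured here.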

Benchmarks:

> OP_func = function(){sapply(as.character(strings), function(x)
    as.numeric(dataframe[grep(x,dataframe$searchhere), 1]))}

Unit: microseconds
                           expr     min       lq      mean   median      uq      max neval
                      OP_func() 121.191 124.9410 190.36472 129.8760 151.193 3370.047   100
 d[d$searchhere %in% strings, ]  36.714  40.6605  52.85093  43.8185  61.583  147.246   100
         search_hash[[strings]]  14.212  18.1590  25.05212  21.5150  29.608   58.820   100

Also note that @AntoniosK's solution does not work if there are duplicates in strings, while hashmap will return the correct mapping for each element in the correct position.

Example:

> strings_large = sample(search_replace$searchhere, 100, replace = TRUE)
> strings_large
  [1] "YVCYE" "KUVEB" "KUYVE" "KHUDN" "KUYVE" "KHUDN" "KUEBN" "UUDBK" "KHUDN" "YVCYE" "IYVEK"
 [12] "KUEBN" "KHUDN" "IHBEJ" "YVCYE" "KHUDN" "KUEBN" "UGEVB" "UUDBK" "KUYVE" "KHUDN" "IHBEJ"
 [23] "IHVYV" "KUVEB" "IYVEK" "KHUDN" "KHUDN" "KUYVE" "YVCYE" "UUDBK" "KUYVE" "IHVYV" "KUYVE"
 [34] "KUEBN" "KUYVE" "UUDBK" "KUYVE" "KUVEB" "KUVEB" "YVCYE" "KUYVE" "KHUDN" "KUVEB" "YVCYE"
 [45] "IHBEJ" "YVCYE" "KHUDN" "UUDBK" "KUEBN" "IYVEK" "IHVYV" "UUDBK" "KUYVE" "KUEBN" "YVCYE"
 [56] "UGEVB" "YVCYE" "KUYVE" "IHVYV" "KUEBN" "IHVYV" "IHBEJ" "KUVEB" "IHVYV" "KUYVE" "KUEBN"
 [67] "IYVEK" "KUVEB" "KUEBN" "UGEVB" "KUEBN" "KUVEB" "IHBEJ" "KUYVE" "YVCYE" "YVCYE" "IHVYV"
 [78] "YVCYE" "KHUDN" "KHUDN" "YVCYE" "IYVEK" "KUYVE" "KHUDN" "UGEVB" "YVCYE" "IHVYV" "KUVEB"
 [89] "IYVEK" "KUEBN" "UGEVB" "UUDBK" "IYVEK" "IHBEJ" "IHBEJ" "UUDBK" "KUVEB" "UGEVB" "IYVEK"
[100] "IYVEK"

> search_hash[[strings_large]]
  [1] 8 4 8 2 8 2 2 8 2 8 8 2 2 2 8 2 2 4 8 8 2 2 8 4 8 2 2 8 8 8 8 8 8 2 8 8 8 4 4 8 8 2 4 8
 [45] 2 8 2 8 2 8 8 8 8 2 8 4 8 8 8 2 8 2 4 8 8 2 8 4 2 4 2 4 2 8 8 8 8 8 2 2 8 8 8 2 4 8 8 4
 [89] 8 2 4 8 8 2 2 8 4 4 8 8
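
To make the duplicate issue concrete, here is a small base R sketch. It assumes d is the long-format table (one key per row), which is a guess about the other answer's setup, and dup_strings is a made-up input. Filtering with %in% returns each matching table row once in table order, while match() returns one value per input element in input order:

```r
# Assumed long-format lookup table: one key per row
d <- data.frame(
  searchhere  = c("UUDBK", "YVCYE", "KUYVE", "IHVYV", "IYVEK",
                  "KUVEB", "UGEVB", "KUEBN", "IHBEJ", "KHUDN"),
  replacewith = c(8, 8, 8, 8, 8, 4, 4, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical input with a repeated element
dup_strings <- c("KUVEB", "UUDBK", "KUVEB", "KHUDN")

# Row filter: loses the duplicate and follows the table's order, not the input's
filtered <- d[d$searchhere %in% dup_strings, "replacewith"]

# Positional lookup: one result per input element, duplicates preserved
mapped <- d$replacewith[match(dup_strings, d$searchhere)]

length(filtered)  # 3
length(mapped)    # 4
```

Like the hashmap lookup, the match() version keeps the result aligned with the input vector, which is what the question asks for.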


answered Sep 19 '22 12:09

acylam
