 

R: removing numbers at begin and end of a string

I've got the following vector:

words <- c("5lang","kasverschil2","b2b")

I want to remove "5" in "5lang" and "2" in "kasverschil2". But I do NOT want to remove "2" in "b2b".

Anita asked Oct 14 '14 11:10




1 Answer

 gsub("^\\d+|\\d+$", "", words)    
 #[1] "lang"        "kasverschil" "b2b"
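
The pattern has two alternatives: `^\\d+` matches a run of digits only at the start of the string, and `\\d+$` matches one only at the end, so a digit in the middle (as in "b2b") is never touched. A minimal sketch of why `gsub` rather than `sub` is the safer choice; the extra element "12abc34" is an illustrative assumption and not part of the original question:

```r
# "12abc34" is a hypothetical extra element with digits at both ends
words <- c("5lang", "kasverschil2", "b2b", "12abc34")

# gsub replaces every match, so both the leading and the trailing digits go
trimmed_all <- gsub("^\\d+|\\d+$", "", words)
trimmed_all
# "lang" "kasverschil" "b2b" "abc"

# sub replaces only the first match, so trailing digits survive
# whenever the string also starts with digits
trimmed_first <- sub("^\\d+|\\d+$", "", words)
trimmed_first
# "lang" "kasverschil" "b2b" "abc34"
```

For the three strings in the question `sub` would give the same result, because none of them has digits at both ends.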

Another option would be to use stringi

 library(stringi)
 stri_replace_all_regex(words, "^\\d+|\\d+$", "")
  #[1] "lang"        "kasverschil" "b2b"        

Using a variant of the data set provided by the OP, here are benchmarks for the three main solutions (note that these strings are very short and contrived; results may differ on a larger, real data set):

words <- rep(c("5lang","kasverschil2","b2b"), 100000)

library(stringi)
library(microbenchmark)

GSUB <- function() gsub("^\\d+|\\d+$", "", words)
STRINGI <- function() stri_replace_all_regex(words, "^\\d+|\\d+$", "")
GREGEXPR <- function() {
    mm <- gregexpr(pattern = "(^[0-9]+|[0-9]+$)", text = words)
    sapply(regmatches(words, mm, invert = TRUE), paste, collapse = "")
}

microbenchmark( 
    GSUB(),
    STRINGI(),
    GREGEXPR(),
    times=100L
)

## Unit: milliseconds
##        expr       min        lq    median        uq       max neval
##      GSUB()  301.0988  349.9952  396.3647  431.6493  632.7568   100
##   STRINGI()  465.9099  513.1570  569.1972  629.4176  738.4414   100
##  GREGEXPR() 5073.1960 5706.8160 6194.1070 6742.1552 7647.8904   100
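
A benchmark is only meaningful if the candidates return the same result, so it can be worth checking that first. A minimal sketch using base R only; `stringi` is left out here so the snippet runs without extra packages, and the names `gsub_result` and `gregexpr_result` are hypothetical:

```r
# Original three strings from the question
words <- c("5lang", "kasverschil2", "b2b")

# Approach 1: replace leading/trailing digit runs directly
gsub_result <- gsub("^\\d+|\\d+$", "", words)

# Approach 2: locate the matches, keep the non-matching pieces, glue them back
mm <- gregexpr("(^[0-9]+|[0-9]+$)", words)
gregexpr_result <- vapply(
    regmatches(words, mm, invert = TRUE),
    paste, character(1),
    collapse = "",
    USE.NAMES = FALSE
)

# Both approaches should agree before their timings are compared
stopifnot(identical(gsub_result, gregexpr_result))
```

The `gregexpr` route does more work per string (match, split, paste), which is consistent with it being the slowest entry in the timings above.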

akrun answered Oct 20 '22 08:10