
R regex gsub separate letters and numbers

I have a string that mixes letters and numbers:

"The sample is 22mg"

I'd like to split strings where a number is immediately followed by a letter, like this:

"The sample is 22 mg"

I've tried this:

gsub('[0-9]+[[aA-zZ]]', '[0-9]+ [[aA-zZ]]', 'This is a test 22mg')

but am not getting the desired results.

Any suggestions?

screechOwl asked Jul 23 '12 01:07



2 Answers

You need to use capturing parentheses in the regular expression and group references in the replacement. For example:

gsub('([0-9])([[:alpha:]])', '\\1 \\2', 'This is a test 22mg')

There's nothing R-specific here; capture groups and backreferences are standard regex features. The R help pages ?regex and ?gsub cover the details.
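As a minimal sketch of the same capture-group approach applied to several strings at once (the vector `samples` and its contents are made up for illustration and are not from the question):

```r
# Hypothetical inputs for illustration only.
samples <- c("The sample is 22mg", "Add 5ml of water", "No units here 42")

# ([0-9]) captures a digit and ([[:alpha:]]) captures the letter right after it;
# the replacement re-emits both groups with a space between them.
spaced <- gsub("([0-9])([[:alpha:]])", "\\1 \\2", samples)
print(spaced)
```

gsub() is vectorized over its third argument, so strings with no digit-letter boundary come back unchanged.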

Nicholas Riley answered Nov 02 '22 06:11


You need backreferencing:

> test <- "The sample is 22mg"
> gsub("([0-9])([a-zA-Z])", "\\1 \\2", test)
[1] "The sample is 22 mg"

Anything in parentheses is captured. In the replacement, the captured groups are referenced as \1 (the first group in parentheses), \2, and so on. In an R string literal a backslash must itself be escaped, so you write "\\1" in the code for the regular expression engine to receive \1.
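The double backslash can be checked directly. A small sketch, assuming R 4.0.0 or later for the raw-string form r"(...)"; the variable names `replacement`, `raw_replacement`, and `result` are illustrative:

```r
test <- "The sample is 22mg"

# "\\1 \\2" is a 5-character string: backslash, 1, space, backslash, 2.
replacement <- "\\1 \\2"
print(nchar(replacement))
cat(replacement, "\n")   # shows the string as the regex engine receives it

# Raw string (assumes R >= 4.0.0): backslashes need no doubling.
raw_replacement <- r"(\1 \2)"
print(identical(replacement, raw_replacement))

# Both spellings give the same substitution.
result <- gsub("([0-9])([a-zA-Z])", raw_replacement, test)
print(result)
```

On older R versions the raw-string line will not parse, and the doubled-backslash form is the one to use.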

Ari B. Friedman answered Nov 02 '22 05:11