I have a string that mixes letters and numbers:
"The sample is 22mg"
I'd like to split strings wherever a number is immediately followed by a letter, like this:
"The sample is 22 mg"
I've tried this:
gsub('[0-9]+[[aA-zZ]]', '[0-9]+ [[aA-zZ]]', 'This is a test 22mg')
but am not getting the desired results.
Any suggestions?
The gsub() function in R replaces every match of a pattern in a string with a replacement string. The pattern can be a regular expression, which makes it possible to match digits and letters.
You need to use capturing parentheses in the regular expression and group references in the replacement. For example:
gsub('([0-9])([[:alpha:]])', '\\1 \\2', 'This is a test 22mg')
There's nothing R-specific here; the R help for regex and gsub should be of some use.
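As a small sketch of the same capture-group approach applied to several strings at once (the sample inputs below are made up for illustration, and the variable names x and result are not from the original post):

```r
# Hypothetical sample inputs; gsub() is vectorized over the strings it searches.
x <- c("The sample is 22mg", "Dose: 5ml twice", "no digits here")

# ([0-9]) captures a digit and ([[:alpha:]]) the letter right after it;
# the replacement writes the two captures back with a space between them.
result <- gsub("([0-9])([[:alpha:]])", "\\1 \\2", x)
print(result)
```

Only the digit/letter boundary is touched, so strings without such a boundary come back unchanged.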
You need backreferencing:
> test <- "The sample is 22mg"
> gsub("([0-9])([a-zA-Z])", "\\1 \\2", test)
[1] "The sample is 22 mg"
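Anything matched inside parentheses gets remembered. The remembered pieces are then accessed in the replacement by \1 (for the first group in parens), \2, etc. In an R string literal the backslash itself has to be escaped, which is why the code says "\\1": the first backslash makes sure a literal backslash is passed on to the regular expression parser.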
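To make the escaping and the role of the replacement argument concrete, here is a minimal sketch. The pattern in the second call is a simplified variant of the attempt from the question, not the exact one, and the names n_chars, literal and fixed are only illustrative:

```r
test <- "The sample is 22mg"

# In R source, "\\1" is a two-character string: a backslash and the digit 1.
n_chars <- nchar("\\1")

# The replacement argument is not a pattern. Regex syntax written there is
# inserted literally, so the matched "22m" is replaced by the text itself.
literal <- gsub("[0-9]+[a-zA-Z]", "[0-9]+ [a-zA-Z]", test)

# With capture groups, the matched digit and letter are written back
# with a space between them.
fixed <- gsub("([0-9])([a-zA-Z])", "\\1 \\2", test)

print(n_chars)
print(literal)
print(fixed)
```

The second call shows why putting a pattern in the replacement cannot work: only backreferences such as \\1 are interpreted there.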