How to delete a segment of a string with a specific start and end in R using regular expressions?

I have a character vector:

str = c("F14 : M114L","W15 : M116L, W15 : M118L","W15 : D111L, F14 : E112L, F14 : M116L")

The goal is to delete everything from : through L, including the white space right in front of the :, so that I end up with

"F14", "W15, W15", "W15, F14, F14"

I am thinking of using

gsub(" : [[:alnum:]]L", "", str)

But clearly it does not work. I don't know whether there is something like a wildcard that can represent any number of digits and characters between : and L.

wen asked Jul 10 '15 03:07



2 Answers

This will do it:

gsub(" : .*?L", "", str)
#[1] "F14"           "W15, W15"      "W15, F14, F14"
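
As a rough illustration of why this pattern works where the original attempt did not, the sketch below compares four patterns on the vector from the question. The variable names (unchanged, greedy, lazy, alnum) are made up for this example, and the [[:alnum:]]+ variant is an alternative that assumes the text between : and L only ever contains letters and digits.

```r
str <- c("F14 : M114L", "W15 : M116L, W15 : M118L",
         "W15 : D111L, F14 : E112L, F14 : M116L")

# Original attempt: [[:alnum:]] matches exactly ONE character before the L,
# so " : M114L" never matches and the input comes back unchanged.
unchanged <- gsub(" : [[:alnum:]]L", "", str)

# Greedy .* runs to the LAST L in each element and removes too much.
greedy <- gsub(" : .*L", "", str)

# Lazy .*? stops at the FIRST L after each " : ".
lazy <- gsub(" : .*?L", "", str)

# Alternative (assumption: only letters/digits between ":" and "L"):
# one or more alphanumerics, which cannot run across the ", " separator.
alnum <- gsub(" : [[:alnum:]]+L", "", str)

print(lazy)
print(alnum)
```

The lazy form is the more general of the two working patterns, since it tolerates any characters between the delimiters; the character-class form is stricter and fails to match if the middle part ever contains punctuation or spaces.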
Jota answered Oct 30 '22 18:10


You can do this with ease using the qdapRegex package that I maintain:

str = c("F14 : M114L","W15 : M116L, W15 : M118L","W15 : D111L, F14 : E112L, F14 : M116L")

library(qdapRegex)
rm_between(str, "\\s:", "L")
## [1] "F14"           "W15, W15"      "W15, F14, F14"

qdapRegex aims to be useful as it teaches. If you are interested in the regex used...

S("@rm_between", "\\s:", "L")
## [1] "(\\s:)(.*?)(L)"

gsub(S("@rm_between", "\\s:", "L") , "", str)
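
If you would rather not load the package, the pattern string printed above can be passed to base R directly. This is a minimal sketch that assumes the pattern is exactly the string shown in the S() output; the names pattern and result are placeholders, and perl = TRUE is an optional choice here to make the \\s escape and the lazy quantifier explicit PCRE features.

```r
str <- c("F14 : M114L", "W15 : M116L, W15 : M118L",
         "W15 : D111L, F14 : E112L, F14 : M116L")

# Pattern copied from the S("@rm_between", "\\s:", "L") output above:
# group 1 = whitespace + colon, group 2 = lazy "anything", group 3 = closing L.
pattern <- "(\\s:)(.*?)(L)"

# Replace every match with the empty string.
result <- gsub(pattern, "", str, perl = TRUE)
print(result)
```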
Tyler Rinker answered Oct 30 '22 19:10