
protect specific words, delete letters from string

Tags: string, regex, r

I would like to delete letters from a string, but protect specific words. Here is an example:

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!"

desired.result <- "12 marigolds, 45 trees"

I tried the code below, which gave a surprising result. I thought () would protect whatever it contained. Instead, just the opposite happened. Only the words within () were deleted (plus the !).

gsub("(marigolds|trees)\\D", "", my.string)

# [1] "Water the 12 gold please, but not the 45 "

Here is an example with a longer string:

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!, The 7 orange marigolds are fine."

desired.result <- "12 marigolds, 45 trees, 7 marigolds"

gsub("(marigolds|trees)\\D", "", my.string)

Returns:

[1] "Water the 12 gold please, but not the 45 , The 7 orange are fine."

Thank you for any advice. I prefer a regex solution in base R.

Mark Miller asked Feb 22 '14 09:02



2 Answers

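Use a word boundary with a negative look-ahead assertion (look-aheads require perl=TRUE). The protected words are never matched, so they are never deleted: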

> my.string <- "Water the 12 gold marigolds please, but not the 45 trees!"
> gsub("\\b(?!marigolds\\b|trees\\b)[A-Za-z]+\\s*", "", my.string, perl=TRUE)
[1] "12 marigolds , 45 trees!"
> gsub("\\b(?!marigolds\\b|trees\\b)[A-Za-z]+\\s*|!", "", my.string, perl=TRUE)
[1] "12 marigolds , 45 trees"
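
For context on why the attempt in the question did the opposite of what was intended: gsub() replaces everything the pattern matches, and parentheses only group or capture, they do not exclude text from the match. The look-ahead pattern avoids this because it refuses to start a match at a protected word. Below is a minimal sketch that wraps the idea in a function and also removes the space left before the comma. The helper name keep_words and the two cleanup steps are assumptions added for illustration, and the protected words are assumed to contain no regex metacharacters.

```r
# Hypothetical helper: delete every alphabetic word except those in `keep`.
keep_words <- function(x, keep) {
  # Build the look-ahead alternation, e.g. "marigolds\\b|trees\\b".
  protected <- paste0(keep, "\\b", collapse = "|")
  pattern <- paste0("\\b(?!", protected, ")[A-Za-z]+\\s*")
  out <- gsub(pattern, "", x, perl = TRUE)  # drop unprotected words
  out <- gsub("[.?!]", "", out)             # drop sentence punctuation
  out <- gsub("\\s+,", ",", out)            # remove the space left before commas
  trimws(out)
}

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!"
keep_words(my.string, c("marigolds", "trees"))
# [1] "12 marigolds, 45 trees"
```

The same call on the longer string from the question should give "12 marigolds, 45 trees, 7 marigolds".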
falsetru answered Oct 29 '22 12:10


Another way, using a capturing group:

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!, The 7 orange marigolds are fine."
gsub("(?i)\\b(?:(marigolds|trees)|[a-z]+)\\b\\s*|[.?!]", "\\1", my.string, perl=TRUE)
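
To spell out how the pattern above works: when the first alternative matches, group 1 holds the protected word and "\\1" writes it back. When the [a-z]+ branch or the punctuation class matches instead, group 1 is empty, so the match is replaced with nothing. A short sketch that stores and checks the result (the variable names pattern and result are assumptions for illustration):

```r
my.string <- "Water the 12 gold marigolds please, but not the 45 trees!, The 7 orange marigolds are fine."

# Same pattern as above, kept in a variable for readability.
pattern <- "(?i)\\b(?:(marigolds|trees)|[a-z]+)\\b\\s*|[.?!]"

result <- gsub(pattern, "\\1", my.string, perl = TRUE)
result
# [1] "12 marigolds, 45 trees, 7 marigolds"

# Compare against the desired result from the question.
identical(result, "12 marigolds, 45 trees, 7 marigolds")
```

Because the trailing \\s* is consumed together with each word, no stray space is left before the commas.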
Casimir et Hippolyte answered Oct 29 '22 13:10