
Remove text in-between delimiters in a string (using a regex?)

Consider the requirement to find a matched pair of delimiters and remove them, along with any characters between them.

Here are the sets of delimiters:

    []    square brackets
    ()    parentheses
    ""    double quotes
    ''    single quotes

Here are some examples of strings that should match:

    Given:                       Results In:
    -------------------------------------------
    Hello "some" World           Hello World
    Give [Me Some] Purple        Give Purple
    Have Fifteen (Lunch Today)   Have Fifteen
    Have 'a good'day             Have day

And some examples of strings that should not match:

    Does Not Match:
    ------------------
    Hello "world
    Brown]co[w
    Cheese'factory

If the given string doesn't contain a matching set of delimiters, it isn't modified. The input string may have many matching pairs of delimiters. If two sets of delimiters overlap (e.g. he[llo "worl]d"), that is an edge case we can ignore here.

The algorithm would look something like this:

    string myInput = "Give [Me Some] Purple (And More) Elephants";
    string pattern; //some pattern
    string output = Regex.Replace(myInput, pattern, string.Empty);

Question: How would you achieve this with C#? I am leaning towards a regex.

Bonus: Is there an easy way to keep those start and end delimiters in constants or in a list of some kind? Ideally the delimiters would be easy to change in case the business analysts come up with new sets.

p.campbell asked Aug 31 '09 21:08




1 Answer

A simple regex would be:

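    // Requires: using System.Text.RegularExpressions;
    string input = "Give [Me Some] Purple (And More) Elephants";
    // Lazy quantifiers (.*?) stop at the nearest closing delimiter,
    // so several pairs of the same kind are removed separately.
    string regex = "(\\[.*?\\])|(\".*?\")|('.*?')|(\\(.*?\\))";
    string output = Regex.Replace(input, regex, "");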
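One caveat worth checking with any pattern of this shape is greediness: a greedy `.*` extends to the last closing delimiter in the string, so two bracketed groups in one input would be removed together with everything between them. The sketch below compares a greedy and a lazy (`.*?`) pattern for square brackets only. The class name `GreedyDemo`, its method names, and the sample input are made up for illustration; only `Regex.Replace` comes from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Greedy: .* runs from the first '[' to the LAST ']' in the input.
    public static string RemoveGreedy(string input) =>
        Regex.Replace(input, @"\[.*\]", "");

    // Lazy: .*? stops at the NEAREST ']' after each '['.
    public static string RemoveLazy(string input) =>
        Regex.Replace(input, @"\[.*?\]", "");
}
```

For the input `a [b] c [d] e`, `RemoveGreedy` returns `a  e` (the ` c ` in the middle is lost), while `RemoveLazy` returns `a  c  e`. Both leave the surrounding spaces in place, so collapsing doubled spaces would be a separate step if it matters.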

If you want the delimiters to be configurable, you just need to build up the regex from its individual parts:

    ('.*?')  // example of the single quote check

Then concatenate the individual regex parts with an OR (the | in regex), as in my original example. Once the regex string is built, run it just once. The key is to get everything into a single regex: running many separate regex matches on each item and then iterating through a lot of items will probably cause a significant decrease in performance.

In my first example, the built-up regex would take the place of the hard-coded pattern string:

    string input = "Give [Me Some] Purple (And More) Elephants";
    string regex = "Your built up regex here";
    string sOutput = Regex.Replace(input, regex, "");

I am sure someone will post a cool LINQ expression that builds the regex from an array of delimiter objects, or something similar.
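A minimal sketch of that idea follows, assuming a C# version with tuple support (7.0 or later). The class `DelimiterStripper`, the `Delimiters` array, and the method names are hypothetical and not part of the original answer. Each delimiter is passed through `Regex.Escape` so that characters such as `[` and `(` are treated literally, the parts are joined with `|`, and the resulting `Regex` is built once and reused.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterStripper
{
    // Hypothetical delimiter list: add or remove (open, close) pairs here.
    private static readonly (string Open, string Close)[] Delimiters =
    {
        ("[", "]"),
        ("(", ")"),
        ("\"", "\""),
        ("'", "'"),
    };

    // Built once; static initializers run in declaration order,
    // so Delimiters is already populated at this point.
    private static readonly Regex Pattern = BuildPattern(Delimiters);

    public static Regex BuildPattern(IEnumerable<(string Open, string Close)> pairs)
    {
        // One alternative per pair: escaped open, lazy "anything", escaped close.
        var parts = pairs.Select(p =>
            Regex.Escape(p.Open) + ".*?" + Regex.Escape(p.Close));
        return new Regex(string.Join("|", parts));
    }

    // Removes every matched pair together with the text between the delimiters.
    public static string Strip(string input) =>
        Pattern.Replace(input, string.Empty);
}
```

With this sketch, `DelimiterStripper.Strip("Give [Me Some] Purple (And More) Elephants")` returns `Give  Purple  Elephants` (the spaces on either side of each removed group remain), and inputs without a matching pair, such as `Hello "world`, come back unchanged. Overlapping delimiters are not handled specially, in line with the edge case the question sets aside.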

Kelsey answered Sep 22 '22 14:09
