Removing text between 2 strings

Tags: string, c#

I tried to write a function in C# that removes the text between two strings, with this signature:

string RemoveBetween(string sourceString, string startTag, string endTag)

At first I thought this would be easy, but I kept running into more and more problems.

This is the easy case (all examples use startTag="Start" and endTag="End"):

"Any Text Start remove this End between" => "Any Text StartEnd between"

It should also handle multiple pairs without deleting the text between them:

"Any Text Start remove this End between should be still there Start and remove this End multiple" => "Any Text StartEnd between should be still there StartEnd multiple"

It should always remove the shortest possible substring:

"So Start followed by Start only remove this End other stuff" => "So Start followed by StartEnd other stuff"

It should also respect the order of the tags (an End that comes before any Start does not start a match):

"the End before Start. Start before End is correct" => "the End before Start. StartEnd is correct"

I tried a regex, which did not work (it could not handle multiple pairs):

using System.Text.RegularExpressions;

public string RemoveBetween(string sourceString, string startTag, string endTag)
{
    Regex regex = new Regex(string.Format("{0}(.*){1}", Regex.Escape(startTag), Regex.Escape(endTag)));
    return regex.Replace(sourceString, string.Empty);
}

Then I tried working with IndexOf and Substring, but the special cases never seemed to end. And even if that worked, it can't be the most elegant way to solve this.
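A quick way to see why the regex attempt fails on multiple pairs is to run it next to a lazy variant. The sketch below (the class GreedyDemo and its method names are made up for illustration) uses the same pattern as the question: the greedy .* consumes everything from the first "Start" to the last "End", while .*? stops at the first "End" after each "Start". Both versions replace the whole match with string.Empty, which is also why the tags themselves disappear instead of leaving "StartEnd".

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // The question's pattern: greedy ".*" lets a single match run from the
    // first startTag all the way to the last endTag.
    public static string Greedy(string source, string startTag, string endTag)
    {
        var regex = new Regex(Regex.Escape(startTag) + "(.*)" + Regex.Escape(endTag));
        return regex.Replace(source, string.Empty);
    }

    // Lazy ".*?" stops at the first endTag after a startTag,
    // so each Start...End pair becomes a separate match.
    public static string Lazy(string source, string startTag, string endTag)
    {
        var regex = new Regex(Regex.Escape(startTag) + "(.*?)" + Regex.Escape(endTag));
        return regex.Replace(source, string.Empty);
    }

    static void Main()
    {
        string input = "Any Text Start remove this End between should be still there Start and remove this End multiple";
        Console.WriteLine(Greedy(input, "Start", "End")); // Any Text  multiple
        Console.WriteLine(Lazy(input, "Start", "End"));   // Any Text  between should be still there  multiple
    }
}
```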

Gener4tor asked Dec 17 '22 21:12


2 Answers

Here is an approach with string.Remove():

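// Assumes "Start" and "End" are both present, with an "End" after the last "Start";
// otherwise the -1 from LastIndexOf/IndexOf makes Remove() cut the wrong text or throw.
string input = "So Start followed by Start only remove this End other stuff";
int start = input.LastIndexOf("Start") + "Start".Length; // index just after the last "Start"
int end = input.IndexOf("End", start);                  // first "End" after that point
string result = input.Remove(start, end - start);       // "So Start followed by StartEnd other stuff"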

I use LastIndexOf() because there can be multiple starts and you want to have the last one.

fubo answered Jan 04 '23 15:01
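The string.Remove() snippet removes a single span, because LastIndexOf() picks only the last "Start" in the whole string. Below is a minimal sketch of one way to extend the same IndexOf-based idea to several pairs. The class name TagRemover and the pairing rule are assumptions, not part of either answer: each "End" is paired with the nearest "Start" before it that was not already used by an earlier pair, and an "End" with no such "Start" is left alone. Ordinal comparison is used so the search does not depend on the current culture.

```csharp
using System;
using System.Text;

static class TagRemover
{
    // Sketch: walk the endTags from left to right and pair each one with the
    // nearest unused startTag before it. Text between a pair is dropped; the
    // tags themselves are kept.
    public static string RemoveBetween(string source, string startTag, string endTag)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (string.IsNullOrEmpty(startTag) || string.IsNullOrEmpty(endTag))
            throw new ArgumentException("Tags must be non-empty.");

        var result = new StringBuilder();
        int pos = 0; // everything before pos has already been copied to result

        while (true)
        {
            int end = source.IndexOf(endTag, pos, StringComparison.Ordinal);
            if (end < 0) break;

            // Find the last startTag that lies completely between pos and end.
            int start = -1;
            int probe = source.IndexOf(startTag, pos, StringComparison.Ordinal);
            while (probe >= 0 && probe + startTag.Length <= end)
            {
                start = probe;
                probe = source.IndexOf(startTag, probe + 1, StringComparison.Ordinal);
            }

            if (start >= 0)
            {
                // Keep text up to and including startTag, then append endTag.
                result.Append(source, pos, start + startTag.Length - pos);
                result.Append(endTag);
            }
            else
            {
                // No startTag before this endTag: copy it through unchanged.
                result.Append(source, pos, end + endTag.Length - pos);
            }
            pos = end + endTag.Length;
        }

        result.Append(source, pos, source.Length - pos);
        return result.ToString();
    }

    static void Main()
    {
        Console.WriteLine(RemoveBetween("So Start followed by Start only remove this End other stuff", "Start", "End"));
    }
}
```

With this rule all four examples from the question produce the expected output. On input the question does not cover, such as "Start a End b End", this loop returns "StartEnd b End", whereas the right-to-left regex from the other answer matches from the last "End" back to the "Start" and returns "StartEnd".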


You only need to modify your function slightly: make the match non-greedy with ? (.*? instead of .*) and add RegexOptions.RightToLeft, and it works with all your examples:

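using System.Text.RegularExpressions;

public static string RemoveBetween(string sourceString, string startTag, string endTag)
{
    // "?" makes the match lazy; RightToLeft makes matching begin from the rightmost endTag
    Regex regex = new Regex(string.Format("{0}(.*?){1}", Regex.Escape(startTag), Regex.Escape(endTag)), RegexOptions.RightToLeft);
    // Put both tags back so only the text between them is removed
    return regex.Replace(sourceString, startTag+endTag);
}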
XouDo answered Jan 04 '23 15:01
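To check the right-to-left regex against the examples from the question, the sketch below (RegexCheck and the rightToLeft parameter are hypothetical names) wraps the same pattern and lets you switch RegexOptions.RightToLeft off. Without it, the lazy match still starts at the first "Start", so the third example loses "followed by Start". One deliberate deviation from the answer: the tags are returned from a MatchEvaluator instead of being passed as a replacement string, because a $ in a replacement string is treated as a substitution.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexCheck
{
    // Lazy ".*?" scanned right to left: a match ends at an endTag and
    // extends leftward only until the closest startTag.
    public static string RemoveBetween(string sourceString, string startTag, string endTag,
                                       bool rightToLeft = true)
    {
        RegexOptions options = rightToLeft ? RegexOptions.RightToLeft : RegexOptions.None;
        var regex = new Regex(Regex.Escape(startTag) + "(.*?)" + Regex.Escape(endTag), options);
        // A MatchEvaluator's return value is inserted literally ("$" is not special).
        return regex.Replace(sourceString, _ => startTag + endTag);
    }

    static void Main()
    {
        string[] inputs =
        {
            "Any Text Start remove this End between",
            "Any Text Start remove this End between should be still there Start and remove this End multiple",
            "So Start followed by Start only remove this End other stuff",
            "the End before Start. Start before End is correct",
        };

        foreach (string input in inputs)
            Console.WriteLine(RemoveBetween(input, "Start", "End"));

        // Left to right, the third example pairs the first "Start" with "End".
        Console.WriteLine(RemoveBetween(inputs[2], "Start", "End", rightToLeft: false)); // So StartEnd other stuff
    }
}
```

If the text between the tags can span several lines, RegexOptions.Singleline is also needed, because . does not match \n by default.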