
How to remove only certain substrings from a string?

Tags:

string

c#

regex

Using C#, I have a string that is a SQL script containing multiple queries. I want to remove sections of the string that are enclosed in single quotes. I can do this using Regex.Replace, in this manner:

string test = "Only 'together' can we turn him to the 'dark side' of the Force";
test = Regex.Replace(test, "'[^']*'", string.Empty);

Results in: "Only can we turn him to the of the Force"

What I want to do is remove the substrings between quotes EXCEPT for substrings containing a specific substring. For example, using the string above, I want to remove the quoted substrings except for those that contain "dark," such that the resulting string is:

Results in: "Only can we turn him to the 'dark side' of the Force"

How can this be accomplished using Regex.Replace, or perhaps by some other technique? I'm currently trying a solution that involves using Substring(), IndexOf(), and Contains().

Note: I don't care if the single quotes around "dark side" are removed or not, so the result could also be: "Only can we turn him to the dark side of the Force." I say this because a solution using Split() would remove all the single quotes.

Edit: I don't have a solution yet using Substring(), IndexOf(), etc. By "working on," I mean I'm thinking in my head how this can be done. I have no code, which is why I haven't posted any yet. Thanks.

Edit: VKS's solution below works. I wasn't escaping the \b on my first attempt, which is why it failed. Also, it didn't work unless I included the single quotes around the whole string as well.

test = Regex.Replace(test, "'(?![^']*\\bdark\\b)[^']*'", string.Empty);
armus47 asked Jan 23 '15 08:01


5 Answers

'(?![^']*\bdark\b)[^']*'

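Try this and see the demo. Replace each match with an empty string. The negative lookahead checks whether the quoted text contains the word dark; if it does, that quoted string is not matched and so is left in place.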

https://www.regex101.com/r/rG7gX4/12
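For reference, here is a minimal C# sketch of the lookahead pattern above. The class and method names (LookaheadDemo, RemoveQuotedExceptDark) are made up for illustration and are not part of the original answer. It uses a verbatim string so that \b needs no extra escaping, and it also shows one caveat worth checking against your own input: since the kept string is never consumed by a match, its closing quote can pair up with the opening quote of a later quoted string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: removes every single-quoted span unless it
    // contains the whole word "dark".
    public static string RemoveQuotedExceptDark(string input)
    {
        // Verbatim string (@"...") so that \b reaches the regex engine as a
        // word boundary instead of being read by C# as a backspace escape.
        const string pattern = @"'(?![^']*\bdark\b)[^']*'";
        return Regex.Replace(input, pattern, string.Empty);
    }

    public static void Main()
    {
        // Works for the string in the question (note the doubled spaces left behind):
        // Only  can we turn him to the 'dark side' of the Force
        Console.WriteLine(RemoveQuotedExceptDark(
            "Only 'together' can we turn him to the 'dark side' of the Force"));

        // Caveat: the closing quote of 'dark side' pairs with the opening
        // quote of 'x', so "' b '" is removed instead of "'x'":
        // a 'dark sidex' c
        Console.WriteLine(RemoveQuotedExceptDark("a 'dark side' b 'x' c"));
    }
}
```

If a kept string can be followed by other quoted strings in your script, an approach that consumes every quoted span (a replacement callback or an alternation with a capture group) avoids this mispairing.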

vks answered Oct 14 '22 17:10


While vks's solution works, I'd like to demonstrate a different approach:

string test = "Only 'together' can we turn him to the 'dark side' of the Force";
test = Regex.Replace(test, @"'[^']*'", match => {
    if (match.Value.Contains("dark"))
        return match.Value;

    // You can add more cases here

    return string.Empty;
});

Or, if your condition is simple enough:

test = Regex.Replace(test, @"'[^']*'", match => match.Value.Contains("dark")
    ? match.Value
    : string.Empty
);

That is, use a lambda to provide a callback for the replacement. This way, you can run arbitrary logic to replace the string.
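As a sketch of how the "more cases" idea might be generalized, the callback can test each quoted span against a list of keywords. The helper name RemoveQuotedExcept and its keepIfContains parameter are assumptions for illustration, and the case-insensitive comparison is a choice made here, not something the question asked for.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CallbackDemo
{
    // Hypothetical helper: keeps a quoted span when it contains any of the
    // given keywords (case-insensitive), and removes it otherwise.
    public static string RemoveQuotedExcept(string input, params string[] keepIfContains)
    {
        return Regex.Replace(input, @"'[^']*'", match =>
            keepIfContains.Any(k =>
                match.Value.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0)
                ? match.Value      // put the quoted span back unchanged
                : string.Empty);   // drop it
    }

    public static void Main()
    {
        // Prints: a 'dark side' b  c
        Console.WriteLine(RemoveQuotedExcept("a 'dark side' b 'x' c", "dark"));
    }
}
```

Because the pattern matches every quoted span and the callback only decides what to put back, quotes are always paired left to right, even when a kept string is followed by another quoted string.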

Lucas Trzesniewski answered Oct 14 '22 18:10


Something like this would work.
You can add all the strings you want to keep to the excludedString array. Note that the comparison is an exact match on the text between the quotes, not a substring test.

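// Requires: using System.Linq; (for Contains on the array)
string test = "Only 'together' can we turn him to the 'dark side' of the Force";

// Quoted strings to keep; compared against the full text between the quotes.
var excludedString = new string[] { "dark side" };

int startIndex = 0;

while ((startIndex = test.IndexOf('\'', startIndex)) >= 0)
{
    var endIndex = test.IndexOf('\'', startIndex + 1);
    if (endIndex < 0)
    {
        break; // unmatched opening quote: leave the rest untouched
    }

    var length = (endIndex - startIndex) + 1;
    var subString = test.Substring(startIndex, length);
    if (!excludedString.Contains(subString.Replace("'", "")))
    {
        // Remove the quoted span; the next search resumes at the same index.
        test = test.Remove(startIndex, length);
    }
    else
    {
        // Keep it and continue searching after the closing quote.
        startIndex = endIndex + 1;
    }
}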
Vignesh.N answered Oct 14 '22 16:10


Here is an attempt at what I think you had in mind: a solution using Split() and Contains(), without regex. Note that it drops all the single quotes.

string test = "Only 'together' can we turn him to the 'dark side' of the Force";
string[] separated = test.Split('\'');

string result = "";

for (int i = 0; i < separated.Length; i++)
{
    string str = separated[i];
    str = str.Trim();   // trim leading and trailing spaces

    if (i % 2 == 0 || str.Contains("dark")) // you can expand your condition
    {
       result += str+" ";  // add space after each added string
    }
}
result = result.Trim(); // trim the trailing space again
chouaib answered Oct 14 '22 16:10


Another method uses the regex alternation operator |.

@"('[^']*\bdark\b[^']*')|'[^']*'"

Then replace each match with $1.

string str = "Only 'together' can we turn him to the 'dark side' of the Force";
string result = Regex.Replace(str, @"('[^']*\bdark\b[^']*')|'[^']*'", "$1");
Console.WriteLine(result);

Explanation:

  • (...) is a capturing group.

  • '[^']*\bdark\b[^']*' matches every single-quoted string that contains the word dark. [^']* matches any character except ', zero or more times.

  • ('[^']*\bdark\b[^']*'): because this part of the regex is inside a capturing group, the matched text is stored in group 1.

  • | is the regex alternation operator.

  • '[^']*' matches all the remaining single-quoted strings (those that do not contain dark). It never matches a quoted string containing dark, because those were already matched by the pattern before the | operator.

  • Finally, replacing each match with the contents of group 1 gives the desired output: quoted strings containing dark are put back unchanged, and the others are replaced with an empty string.
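To make the role of group 1 concrete, here is a small sketch that prints, for each match, whether the capturing group took part. The class name AlternationDemo and the Clean method are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    private const string Pattern = @"('[^']*\bdark\b[^']*')|'[^']*'";

    // Hypothetical helper: replaces each match with group 1, which is empty
    // whenever the second alternative (no "dark") was the one that matched.
    public static string Clean(string input)
    {
        return Regex.Replace(input, Pattern, "$1");
    }

    public static void Main()
    {
        string str = "Only 'together' can we turn him to the 'dark side' of the Force";

        foreach (Match m in Regex.Matches(str, Pattern))
        {
            // Group 1 only participates when the first alternative matched.
            Console.WriteLine($"{m.Value} -> group 1 matched: {m.Groups[1].Success}");
        }

        // Prints: Only  can we turn him to the 'dark side' of the Force
        Console.WriteLine(Clean(str));
    }
}
```

The loop should report False for 'together' and True for 'dark side'. Since every quoted string is consumed by one of the two alternatives, the closing quote of a kept string cannot be mistaken for the start of the next one.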

Avinash Raj answered Oct 14 '22 18:10