Using C#, I have a string that is a SQL script containing multiple queries. I want to remove sections of the string that are enclosed in single quotes. I can do this using Regex.Replace, in this manner:
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
test = Regex.Replace(test, "'[^']*'", string.Empty);
Results in: "Only can we turn him to the of the Force"
What I want to do is remove the substrings between quotes EXCEPT for those containing a specific substring. For example, using the string above, I want to remove the quoted substrings except for those that contain "dark", such that the resulting string is:
"Only can we turn him to the 'dark side' of the Force"
How can this be accomplished using Regex.Replace, or perhaps by some other technique? I'm currently trying a solution that involves using Substring(), IndexOf(), and Contains().
Note: I don't care if the single quotes around "dark side" are removed or not, so the result could also be: "Only can we turn him to the dark side of the Force." I say this because a solution using Split() would remove all the single quotes.
Edit: I don't have a solution yet using Substring(), IndexOf(), etc. By "currently trying," I mean I'm working out in my head how this could be done. I have no code, which is why I haven't posted any yet. Thanks.
Edit: VKS's solution below works. I wasn't escaping the \b on my first attempt, which is why it failed. Also, it didn't work unless I included the single quotes around the whole string as well.
test = Regex.Replace(test, "'(?![^']*\\bdark\\b)[^']*'", string.Empty);
'(?![^']*\bdark\b)[^']*'
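Try this and replace each match with an empty string. The negative lookahead checks whether the quoted section contains the word dark; if it does, that section is not matched and therefore not removed. See the demo: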
https://www.regex101.com/r/rG7gX4/12
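Since the question mentions trouble with escaping \b, here is a minimal sketch of the same lookahead pattern written with verbatim string literals, so the backslashes need no doubling. The class name QuoteFilter and the method RemoveQuotedExcept are hypothetical names used only for illustration, and the sketch assumes the keyword starts and ends with a word character so that \b behaves as intended.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteFilter
{
    // Hypothetical helper (not part of the original answer): removes every
    // single-quoted section unless it contains `keyword` as a whole word.
    public static string RemoveQuotedExcept(string input, string keyword)
    {
        // Verbatim literals (@"...") keep \b as a regex word boundary.
        // Regex.Escape guards against regex metacharacters in the keyword.
        string pattern = @"'(?![^']*\b" + Regex.Escape(keyword) + @"\b)[^']*'";
        return Regex.Replace(input, pattern, string.Empty);
    }
}
```

Calling QuoteFilter.RemoveQuotedExcept(test, "dark") on the sample string keeps 'dark side' and removes 'together'. Only the quoted section itself is removed, so the spaces on both sides of it remain and "Only" ends up followed by two spaces.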
While vks's solution works, I'd like to demonstrate a different approach:
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
test = Regex.Replace(test, @"'[^']*'", match => {
    if (match.Value.Contains("dark"))
        return match.Value;
    // You can add more cases here
    return string.Empty;
});
Or, if your condition is simple enough:
test = Regex.Replace(test, @"'[^']*'", match => match.Value.Contains("dark")
    ? match.Value
    : string.Empty
);
That is, use a lambda to provide a callback for the replacement. This way, you can run arbitrary logic to replace the string.
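As one possible way to "add more cases", the callback can check each match against a list of keywords. This is only a sketch: the class KeywordFilter, the method RemoveQuotedUnlessContains, and its keep parameter are made-up names for illustration, and the check is a plain substring test (like Contains("dark") above), not a whole-word match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordFilter
{
    // Hypothetical helper: keeps a quoted section if it contains ANY of the
    // given keywords as a substring; every other quoted section is removed.
    public static string RemoveQuotedUnlessContains(string input, params string[] keep)
    {
        return Regex.Replace(input, @"'[^']*'", match =>
            keep.Any(k => match.Value.Contains(k))
                ? match.Value      // leave this quoted section untouched
                : string.Empty);   // drop it, quotes included
    }
}
```

For example, KeywordFilter.RemoveQuotedUnlessContains(test, "dark", "light") would keep any quoted section mentioning either word.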
Something like this would work. You can add all the strings you want to keep to the excludedString array.
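// Requires: using System.Linq; (for Contains on the array)
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
var excludedString = new string[] { "dark side" };
int startIndex = 0;
while ((startIndex = test.IndexOf('\'', startIndex)) >= 0)
{
    var endIndex = test.IndexOf('\'', startIndex + 1);
    if (endIndex < 0)
        break; // unmatched opening quote: leave the rest of the string as-is
    var subString = test.Substring(startIndex, (endIndex - startIndex) + 1);
    if (!excludedString.Contains(subString.Replace("'", "")))
    {
        // Not in the keep list: remove the quoted section, quotes included
        test = test.Remove(startIndex, (endIndex - startIndex) + 1);
    }
    else
    {
        // Keep this section and continue after its closing quote
        startIndex = endIndex + 1;
    }
}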
I made this attempt, which I think is what you had in mind (a solution using Split, Contains, ... without regex):
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
string[] separated = test.Split('\'');
string result = "";
for (int i = 0; i < separated.Length; i++)
{
    string str = separated[i];
    str = str.Trim(); // trim the surrounding spaces
    // Even indexes are outside quotes; odd indexes are quoted sections
    if (i % 2 == 0 || str.Contains("dark")) // you can expand your condition
    {
        result += str + " "; // add a space after each added string
    }
}
result = result.Trim(); // trim the trailing space again
Another method uses the regex alternation operator |.
@"('[^']*\bdark\b[^']*')|'[^']*'"
Then replace each match with $1.
string str = "Only 'together' can we turn him to the 'dark side' of the Force";
string result = Regex.Replace(str, @"('[^']*\bdark\b[^']*')|'[^']*'", "$1");
Console.WriteLine(result);
Explanation:
(...) is a capturing group.
'[^']*\bdark\b[^']*' matches every single-quoted string that contains the word dark. [^']* matches any character other than ', zero or more times.
('[^']*\bdark\b[^']*') Because this pattern is inside a capturing group, the matched characters are stored in group 1.
| is the regex alternation operator.
'[^']*' matches all the remaining single-quoted strings, that is, the ones that do not contain dark. It never matches a quoted string containing dark, because those strings were already matched by the pattern before the | operator.
Finally, replacing each match with the contents of group 1 gives the desired output: for quoted strings matched by the second alternative, group 1 is empty, so they are removed.