Using regular expressions in C#, is there any way to find and remove duplicate words or symbols in a string containing a variety of words and symbols?
Ex.
Initial string of words:
"I like the environment. The environment is good."
Desired string:
"I like the environment. is good"
Duplicates removed: "the", "environment", "."
1) Split the input sentence on spaces into words.
2) To work with all the strings together, first join each string in the given list of strings.
3) Create a dictionary using Python's Counter method, with the words as keys and their frequencies as values.
4) Join the unique words (the dictionary keys) to form a single string.
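Since the question is about C#, here is a minimal sketch of the four steps above in C#. It swaps the Counter for a Dictionary<string, int> plus a List<string> that records first-seen order. The class and method names (UniqueWords, RemoveDuplicateWords) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class UniqueWords
{
    // Hypothetical helper: keeps the first occurrence of each space-separated word.
    public static string RemoveDuplicateWords(string sentence)
    {
        // Step 1: split on spaces, dropping empty entries caused by repeated spaces.
        string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Step 3: count frequencies. A separate list records first-seen order,
        // because Dictionary enumeration order is not guaranteed.
        var counts = new Dictionary<string, int>();
        var ordered = new List<string>();
        foreach (string word in words)
        {
            if (counts.TryGetValue(word, out int n))
            {
                counts[word] = n + 1;
            }
            else
            {
                counts[word] = 1;
                ordered.Add(word);
            }
        }

        // Step 4: join the unique words back into a single string.
        return string.Join(" ", ordered);
    }

    static void Main()
    {
        Console.WriteLine(RemoveDuplicateWords("apple apple orange red orange"));
        // prints: apple orange red
    }
}
```

Note that splitting on spaces is case-sensitive and leaves punctuation attached, so "environment." and "environment" count as different words here.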
In Excel, there are several ways to filter for unique values—or remove duplicate values: To filter for unique values, click Data > Sort & Filter > Advanced. To remove duplicate values, click Data > Data Tools > Remove Duplicates.
As said by others, you need more than a regex to keep track of words:
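// Needs: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
var words = new HashSet<string>();
string text = "I like the environment. The environment is good.";
// HashSet.Add returns false when the (upper-cased) word has been seen before.
text = Regex.Replace(text, "\\w+", m =>
    words.Add(m.Value.ToUpperInvariant())
        ? m.Value          // first occurrence: keep it
        : String.Empty);   // repeat: remove it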
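The snippet above only tracks \w+ matches, so the repeated "." and the spaces around the removed words stay behind (the result is "I like the environment.   is good."). Here is one possible variation, as a sketch only, that also treats each punctuation character as a token and collapses the leftover whitespace. The names DedupeTokens and RemoveRepeats are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class DedupeTokens
{
    // Hypothetical helper: removes repeated words and symbols, ignoring case.
    public static string RemoveRepeats(string text)
    {
        // Case-insensitive set, so "The" counts as a repeat of "the".
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // \w+ matches a whole word; [^\w\s] matches one punctuation/symbol character.
        string stripped = Regex.Replace(text, @"\w+|[^\w\s]",
            m => seen.Add(m.Value) ? m.Value : string.Empty);

        // Collapse the runs of whitespace left behind by removed tokens.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }

    static void Main()
    {
        Console.WriteLine(RemoveRepeats("I like the environment. The environment is good."));
        // prints: I like the environment. is good
    }
}
```

This reproduces the desired string from the question. Treating every symbol as its own token is an assumption; multi-character symbols such as "..." would be reduced to a single ".".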
This seems to work for me
(\b\S+\b)(?=.*\1)
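It matches every word that appears again later in the string, so in this sample all but the last occurrence of each repeated word are matched: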
apple apple orange orange red blue green orange green blue pirates ninjas cowboys ninjas pirates
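To see what that pattern does in practice, here is a small sketch that removes the matches from the sample line and tidies the whitespace afterwards. The class name LookaheadDemo and the whitespace cleanup step are my own additions, not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    // Hypothetical helper: deletes every word that occurs again later in the string.
    public static string KeepLastOccurrences(string input)
    {
        // The lookahead (?=.*\1) succeeds only if the captured text appears again later.
        string stripped = Regex.Replace(input, @"(\b\S+\b)(?=.*\1)", string.Empty);

        // Collapse the gaps left by the removed words.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }

    static void Main()
    {
        string sample = "apple apple orange orange red blue green orange green blue "
                      + "pirates ninjas cowboys ninjas pirates";
        Console.WriteLine(KeepLastOccurrences(sample));
        // prints: apple red orange green blue cowboys ninjas pirates
    }
}
```

A few things to watch for: this keeps the last occurrence of each word rather than the first, it is case-sensitive unless you pass RegexOptions.IgnoreCase, and because \1 is not anchored with \b it also fires when the word appears inside a longer word later on (for example "red" followed by "bored").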