Using regular expressions in C#, is there any way to find and remove duplicate words or symbols in a string containing a variety of words and symbols?

Example:

Initial string: "I like the environment. The environment is good."
Desired string: "I like the environment. is good"
Duplicates removed: "the", "environment", "."


Regular expression to find and remove duplicate words

2 Answers

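As others have said, a regex alone cannot remember which words it has already seen, so pair it with a set that tracks them:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
// Tracks every word seen so far; upper-casing makes the check case-insensitive.
var words = new HashSet<string>();
string text = "I like the environment. The environment is good.";
// Keep the first occurrence of each word and replace later ones with nothing.
text = Regex.Replace(text, "\\w+", m =>
                     words.Add(m.Value.ToUpperInvariant())
                         ? m.Value
                         : String.Empty);

This handles words only: the second "." is kept, and each removed word leaves its surrounding spaces behind.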
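As a minimal sketch of how the snippet above might be extended to cover symbols as well as words, the token pattern can be widened to match either a run of word characters or a single punctuation symbol, followed by a cleanup pass for leftover spaces. The names DuplicateRemover and RemoveDuplicates are hypothetical, and both the whitespace cleanup and the use of StringComparer.OrdinalIgnoreCase in place of upper-casing are assumptions for illustration, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class DuplicateRemover
{
    public static string RemoveDuplicates(string text)
    {
        // Case-insensitive set of tokens seen so far.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // A token is a run of word characters, or one non-word, non-space symbol.
        // HashSet.Add returns false for a repeat, so repeats become empty strings.
        string result = Regex.Replace(text, @"\w+|[^\w\s]",
            m => seen.Add(m.Value) ? m.Value : string.Empty);

        // Collapse the runs of whitespace left behind by removed tokens.
        return Regex.Replace(result, @"\s{2,}", " ").Trim();
    }
}
```

With the question's input, RemoveDuplicates returns "I like the environment. is good", which matches the desired string. Note that the cleanup pass also collapses any whitespace runs that were already in the input.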

147

answered Sep 24 '22 00:09

Per Erik Stendahl

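This seems to work for me:

(\b\S+\b)(?=.*\1)

It matches any word that appears again later in the string, so replacing the matches with an empty string keeps only the last occurrence of each word. In the sample lines below (each tested separately), it matches the first "apple"; the first "orange", "blue", and "green"; and the first "pirates" and "ninjas":

apple apple orange
orange red blue green orange green blue
pirates ninjas cowboys ninjas pirates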
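For illustration, here is a rough sketch of how a lookahead pattern like this could be applied from C#. It uses a slightly stricter variant of the pattern, which is an assumption rather than the answer's exact regex: the backreference is wrapped in \b so that "the" is not treated as repeated just because "there" follows, and RegexOptions.IgnoreCase is added. The names LookaheadDedup and KeepLastOccurrences are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class LookaheadDedup
{
    // Matches a word only if the same whole word occurs again later on the line.
    private static readonly Regex RepeatedLater =
        new Regex(@"\b(\w+)\b(?=.*\b\1\b)", RegexOptions.IgnoreCase);

    public static string KeepLastOccurrences(string line)
    {
        // Remove every occurrence that has a later copy; the last copy survives.
        string stripped = RepeatedLater.Replace(line, string.Empty);

        // Collapse the runs of whitespace left behind by removed words.
        return Regex.Replace(stripped, @"\s{2,}", " ").Trim();
    }
}
```

Applied to the three sample lines, KeepLastOccurrences returns "apple orange", "red orange green blue", and "cowboys ninjas pirates". Unlike the HashSet approach, this keeps the last occurrence rather than the first, and the lookahead rescans the rest of the line for every word, so it may be slow on long inputs.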

answered Sep 22 '22 00:09

Jeff Atwood


Tags: string, c#, regex

triniMahn
