Remove all exclusive Latin characters using regex

Tags: c#, regex, resources

I'm developing software in Portuguese, so many of my entities have names like 'maça' or 'lição', and I want to use the entity name as a resource key. So I want to keep every character except 'ç', 'ã', 'õ' and similar accented letters.

Is there an optimal solution using regex? My current regex (as suggested in "Remove characters using Regex") is:

Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

Just to emphasize: I'm only concerned with Latin characters.
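A quick check shows why the current pattern does not remove these letters: in .NET, \w is Unicode-aware by default, so 'ç', 'ã' and 'õ' count as word characters and [\W_]+ leaves them in place. The sketch below wraps the question's two lines in a helper; the class and method names are hypothetical, and it uses ToUpperInvariant() instead of the question's ToUpper() so the result does not depend on the current culture (under Turkish culture, for example, 'i' upper-cases to 'İ').

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper reproducing the pattern from the question.
public static class CurrentPatternDemo
{
    // Runs of non-word characters or underscores. Because .NET's \W excludes
    // Unicode letters, accented letters such as 'ç' are NOT matched.
    private static readonly Regex NonWord = new Regex(@"[\W_]+");

    // Removes the matched runs, then upper-cases with the invariant culture.
    public static string Clean(string messyText) =>
        NonWord.Replace(messyText, "").ToUpperInvariant();
}
```

For example, Clean("lição") still contains 'Ç' and 'Ã'; only spaces, punctuation and underscores disappear.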


asked Mar 16 '11 19:03

Custodio

1 Answer

I think the best regex to use would be:

[^\x00-\x7F]

This is the negation of the ASCII range, so it matches every non-ASCII character. \x00 and \x7F (127) are hexadecimal character codes, the - between them denotes a range, and the ^ just inside the [ and ] negates the class. ASCII covers \x00 through \x7F; \x80 (128) is already the first non-ASCII code.

Replace the matches with an empty string and you should have what you want. This also removes non-ASCII punctuation and similar symbols, which can otherwise cause subtle, annoying and hard-to-track-down errors.

If you want to allow the Latin-1 range as well (often loosely called "extended ASCII"), use \xFF instead of \x7F. Be aware that this range includes 'ç', 'ã' and 'õ', so those characters would then be kept.
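In C#, the approach above might look like the following sketch. It assumes the goal is an upper-case resource key; the class name ResourceKeys and its methods are illustrative only, not part of any library. Each pattern is a single negated character class, so every offending UTF-16 code unit is removed individually.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; names are illustrative.
public static class ResourceKeys
{
    // Any UTF-16 code unit outside the ASCII range U+0000..U+007F.
    private static readonly Regex NonAscii = new Regex(@"[^\x00-\x7F]");

    // Any code unit outside U+0000..U+00FF (ASCII plus the Latin-1 Supplement).
    private static readonly Regex NonLatin1 = new Regex(@"[^\x00-\xFF]");

    // Drops every non-ASCII character, then upper-cases with the invariant
    // culture so the key does not depend on the machine's current culture.
    public static string AsciiKey(string text) =>
        NonAscii.Replace(text, "").ToUpperInvariant();

    // Keeps Latin-1 letters such as 'ç', 'ã' and 'õ'; only characters
    // above U+00FF (for example '€') are dropped.
    public static string Latin1Key(string text) =>
        NonLatin1.Replace(text, "").ToUpperInvariant();
}
```

Note that deleting characters outright changes the names: AsciiKey("maça") yields "MAA" and AsciiKey("lição") yields "LIO", so two different entity names could end up with the same key.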


answered Jan 02 '23 03:01

Ezra
