Regex ignore underscores

I have a regex, ([-@.\/,':\w]*[\w])*, that matches all the words in a text (including punctuated words like I.B.M), but I want it to exclude underscores and I can't figure out how. I tried adding ^[_] (e.g. (^[_][-@.\/,':\w]*[\w])*), but that just breaks all the words up into letters. I want to preserve the word matching, but I don't want to match words that contain underscores or words that consist entirely of underscores.

What's the proper way to do this?

P.S.

My app is written in C# (if that makes any difference).
I can't use A-Za-z0-9 because I have to match words regardless of the language (could be Chinese, Russian, Japanese, German, English).

Update
Here is an example:

"I.B.M should be parsed as one word w_o_r_d! Russian should work too: мплекс исторических событий."

The matches should be:

I.B.M  
should  
be  
parsed  
as  
one  
word  
Russian  
should  
work  
too  
мплекс  
исторических  
событий

Note that w_o_r_d should not get matched.

780

asked Mar 30 '11 23:03

Kiril

1 Answer

Try this instead:

([-@.\/,':\p{L}\p{Nd}]*[\p{L}\p{Nd}])*

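In .NET, the \w class is equivalent to [\p{L}\p{Mn}\p{Nd}\p{Pc}] when you're performing Unicode matching, which is the default. (With RegexOptions.ECMAScript it is simply [a-zA-Z_0-9], which still includes the underscore.) The pattern above also leaves out \p{Mn} (combining marks); add it to both character classes if your text relies on them.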

It's the \p{Pc} Unicode category (Punctuation, Connector) that causes the problem by matching underscores, so we explicitly match the other categories and leave that one out.
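
To make the category point concrete, here is a minimal sketch that checks single characters against the classes discussed above. The class name WordClassDemo and the helper MatchesWholly are made up for illustration; only Regex.IsMatch comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Hypothetical helper: true when the whole input is matched by the given character class.
    public static bool MatchesWholly(string charClass, string input) =>
        Regex.IsMatch(input, "^" + charClass + "$");

    public static void Main()
    {
        // The underscore belongs to \p{Pc}, which is why \w accepts it.
        Console.WriteLine(MatchesWholly(@"\p{Pc}", "_"));           // True
        Console.WriteLine(MatchesWholly(@"\w", "_"));               // True

        // Without \p{Pc} the underscore is rejected, while letters and digits still match.
        Console.WriteLine(MatchesWholly(@"[\p{L}\p{Nd}]", "_"));    // False
        Console.WriteLine(MatchesWholly(@"[\p{L}\p{Nd}]", "ж"));    // True
        Console.WriteLine(MatchesWholly(@"[\p{L}\p{Nd}]", "7"));    // True
    }
}
```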

(For further information, see the .NET documentation sections "Character Classes: Word Character" and "Character Classes: Supported Unicode General Categories".)
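
As a rough check against the example text from the question, the sketch below runs the pattern as written and keeps only the non-empty matches (the outer * also allows empty ones). On that text the pattern no longer matches underscores, but it still returns the letters of w_o_r_d as separate one-letter matches. If whole tokens containing an underscore should be dropped, one possible approach (an assumption here, not part of the answer) is to match tokens with the question's original \w-based classes and filter out any that contain an underscore. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordTokenizer
{
    // The pattern from the answer, kept as written.
    private static readonly Regex AnswerPattern =
        new Regex(@"([-@.\/,':\p{L}\p{Nd}]*[\p{L}\p{Nd}])*");

    // Assumed alternative: the question's token shape (no outer '*'), filtered afterwards.
    private static readonly Regex TokenPattern =
        new Regex(@"[-@.\/,':\w]*\w");

    // Non-empty matches of the answer's pattern.
    public static List<string> WithAnswerPattern(string text) =>
        AnswerPattern.Matches(text).Cast<Match>()
            .Select(m => m.Value)
            .Where(v => v.Length > 0)
            .ToList();

    // Whole tokens, discarding any token that contains an underscore.
    public static List<string> SkippingUnderscoreTokens(string text) =>
        TokenPattern.Matches(text).Cast<Match>()
            .Select(m => m.Value)
            .Where(v => !v.Contains("_"))
            .ToList();

    public static void Main()
    {
        const string text =
            "I.B.M should be parsed as one word w_o_r_d! Russian should work too: мплекс исторических событий.";

        // Includes "w", "o", "r", "d" as four separate matches.
        Console.WriteLine(string.Join(" | ", WithAnswerPattern(text)));

        // Drops "w_o_r_d" entirely.
        Console.WriteLine(string.Join(" | ", SkippingUnderscoreTokens(text)));
    }
}
```

The filtering variant trades a single regex for a regex plus a post-filter, which keeps the pattern readable and makes the "no underscores anywhere in the word" rule explicit.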

185

answered Oct 03 '22 06:10

LukeH

Tags: c#, regex, regex-negation
