
Exclude characters from a character class

Is there a simple way to match all characters in a class except a certain set of them? For example, in a language where \w matches the set of all Unicode word characters, is there a way to exclude a single character such as the underscore "_" from that match?

The only idea that came to mind was to use a negative lookahead/lookbehind around each character, but that seems more complex than necessary when I effectively just want to match a character against a positive match AND a negative match. For example, if & were an AND operator, I could do this:

^(\w&[^_])+$ 
Dan Roberts asked Jun 26 '13 18:06



1 Answer

It really depends on your regex flavor.

.NET

... provides only one simple character class set operation: subtraction. This is enough for your example, so you can simply use

[\w-[_]] 

If a - is followed by a nested character class, it's subtracted. Simple as that...

Java

... provides a much richer set of character class set operations. In particular you can get the intersection of two sets like [[abc]&&[cde]] (which would give c in this case). Intersection and negation together give you subtraction:

[\w&&[^_]] 
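
As a minimal sketch of how that intersection could be used from Java code: the class and method names below are made up for illustration, and the UNICODE_CHARACTER_CLASS flag is an assumption on my part to get the Unicode-aware \w from the question (by default Java's \w only covers ASCII).

```java
import java.util.regex.Pattern;

class WordWithoutUnderscore {
    // [\w&&[^_]] is the intersection of \w with "anything except _".
    // UNICODE_CHARACTER_CLASS is an assumption: it makes \w Unicode-aware,
    // as in the question; without it Java's \w is ASCII-only.
    static final Pattern NO_UNDERSCORE =
            Pattern.compile("[\\w&&[^_]]+", Pattern.UNICODE_CHARACTER_CLASS);

    // matches() requires the whole input to match, so no ^...$ anchors are needed.
    static boolean isWordWithoutUnderscore(String s) {
        return NO_UNDERSCORE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWordWithoutUnderscore("hello123"));    // true
        System.out.println(isWordWithoutUnderscore("hello_world")); // false
    }
}
```

Note that the backslash has to be doubled inside a Java string literal, which is why the pattern is written as "[\\w&&[^_]]+" in the source.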

Perl

... supports set operations on extended character classes (introduced as an experimental feature in Perl 5.18). In particular, you can directly subtract arbitrary character classes:

(?[ \w - [_] ]) 

All other flavors

... (that support lookaheads) allow you to mimic the subtraction by using a negative lookahead:

(?!_)\w 

This first checks that the next character is not a _ and then matches any \w (which can't be _ due to the negative lookahead).
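
One thing to watch when repeating this: the lookahead only guards a single character, so it has to go inside the repeated group. Here is a rough sketch in Java (chosen only as an example of a flavor with lookaheads; the class and method names are hypothetical) that pulls out the runs of word characters that contain no underscore:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSubtraction {
    // (?!_)\w matches ONE character, so the lookahead sits inside the
    // repeated group; (?!_)\w+ would only guard the first character.
    static final Pattern RUN = Pattern.compile("(?:(?!_)\\w)+");

    // Collects every maximal run of word characters that are not underscores.
    static List<String> runsWithoutUnderscore(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(runsWithoutUnderscore("foo_bar_42")); // [foo, bar, 42]
    }
}
```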

Note that each of these approaches is completely general in that you can subtract one arbitrarily complex character class from another.
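
To illustrate the generality, here is a small sketch in Java that subtracts a larger set (lowercase vowels, digits and the underscore, picked arbitrarily for this example) from \w in two ways and checks that both agree on every ASCII character. The class and method names are invented for the example, and \w is left at Java's default ASCII-only meaning.

```java
import java.util.regex.Pattern;

class SubtractionEquivalence {
    // Same subtraction expressed twice: \w minus [aeiou\d_].
    static final Pattern BY_INTERSECTION = Pattern.compile("[\\w&&[^aeiou\\d_]]");
    static final Pattern BY_LOOKAHEAD = Pattern.compile("(?![aeiou\\d_])\\w");

    // Counts how many single ASCII characters the pattern accepts.
    static int countAsciiMatches(Pattern p) {
        int count = 0;
        for (char c = 0; c < 128; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                count++;
            }
        }
        return count;
    }

    // True if both patterns accept exactly the same ASCII characters.
    static boolean agreeOnAscii() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (BY_INTERSECTION.matcher(s).matches() != BY_LOOKAHEAD.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnAscii());                     // true
        System.out.println(countAsciiMatches(BY_INTERSECTION)); // 47
    }
}
```

The 47 is the 26 uppercase letters plus the 21 lowercase consonants that remain once the vowels, digits and underscore are removed.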

Martin Ender answered Sep 27 '22 19:09
