
Character class subtraction, converting from Java syntax to RegexBuddy

Which regular expression engine does Java use?

If I use the following expression in a tool like RegexBuddy:

[a-z&&[^bc]]

it is valid in Java, but RegexBuddy does not interpret it the same way.

In fact it reports:

Match a single character present in the list below [a-z&&[^bc]

  • A character in the range between a and z : a-z
  • One of the characters &[^bc : &&[^bc
  • Match the character ] literally : ]

But I want to match a character between a and z, intersected with the set of characters that are not b or c.

xdevel2000 asked Jul 08 '10 08:07


2 Answers

Like most regex flavors, java.util.regex.Pattern has its own specific features with syntax that may not be fully compatible with others; this includes character class union, intersection and subtraction:

  • [a-d[m-p]] : a through d, or m through p: [a-dm-p] (union)
  • [a-z&&[def]] : d, e, or f (intersection)
  • [a-z&&[^bc]] : a through z, except for b and c: [ad-z] (subtraction)
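
The three forms listed above can be checked directly against java.util.regex. This is a minimal sketch; the class name CharClassDemo and the helper matchesClass are hypothetical names chosen for illustration, and the sample inputs are arbitrary.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true when the character class matches the whole input.
    static boolean matchesClass(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p
        System.out.println(matchesClass("[a-d[m-p]]", "n"));   // true
        // Intersection: only d, e, or f
        System.out.println(matchesClass("[a-z&&[def]]", "e")); // true
        System.out.println(matchesClass("[a-z&&[def]]", "a")); // false
        // Subtraction: a through z, except b and c
        System.out.println(matchesClass("[a-z&&[^bc]]", "a")); // true
        System.out.println(matchesClass("[a-z&&[^bc]]", "b")); // false
    }
}
```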

The most important "caveat" of Java regex is that matches() attempts to match the pattern against the whole string, not just part of it; to look for a match anywhere inside the string, use find(). This is atypical of most engines, and can be a source of confusion at times.
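
To illustrate the whole-string behaviour, here is a small sketch comparing Matcher.matches() with Matcher.find(). The class name MatchesVsFind and its two helper methods are hypothetical, and the input strings are made up for the example.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // True only if the regex matches the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches anywhere inside the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "[a-z&&[^bc]]"; // exactly one character: a-z except b and c
        System.out.println(wholeMatch(regex, "xyz"));   // false: the class covers one character, the input has three
        System.out.println(partialMatch(regex, "xyz")); // true: "x" is found inside the input
        System.out.println(wholeMatch(regex, "x"));     // true
        System.out.println(partialMatch(regex, "bc"));  // false: both characters are excluded
    }
}
```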

See also

  • regular-expressions.info/Flavor Comparison and Java Flavor Notes

On character class subtraction

Subtraction allows you to define for example "all consonants" in Java as [a-z&&[^aeiou]].

This syntax is specific to Java. In XML Schema, .NET, JGSoft and RegexBuddy, it's [a-z-[aeiou]]. Other flavors may not support this feature at all.
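
For flavors without any subtraction syntax, one fallback is to expand the ranges by hand. The sketch below, run in Java, checks that the hand-expanded class [b-df-hj-np-tv-z] accepts the same lowercase letters as the Java-specific [a-z&&[^aeiou]]. The class name ConsonantDemo, the constant names, and the expanded pattern are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Pattern;

class ConsonantDemo {
    // Java-only form: intersection with a negated nested class.
    static final Pattern JAVA_FORM = Pattern.compile("[a-z&&[^aeiou]]");
    // Hand-expanded ranges (assumed equivalent); needs no subtraction support.
    static final Pattern PORTABLE_FORM = Pattern.compile("[b-df-hj-np-tv-z]");

    static boolean isConsonant(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Compare the two forms on every lowercase ASCII letter.
        for (char c = 'a'; c <= 'z'; c++) {
            if (isConsonant(JAVA_FORM, c) != isConsonant(PORTABLE_FORM, c)) {
                throw new AssertionError("forms disagree on " + c);
            }
        }
        System.out.println("both forms agree on a-z");
    }
}
```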

References

  • regular-expressions.info/Character Classes in XML Regular Expressions
  • MSDN - Regular Expression Character Classes - Subtraction

Related questions

  • What is the point behind character class intersections in Java’s Regex?
polygenelubricants answered Sep 30 '22 05:09


Java uses its own regular expression engine, whose behaviour is defined in the java.util.regex.Pattern class.

You can test it with an Eclipse plugin or online.

Riduidel answered Sep 30 '22 06:09