
How to handle the different dialects of regular expressions (java vs. xsd)?

Tags:

java

regex

xsd

When I try to validate an XML file against an XSD in Java (see this example), there are some incompatibilities between the regular expressions given in the XSD file and the regular expressions Java accepts.

If there is a regular expression like "[ab-]" in the XSD (meaning any of the characters "a", "b" or "-"), Java complains about a syntax error in the expression.

This has been a known bug since 28-MAR-2005; see the Sun bug database.

What can I do to work around this bug? So far I have tried to "correct" the XSD file by replacing "[ab-]" with "[ab\-]", but sometimes this is not an option.


If you have problems with this bug, too, please vote for it at the Sun bug database!

tangens asked Jan 30 '10 16:01



1 Answer

Since a bug is already filed, I'd recommend you try a different XML Schema processor. There's not going to be a lot you can do about it.

If you can preprocess the stream the XSD is coming in on, you could write a small parser that understands the basic regular expression structure and fixes anything of the form [.*-], that is, a character class whose last character before the closing bracket is an unescaped hyphen (the .* here stands for arbitrary class contents, not for a literal dot and star).
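A minimal sketch of that preprocessing idea, assuming you have already extracted the pattern string (for example, the value attribute of an xs:pattern element). The class name XsdPatternFixer and the method escapeTrailingHyphen are hypothetical names chosen for illustration; they are not part of any library. The sketch only handles a hyphen directly before a closing bracket and leaves everything else untouched, including XSD character class subtraction such as [a-z-[aeiou]].

```java
// Hypothetical helper (not a library API): rewrites "[ab-]" to "[ab\-]".
final class XsdPatternFixer {

    private XsdPatternFixer() {
    }

    /**
     * Escapes an unescaped hyphen that sits directly before the closing
     * bracket of a character class. All other characters are copied as-is.
     */
    static String escapeTrailingHyphen(String pattern) {
        StringBuilder out = new StringBuilder(pattern.length() + 4);
        int depth = 0; // nesting depth of '[' ... ']' (subtraction can nest)

        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            boolean hasNext = i + 1 < pattern.length();

            if (c == '\\' && hasNext) {
                // Copy an escape sequence such as "\-" or "\]" unchanged.
                out.append(c).append(pattern.charAt(++i));
            } else if (c == '[') {
                depth++;
                out.append(c);
            } else if (c == ']' && depth > 0) {
                depth--;
                out.append(c);
            } else if (c == '-' && depth > 0 && hasNext
                    && pattern.charAt(i + 1) == ']') {
                // Trailing hyphen inside a class: emit it escaped.
                out.append("\\-");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

You would apply this to each pattern value while copying the XSD to a temporary stream, then hand the rewritten schema to the validator. Whether escaping alone is sufficient depends on the schema processor in use, so treat it as a starting point.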

Kaleb Pederson answered Oct 20 '22 21:10