Are Java and C# regular expressions compatible?

Tags: java, c#, .net, regex

Both languages claim to use Perl-style regular expressions. If I validate a regular expression in one language, will it work in the other? Where do the regular expression syntaxes differ?

The use case here is a C# (.NET) UI talking to an eventual Java back end implementation that will use the regex to match data.

Note that I only need to worry about matching, not about extracting portions of the matched data.
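Since the back end will be Java, one option is to let the Java side confirm that it can compile a pattern, rather than relying only on the .NET check. A minimal sketch of such a check follows; the class and method names (`JavaRegexCheck`, `compilesInJava`) are hypothetical and not part of any existing API.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper: the class and method names are illustrative only.
class JavaRegexCheck {
    /** Returns true if java.util.regex can compile the given pattern. */
    static boolean compilesInJava(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Raised for syntax that the Java flavor does not accept.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compilesInJava("[a-z]+\\d{2}"));
        System.out.println(compilesInJava("(unclosed"));
    }
}
```

A successful compile only shows that the syntax is accepted. A pattern can compile in both flavors and still match differently, so this check is necessary but not sufficient.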

asked Feb 11 '09 at 20:02 by TREE


2 Answers

There are quite a lot of differences.

Character Class

  1. Character class subtraction [abc-[cde]]
    • .NET YES (since 2.0)
    • Java: Emulated via character class intersection and negation: [abc&&[^cde]]
  2. Character class intersection [abc&&[cde]]
    • .NET: Emulated via character class subtraction and negation: [abc-[^cde]]
    • Java YES
  3. \p{Alpha} POSIX character class
    • .NET NO
    • Java YES (US-ASCII)
  4. In (?x) mode (COMMENTS in Java, IgnorePatternWhitespace in .NET), a space (U+0020) inside a character class is significant.
    • .NET YES
    • Java NO
  5. Unicode Category (L, M, N, P, S, Z, C)
    • .NET YES: \p{L} form only
    • Java YES:
      • From Java 5: \pL, \p{L}, \p{IsL}
      • From Java 7: \p{general_category=L}, \p{gc=L}
  6. Unicode Category (Lu, Ll, Lt, ...)
    • .NET YES: \p{Lu} form only
    • Java YES:
      • From Java 5: \p{Lu}, \p{IsLu}
      • From Java 7: \p{general_category=Lu}, \p{gc=Lu}
  7. Unicode Block
    • .NET YES: \p{IsBasicLatin} only. (Supported Named Blocks)
    • Java YES: (block names are case-insensitive)
      • From Java 5: \p{InBasicLatin}
      • From Java 7: \p{block=BasicLatin}, \p{blk=BasicLatin}
  8. Spaces and underscores allowed in all long block names (e.g. BasicLatin can be written as Basic_Latin or Basic Latin)
    • .NET NO
    • Java YES (Java 5)
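As an illustration of items 1, 3 and 5 on the Java side, the sketch below spells the subtraction [abc-[cde]] as an intersection with a negated set, and shows that \p{Alpha} is ASCII-only by default while \p{L} covers Unicode letters. The class name `CharClassDemo` and its fields are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class CharClassDemo {
    // Java spelling of the .NET subtraction [abc-[cde]]:
    // intersect with the negated set.
    static final Pattern A_OR_B = Pattern.compile("[abc&&[^cde]]");

    // POSIX class: US-ASCII letters only with default flags.
    static final Pattern POSIX_ALPHA = Pattern.compile("\\p{Alpha}");

    // Unicode category in the \p{L} form, which both flavors accept.
    static final Pattern ANY_LETTER = Pattern.compile("\\p{L}");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(A_OR_B, "a"));           // 'a' is in abc but not in cde
        System.out.println(matches(A_OR_B, "c"));           // 'c' is removed by the negated set
        System.out.println(matches(POSIX_ALPHA, "\u00e9")); // e-acute is outside US-ASCII
        System.out.println(matches(ANY_LETTER, "\u00e9"));  // but it is a Unicode letter
    }
}
```

If a pattern has to work in both flavors, the \p{L} and \p{Lu} forms are the safer spelling, since the list above shows them accepted on both sides.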

Quantifier

  1. ?+, *+, ++ and {m,n}+ (possessive quantifiers)
    • .NET NO
    • Java YES
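To make the difference concrete, the sketch below compares a greedy quantifier, a possessive quantifier, and an atomic group in Java. The atomic group (?>...) is shown because both flavors accept it, so it is one possible way to rewrite a possessive quantifier for .NET; treat that rewrite as a suggestion rather than a rule. The class name `PossessiveDemo` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PossessiveDemo {
    // Greedy: a* can give back one 'a' so the trailing 'a' still matches.
    static final Pattern GREEDY = Pattern.compile("a*a");

    // Possessive (Java only): a*+ keeps every 'a' and never backtracks.
    static final Pattern POSSESSIVE = Pattern.compile("a*+a");

    // Atomic group: same no-backtracking behaviour for this pattern.
    static final Pattern ATOMIC = Pattern.compile("(?>a*)a");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(GREEDY, "aaa"));
        System.out.println(matches(POSSESSIVE, "aaa"));
        System.out.println(matches(ATOMIC, "aaa"));
    }
}
```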

Quotation

  1. \Q...\E escapes a string of metacharacters
    • .NET NO
    • Java YES
  2. \Q...\E escapes a string of character class metacharacters (in character sets)
    • .NET NO
    • Java YES
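Because \Q...\E is Java-only, a literal that must be embedded in a pattern shared with .NET could instead have each metacharacter backslash-escaped. The sketch below contrasts Java's `Pattern.quote`, which produces the \Q...\E form, with a hypothetical `escapePortably` helper. The set of characters it escapes is an assumption for illustration and does not cover (?x) mode, where spaces and # would also need escaping.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; escapePortably is not a library method.
class QuotingDemo {
    // Assumed metacharacter set: \ ^ $ . | ? * + ( ) [ ] { }
    private static final String META = "\\^$.|?*+()[]{}";

    /** Backslash-escapes each assumed metacharacter in the literal. */
    static String escapePortably(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String javaOnly = Pattern.quote("1+1=2");    // wraps the text in \Q...\E
        String portable = escapePortably("1+1=2");   // escapes the '+' individually
        System.out.println(javaOnly);
        System.out.println(portable);
        System.out.println(Pattern.matches(portable, "1+1=2"));
    }
}
```

On the .NET side, `Regex.Escape` plays a similar role when the literal is escaped there instead.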

Matching construct

  1. Conditional matching (?(?=regex)then|else), (?(regex)then|else), (?(1)then|else) or (?(group)then|else)
    • .NET YES
    • Java NO
  2. Named capturing group and named backreference
    • .NET YES:
      • Capturing group: (?<name>regex) or (?'name'regex)
      • Backreference: \k<name> or \k'name'
    • Java YES (Java 7):
      • Capturing group: (?<name>regex)
      • Backreference: \k<name>
  3. Multiple capturing groups can have the same name
    • .NET YES
    • Java NO (Java 7)
  4. Balancing group definition (?<name1-name2>regex) or (?'name1-name2'regex)
    • .NET YES
    • Java NO
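For the constructs above, the sketch below uses the named-group spelling that both flavors share, (?<name>regex) with \k<name>, and checks that Java refuses a conditional. The class `MatchingConstructDemo` and its members are hypothetical names for this example, which assumes Java 7 or later.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names are illustrative only.
class MatchingConstructDemo {
    // A quoted string whose closing quote must equal the opening quote.
    static final Pattern QUOTED = Pattern.compile("(?<q>['\"]).*?\\k<q>");

    /** Returns true if java.util.regex can compile the given pattern. */
    static boolean compilesInJava(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(QUOTED.matcher("'abc'").matches());
        System.out.println(QUOTED.matcher("'abc\"").matches());
        // .NET-style conditional on group 1; Java has no such construct.
        System.out.println(compilesInJava("(a)?(?(1)b|c)"));
    }
}
```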

Assertions

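  1. (?<=text) (positive lookbehind)
    • .NET Variable-width
    • Java Bounded width only (the lookbehind must have an obvious maximum length)
  2. (?<!text) (negative lookbehind)
    • .NET Variable-width
    • Java Bounded width only (the lookbehind must have an obvious maximum length)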
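A lookbehind that both flavors should accept is one whose length is bounded, for example by using {0,2} instead of *. The following sketch shows such a lookbehind in Java; the pattern and the names `LookbehindDemo` and `firstAmount` are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LookbehindDemo {
    // Digits preceded by a dollar sign and at most two whitespace characters.
    // The bounded {0,2} gives the lookbehind a known maximum length.
    static final Pattern AMOUNT = Pattern.compile("(?<=\\$\\s{0,2})\\d+");

    /** Returns the first amount found in the text, or null if there is none. */
    static String firstAmount(String text) {
        Matcher m = AMOUNT.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstAmount("$ 42"));
        System.out.println(firstAmount("EUR 42"));
    }
}
```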

Mode Options/Flags

  1. ExplicitCapture option (?n)
    • .NET YES
    • Java NO

Miscellaneous

  1. (?#comment) inline comments
    • .NET YES
    • Java NO
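Where a pattern relies on (?n) or (?#comment), one possible rewrite uses constructs that both flavors accept: non-capturing groups (?:...) in place of ExplicitCapture, and (?x) mode with # comments in place of inline comments. The sketch below shows the Java side; `CommentModeDemo` and the phone-number pattern are made up for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names are illustrative only.
class CommentModeDemo {
    // (?x) turns on free-spacing mode: unescaped whitespace is ignored and
    // '#' starts a comment that runs to the end of the line.
    static final Pattern PHONE = Pattern.compile(
        "(?x) \\d{3}   # exchange\n" +
        "     -        # separator\n" +
        "     \\d{4}   # line number");

    // Non-capturing group instead of relying on (?n).
    static final Pattern REPEATED = Pattern.compile("(?:ab)+c");

    /** Returns true if java.util.regex can compile the given pattern. */
    static boolean compilesInJava(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(PHONE.matcher("555-1234").matches());
        System.out.println(REPEATED.matcher("ababc").matches());
        System.out.println(compilesInJava("a(?#note)b"));   // .NET inline comment
        System.out.println(compilesInJava("(?n)(a)"));      // .NET ExplicitCapture flag
    }
}
```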

References

  • regular-expressions.info - Comparison of Different Regex Flavors
  • MSDN Library Reference - .NET Framework 4.5 - Regular Expression Language
  • Pattern (Java Platform SE 7)
answered Sep 23 '22 at 16:09 by Drew Noakes


Check out http://www.regular-expressions.info/refflavors.html. There is plenty of regex information on that site, including a chart that details the differences between Java and .NET.

answered Sep 23 '22 at 16:09 by Seth