I'm configuring a log parsing system (Logstash) that uses regular expressions to parse logs. I'm trying to parse out the package name and class name from a canonical (i.e. fully qualified) Java class name, but I can't get it quite right.
Here are some sample inputs:
UnpackagedClass
somepackage.SomeClass
java.lang.Object
java.util.function.Function
Expected output (capture groups):
UnpackagedClass
somepackage, SomeClass
java.lang, Object
java.util.function, Function
Here is what I tried: ((?:(?:X)\.)*)((?:X)), where X is [a-zA-Z_$][a-zA-Z\d_$]*, the regex for a Java identifier. Fully expanded, it's: ((?:(?:[a-zA-Z_$][a-zA-Z\d_$]*)\.)*)((?:[a-zA-Z_$][a-zA-Z\d_$]*)). It's close, but the trailing periods get captured as part of the package names:
UnpackagedClass
somepackage., SomeClass
java.lang., Object
java.util.function., Function
Any suggestions on how I can improve this? Here's a RegExr playground to help you 😊.
The backslash \ is an escape character in Java string literals, so it already has a predefined meaning there. To put a single backslash into a string, you have to write a double backslash \\. For example, to use \w in your regex, write \\w in the Java string.
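As a small illustration of the escaping rule above, here is a sketch using java.util.regex. The class name BackslashEscapeDemo and the helper isWord are made up for this example and do not come from the question.

```java
import java.util.regex.Pattern;

public class BackslashEscapeDemo {
    // The Java literal "\\w+" holds the three characters \ w +,
    // which the regex engine reads as "one or more word characters".
    static final Pattern WORD = Pattern.compile("\\w+");

    // Hypothetical helper: true if the whole input consists of word characters.
    static boolean isWord(String s) {
        return WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("\\w+".length());     // 3
        System.out.println(isWord("SomeClass")); // true
        System.out.println(isWord("java.lang")); // false, '.' is not a word character
    }
}
```

Note that this only applies to patterns written inside Java source code. A pattern typed into a regex playground or a configuration file is usually written with single backslashes.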
Java provides the java.util.regex package for pattern matching with regular expressions. It contains classes for matching character sequences against patterns specified by regular expressions.
Use: (?:(X(?:\.X)*)\.)?(X)
It will have the package name in group 1 (null if the package is unnamed) and the class name in group 2.
See regex101.com for a demo.
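To check the suggested pattern against the sample inputs, here is a minimal sketch with X expanded to the identifier regex from the question. It uses java.util.regex rather than Logstash itself, so treat it as a check of the pattern, not of a Logstash configuration. The class name ClassNameSplitter and the method split are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassNameSplitter {
    // X from the question: a Java identifier (backslashes doubled for the Java literal).
    private static final String X = "[a-zA-Z_$][a-zA-Z\\d_$]*";

    // (?:(X(?:\.X)*)\.)?(X)
    // Group 1: package name without the trailing period; group 2: class name.
    private static final Pattern CLASS_NAME =
            Pattern.compile("(?:(" + X + "(?:\\." + X + ")*)\\.)?(" + X + ")");

    // Hypothetical helper: returns {packageName, className};
    // packageName is null when the input has no package part.
    static String[] split(String canonicalName) {
        Matcher m = CLASS_NAME.matcher(canonicalName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a class name: " + canonicalName);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "UnpackagedClass",
            "somepackage.SomeClass",
            "java.lang.Object",
            "java.util.function.Function"
        };
        for (String s : samples) {
            String[] parts = split(s);
            System.out.println(parts[0] + ", " + parts[1]);
        }
    }
}
```

For the four sample inputs this prints null, UnpackagedClass; somepackage, SomeClass; java.lang, Object; and java.util.function, Function. The period is consumed outside group 1, which is why it no longer ends up in the package name.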