How to match a long with Java regex?

I know I can match numbers with Pattern.compile("\\d*");

But it doesn't handle the long min/max values.

For performance reasons related to exceptions, I do not want to try to parse the long unless it really is a long.

if ( LONG_PATTERN.matcher(timestampStr).matches() ) {
    long timeStamp = Long.parseLong(timestampStr);
    return new Date(timeStamp);
} else {
    LOGGER.error("Can't convert " + timestampStr + " to a Date because it is not a timestamp!");
    return null;
}

I mean I do not want any try/catch block, and I do not want exceptions raised for a string like "564654954654464654654567879865132154778", which is outside the range of a Java long.

Does anyone have a pattern to handle this kind of need for the primitive Java types? Does the JDK provide something to handle it automatically? Is there a fail-safe parsing mechanism in Java?

Thanks


Edit: Please assume that the "bad long string" is not an exceptional case. I'm not asking for a benchmark; I'm here for a regex representing a long and nothing more. I'm aware of the additional time required by the regex check, but at least the cost of my long parsing will stay constant and will never depend on the percentage of "bad long strings".

I can't find the link again, but there is a nice parsing benchmark on StackOverflow which clearly shows that reusing the same compiled regex is really fast, a LOT faster than throwing an exception, so even a small proportion of exceptions would make the system slower than the additional regex check does.

Sebastien Lorber asked Jun 28 '12 11:06


2 Answers

The minimum value of a long is -9,223,372,036,854,775,808, and the maximum value is 9,223,372,036,854,775,807, so a long has at most 19 digits. So \d{1,19} should get you there, perhaps with an optional -, and with ^ and $ to match the ends of the string.

So roughly:

Pattern LONG_PATTERN = Pattern.compile("^-?\\d{1,19}$");

...or something along those lines, and assuming you don't allow commas (or have already removed them).

As gexicide points out in the comments, the above allows a small (in comparison) range of invalid values, such as 9,999,999,999,999,999,999. You can get more complex with your regex, or just accept that the above will weed out the vast majority of invalid numbers and so you reduce the number of parsing exceptions you get.
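As a quick check of the simple pattern, here is a minimal, self-contained sketch. The class name `LongPrefilter` and the helper `looksLikeLong` are hypothetical, not part of the answer; the pattern string is the one given in the answer. It shows that the pattern rejects the overlong string from the question, but still accepts the out-of-range 9999999999999999999, so `Long.parseLong` can still throw for such inputs.

```java
import java.util.regex.Pattern;

final class LongPrefilter {
    // Pattern from the answer: optional minus sign, then 1 to 19 digits.
    // matches() already requires the whole string to match, so ^ and $ are redundant but harmless.
    static final Pattern LONG_PATTERN = Pattern.compile("^-?\\d{1,19}$");

    // Hypothetical helper: true when the string has the *shape* of a long.
    // It can still return true for out-of-range values such as 9999999999999999999.
    static boolean looksLikeLong(String s) {
        return LONG_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "9223372036854775807",                     // Long.MAX_VALUE
            "-9223372036854775808",                    // Long.MIN_VALUE
            "9999999999999999999",                     // 19 digits but out of range: false positive
            "564654954654464654654567879865132154778", // too many digits: rejected
            "12a"                                      // not a number: rejected
        };
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeLong(s));
        }
    }
}
```

In practice this filters out most bad strings cheaply, but for the few 19-digit values above Long.MAX_VALUE (or below Long.MIN_VALUE) a parse failure is still possible.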

T.J. Crowder answered Sep 18 '22 21:09


This regular expression should do what you need:

^(-9223372036854775808|0)$|^((-?)((?!0)\d{1,18}|[1-8]\d{18}|9[0-1]\d{17}|92[0-1]\d{16}|922[0-2]\d{15}|9223[0-2]\d{14}|92233[0-6]\d{13}|922337[0-1]\d{12}|92233720[0-2]\d{10}|922337203[0-5]\d{9}|9223372036[0-7]\d{8}|92233720368[0-4]\d{7}|922337203685[0-3]\d{6}|9223372036854[0-6]\d{5}|92233720368547[0-6]\d{4}|922337203685477[0-4]\d{3}|9223372036854775[0-7]\d{2}|922337203685477580[0-7]))$

Note that this regex does not accept other notations such as a leading +, an L suffix, or _ digit separators. If you need to accept those forms as well, you will have to extend the regex.
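To use this regex from Java, every backslash must be doubled inside a string literal. Below is a minimal sketch, assuming hypothetical names `StrictLongRegex` and `isLong`; the pattern text is the one from this answer, only split across concatenated literals for readability. The `main` method runs a few boundary strings so you can check the behaviour yourself.

```java
import java.util.regex.Pattern;

final class StrictLongRegex {
    // The answer's regex with \d written as \\d; the alternatives are unchanged.
    static final Pattern STRICT_LONG_PATTERN = Pattern.compile(
        "^(-9223372036854775808|0)$|^((-?)((?!0)\\d{1,18}|[1-8]\\d{18}|9[0-1]\\d{17}"
        + "|92[0-1]\\d{16}|922[0-2]\\d{15}|9223[0-2]\\d{14}|92233[0-6]\\d{13}"
        + "|922337[0-1]\\d{12}|92233720[0-2]\\d{10}|922337203[0-5]\\d{9}"
        + "|9223372036[0-7]\\d{8}|92233720368[0-4]\\d{7}|922337203685[0-3]\\d{6}"
        + "|9223372036854[0-6]\\d{5}|92233720368547[0-6]\\d{4}|922337203685477[0-4]\\d{3}"
        + "|9223372036854775[0-7]\\d{2}|922337203685477580[0-7]))$");

    // Hypothetical helper: true only for plain decimal strings within the long range.
    static boolean isLong(String s) {
        return STRICT_LONG_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "0", "123",
            "9223372036854775807",  // Long.MAX_VALUE
            "-9223372036854775808", // Long.MIN_VALUE
            "9223372036854775808",  // MAX_VALUE + 1
            "-9223372036854775809", // MIN_VALUE - 1
            "007", "+5"             // forms this regex does not accept
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isLong(s));
        }
    }
}
```

By construction the pattern also rejects leading zeros (e.g. "007"), "-0" and a leading "+", all of which Long.parseLong accepts, so it is slightly stricter than the parser itself.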

Aliaksei Mychko answered Sep 18 '22 21:09