
A regex for version number parsing

I have a version number of the following form:

version.release.modification

where version, release and modification are either a set of digits or the '*' wildcard character. Additionally, any of these numbers (and any preceding .) may be missing.

So the following are valid and parse as:

1.23.456 = version 1, release 23, modification 456
1.23     = version 1, release 23, any modification
1.23.*   = version 1, release 23, any modification
1.*      = version 1, any release, any modification
1        = version 1, any release, any modification
*        = any version, any release, any modification

But these are not valid:

*.12
*123.1
12*
12.*.34

Can anyone provide me a not-too-complex regex to validate and retrieve the release, version and modification numbers?

Andrew Borley asked Sep 17 '08 11:09




2 Answers

I'd express the format as:

"1-3 dot-separated components, each numeric except that the last one may be *"

As a regexp, that's:

^(\d+\.)?(\d+\.)?(\*|\d+)$ 
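To check the pattern above against the examples from the question, here is a minimal Python sketch. The name `is_valid_version` and the choice of Python are my own assumptions for illustration and are not part of the original answer; the regex itself is unchanged.

```python
import re

# The pattern from the answer, unchanged. re.ASCII restricts \d to 0-9
# (by default Python's \d also matches other Unicode digits).
VERSION_RE = re.compile(r"^(\d+\.)?(\d+\.)?(\*|\d+)$", re.ASCII)


def is_valid_version(text: str) -> bool:
    """Hypothetical helper: True if text has the accepted version form."""
    # fullmatch is used because "$" alone would tolerate a trailing newline.
    return VERSION_RE.fullmatch(text) is not None


# Examples taken from the question.
valid = ["1.23.456", "1.23", "1.23.*", "1.*", "1", "*"]
invalid = ["*.12", "*123.1", "12*", "12.*.34"]

for v in valid + invalid:
    print(f"{v!r}: {is_valid_version(v)}")
```

Every string in `valid` should be accepted and every string in `invalid` rejected.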

[Edit to add: this solution is a concise way to validate, but it has been pointed out that extracting the values requires extra work. It's a matter of taste whether to deal with this by complicating the regexp, or by processing the matched groups.

In my solution, the groups capture the "." characters. This can be dealt with using non-capturing groups as in ajborley's answer.

Also, the rightmost group will capture the last component, even if there are fewer than three components, and so for example a two-component input results in the first and last groups capturing and the middle one undefined. I think this can be dealt with by non-greedy groups where supported.

Perl code to deal with both issues after the regexp could be something like this:

@version = ();
@groups = ($1, $2, $3);
foreach (@groups) {
    next if !defined;
    s/\.//;
    push @version, $_;
}
($major, $minor, $mod) = (@version, "*", "*");

Which isn't really any shorter than splitting on "." ]
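For comparison, the same post-processing might look roughly like this in Python. This is a sketch under my own assumptions: `parse_version` is a hypothetical name, and returning the components as strings (with "*" standing for "any") is an illustrative choice, not something the answer specifies.

```python
import re

# Same pattern as in the answer; re.ASCII restricts \d to 0-9.
VERSION_RE = re.compile(r"^(\d+\.)?(\d+\.)?(\*|\d+)$", re.ASCII)


def parse_version(text):
    """Hypothetical helper: return (version, release, modification) as
    strings, or None if text is not a valid version."""
    m = VERSION_RE.fullmatch(text)
    if m is None:
        return None
    # Skip groups that did not take part in the match and strip the
    # "." that the first two groups capture.
    parts = [g.rstrip(".") for g in m.groups() if g is not None]
    # Pad missing trailing components with the wildcard.
    parts += ["*"] * (3 - len(parts))
    return tuple(parts)


print(parse_version("1.23.456"))
print(parse_version("1.23"))
print(parse_version("12.*.34"))
```

As in the Perl version, a two-component input such as "1.23" leaves the middle group unset, so the surviving groups are collected in order before padding.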

Steve Jessop answered Oct 25 '22 10:10


Use a regex and now you have two problems. I would split the string on dots ("."), then make sure that each part is either a wildcard or a set of digits (a regex is perfect for that per-part check). If the string is valid, you just return the correct chunk of the split.
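A rough Python sketch of the split approach follows. Note one assumption beyond what is written above: checking each part on its own would accept "*.12" and "12.*.34", which the question lists as invalid, so this sketch only allows the wildcard as the last component. The function name `parse_version_by_split` is hypothetical.

```python
import re


def parse_version_by_split(text):
    """Hypothetical helper: return (version, release, modification) as
    strings, or None if text is not a valid version."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 3:
        return None
    last = len(parts) - 1
    for i, part in enumerate(parts):
        # Assumption: "*" is only allowed in the final position.
        if part == "*" and i == last:
            continue
        # Every other part must be a non-empty run of ASCII digits.
        if re.fullmatch(r"[0-9]+", part) is None:
            return None
    # Missing trailing components mean "any".
    return tuple(parts + ["*"] * (3 - len(parts)))


print(parse_version_by_split("1.23"))
print(parse_version_by_split("*.12"))
```

The length check also rejects inputs with more than three components, and the digit check rejects empty parts such as the one produced by a trailing dot.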

Paweł Hajdan answered Oct 25 '22 11:10