
How to say "match anything until a specific character, then work your way backwards"?

I am often faced with patterns where the interesting part is delimited by a specific character and the rest does not matter. A typical example:

/dev/sda1       472437724  231650856 216764652  52% /

I would like to extract 52 (which can also be 9, or 100 - so 1 to 3 digits) by saying "match anything, then when you get to % (which is unique in that line), see before for the matches to extract".

I tried to code this as .*(\d*)%.* but the group is not matched:

  • .* match anything, any number of times
  • % ... until you get to the literal % (the \d is also matched by .* but my understanding is that once % is matched, the regex engine will work backwards, since it now has an "anchor" on which to analyze what was before -- please tell me if this reasoning is incorrect, thank you)
  • (\d*) ... and now before that % you had a (\d*) to match and group
  • .* ... and the rest does not matter (match everything)
WoJ asked Jan 30 '26 06:01


2 Answers

Your regex does not work because .* matches too much, and the group matches too little. Because of the * quantifier, the group \d* is allowed to match the empty string, so the greedy .* in front of it takes everything up to the %, digits included.

Your description of .* is also somewhat incorrect. It first matches everything up to the end of the line, then gives characters back one at a time until the rest of the pattern ((\d*)%.*) can match. For more info, see here.
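
To make the backtracking concrete, here is a minimal sketch using Python's re module (my choice of language; the variable names are made up for illustration). It runs the pattern from the question, then a variant with \d+ to show that .* only gives back as little as it has to:

```python
import re

line = "/dev/sda1       472437724  231650856 216764652  52% /"

# Pattern from the question: the greedy .* backs off only as far as the %,
# so \d* is left to match the empty string.
greedy_group = re.match(r".*(\d*)%.*", line).group(1)
print(repr(greedy_group))    # ''

# Requiring at least one digit forces .* to give back exactly one more
# character, so the group captures only the last digit before the %.
one_digit_group = re.match(r".*(\d+)%.*", line).group(1)
print(repr(one_digit_group)) # '2'
```

Both results follow from .* being greedy: it keeps as much as possible and releases only what the remainder of the pattern strictly needs.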

In fact, I think your text can be matched simply by:

(\d{1,3})%

And getting group 1.

The logic of "keep looking until you find..." is already baked into the regex engine: a search tries the pattern at each position in turn, so you don't need to write .* explicitly unless you want that text in the match. In this case you just want the number before the %, right?
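
As a rough sketch of how that could look in Python (the helper name usage_percent is hypothetical, not something from the question), re.search does the scanning and group 1 holds the digits:

```python
import re

def usage_percent(line):
    """Hypothetical helper: return the 1-3 digits right before %, or None."""
    # re.search tries the pattern at every position, so no leading .* is needed.
    m = re.search(r"(\d{1,3})%", line)
    return int(m.group(1)) if m else None

print(usage_percent("/dev/sda1       472437724  231650856 216764652  52% /"))  # 52
print(usage_percent("/dev/sdb1       100 100 0 100% /mnt"))                    # 100
print(usage_percent("no percent sign here"))                                   # None
```

Returning None when there is no % is just one possible design; raising an error may suit a script better if every line is expected to contain a percentage.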

Sweeper answered Feb 01 '26 20:02


If you are looking to extract just the number, then I would use:

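import re

# \d+ rather than \d*: with \d* the lookahead also succeeds on an empty
# match directly at the %, so findall would return an extra '' entry.
pattern = r"\d+(?=%)"
string = "/dev/sda1   472437724  231650856 216764652  52% /"
returnedMatches = re.findall(pattern, string)  # ['52']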

The regular expression uses a positive lookahead, (?=%): the digits are matched only when they are immediately followed by %, and the % itself is not part of the match.
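
A short sketch of how the lookahead behaves, comparing the \d* and \d+ quantifiers (variable names are illustrative only). Since \d* can match zero digits, the lookahead also succeeds on an empty match right at the %:

```python
import re

string = "/dev/sda1   472437724  231650856 216764652  52% /"

# \d* allows zero digits, so an empty match at the % is reported as well.
loose = re.findall(r"\d*(?=%)", string)
print(loose)    # ['52', '']

# \d+ requires at least one digit, which leaves only the number.
strict = re.findall(r"\d+(?=%)", string)
print(strict)   # ['52']

# The lookahead checks for % without consuming it, so the match is just '52'.
first = re.search(r"\d+(?=%)", string).group(0)
print(first)    # 52
```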

flokibb answered Feb 01 '26 18:02