
Turn on/off greedy-ness in clojure re-patterns

Tags: regex, clojure

How to turn on/off greedy-ness in clojure re-patterns?

(re-find #"(.+)-(.+)" "hello-world-you") => ["hello-world-you" "hello-world" "you"]

vs. the result I'm after, where the first group stops at the first dash:

(re-find #"(.+)-(.+)" "hello-world-you") ; wanted: ["hello-world-you" "hello" "world-you"]
asked Jan 12 '12 at 21:01 by claj



People also ask

How do I stop a regex from being greedy?

You make it non-greedy by using ".*?". With that construct, at every step where the regex engine could match one more character with ".", it first tries to match whatever comes after the ".*?". This means that if, for instance, nothing comes after the ".*?", it matches the empty string.
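A small illustration of that behavior with Clojure's re-find (the sample strings are arbitrary, chosen only to make the difference visible):

```clojure
;; Lazy ".*?" stops as soon as the rest of the pattern (">") can match.
(re-find #"<.*?>" "<a><b>")   ;=> "<a>"

;; Greedy ".*" runs as far as it can, then backtracks to the last ">".
(re-find #"<.*>" "<a><b>")    ;=> "<a><b>"

;; With nothing after it, ".*?" is satisfied by the empty string.
(re-find #".*?" "abc")        ;=> ""
```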

What is non-greedy regex?

A non-greedy match means that the regex engine matches as few characters as possible while still matching the pattern in the given string. For example, the regex 'a+?' matches as few 'a's as possible in the string 'aaaa'. Thus, it matches the first character 'a' and is done.
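The 'a+?' example translates directly to Clojure; re-seq is included only to show that each successive lazy match again takes a single 'a':

```clojure
;; Lazy: one "a" is enough to satisfy a+?
(re-find #"a+?" "aaaa")   ;=> "a"

;; Greedy: a+ takes every "a" it can
(re-find #"a+" "aaaa")    ;=> "aaaa"

;; Repeated lazy matches each consume a single "a"
(re-seq #"a+?" "aaaa")    ;=> ("a" "a" "a" "a")
```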



2 Answers

The ? makes quantifiers, such as +, non-greedy. By default, they are greedy.

  • Greedy: (.+)
  • Non-greedy: (.+?)
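Applied to the question's input, only the first group needs the "?"; making the second group lazy as well changes the result, because it then stops after a single character:

```clojure
;; Greedy first group: grabs everything, then backtracks to the last "-".
(re-find #"(.+)-(.+)" "hello-world-you")
;=> ["hello-world-you" "hello-world" "you"]

;; Non-greedy first group: stops at the first "-"; the greedy second group takes the rest.
(re-find #"(.+?)-(.+)" "hello-world-you")
;=> ["hello-world-you" "hello" "world-you"]

;; Both groups lazy: the second group is satisfied by one character.
(re-find #"(.+?)-(.+?)" "hello-world-you")
;=> ["hello-w" "hello" "w"]
```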

By the way, this is just the direct, simple, and to-the-point answer. @fge's answer suggests the better way of doing it. Check it out for future expressions.

answered Nov 01 '22 at 15:11 by Brigand



Don't use .+, use a complemented character class: this avoids having to care about greediness at all.

You should have used this as a regex: ([^-]+)-([^-]+).

Always make the effort to describe your input as precisely as possible. Here you want to match and capture one or more characters that are not a dash ([^-]+), then a dash (-), then again match and capture one or more characters that are not a dash ([^-]+).
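Here is the character-class approach in Clojure. Note that with the question's three-part input, ([^-]+)-([^-]+) stops the second group at the next dash, so it returns "world" rather than the "world-you" the question asked for; if the second part should keep the remaining dashes, one option is to use the character class only for the first group:

```clojure
;; Both groups exclude "-", so the overall match ends before the second dash.
(re-find #"([^-]+)-([^-]+)" "hello-world-you")
;=> ["hello-world" "hello" "world"]

;; First group excludes "-"; second group takes everything after the first dash.
(re-find #"([^-]+)-(.+)" "hello-world-you")
;=> ["hello-world-you" "hello" "world-you"]
```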

Relying on quantifiers' (non-)greediness is a fundamental error if you can describe your input without it. Not only is it a source of errors (as your own example demonstrates), it also keeps the regex engine from working at maximum efficiency.

answered Nov 01 '22 at 13:11 by fge
