
Capturing groups don't work as expected with Ruby scan method

Tags: regex, ruby

I need to get an array of floats (both positive and negative) from a multiline string, e.g. -45.124, 1124.325, etc.

Here's what I do:

text.scan(/(\+|\-)?\d+(\.\d+)?/)

Although it works fine on regex101 (group 0, the full match, contains everything I need), it doesn't work in Ruby code.

Any ideas why it's happening and how I can improve that?

Denis Yakovenko asked Jul 09 '15 13:07




2 Answers

See scan documentation:

If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
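To see what the quoted rule means for the pattern in the question, here is a minimal sketch. The sample string is an assumption chosen for illustration:

```ruby
# Sample input (an assumption for illustration).
text = "-45.124, 1124.325"

# With capturing groups, scan returns one array per match,
# and each array holds only the captured groups, not the full match.
with_groups = text.scan(/(\+|\-)?\d+(\.\d+)?/)
p with_groups     # => [["-", ".124"], [nil, ".325"]]

# Without capturing groups, scan returns the full matched strings.
without_groups = text.scan(/[+-]?\d+(?:\.\d+)?/)
p without_groups  # => ["-45.124", "1124.325"]
```

The nil in the second result appears because the optional sign group did not participate in that match.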

You should remove capturing groups (if they are redundant), or make them non-capturing (if you just need to group a sequence of patterns to be able to quantify them), or use extra code/group in case a capturing group cannot be avoided.

  1. In this scenario, the capturing group is only used to quantify a pattern sequence, so all you need to do is convert it into a non-capturing group by replacing each unescaped ( with (?:. Here the sign alternation (\+|\-) can also be written as the character class [+-], which leaves only one group to convert:
text = " -45.124, 1124.325"
puts text.scan(/[+-]?\d+(?:\.\d+)?/)

Output:

-45.124
1124.325

If you also need to match floats without a leading digit, such as .04, you can use [+-]?\d*\.?\d+.
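Since the question asks for an array of floats rather than strings, the matches can be converted with to_f. A minimal sketch using the more permissive pattern, with a made-up multiline sample string (an assumption, not from the question):

```ruby
# Hypothetical multiline input containing a float without a leading digit
# and a plain integer.
text = "-45.124, 1124.325\n.04 and 7"

# scan returns the matched strings; map(&:to_f) converts each one to a Float.
floats = text.scan(/[+-]?\d*\.?\d+/).map(&:to_f)
p floats  # => [-45.124, 1124.325, 0.04, 7.0]
```

Note that this pattern also matches plain integers, which then become floats such as 7.0.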

  2. There are cases when you cannot get rid of a capturing group, e.g. when the regex contains a backreference to it. In that case, you can a) declare a variable to store the matches and collect them inside a scan block, b) enclose the whole pattern in another capturing group and map the results to the first item of each match (note that the backreference becomes \2, since the outer group is now group 1), or c) call gsub with the regex as its only argument, which returns an Enumerator, and use .to_a to get the array of matches:
text = "11234566666678"
# Variant a:
results = []
text.scan(/(\d)\1+/) { results << Regexp.last_match(0) }
p results                              # => ["11", "666666"]
# Variant b:
p text.scan(/((\d)\2+)/).map(&:first)  # => ["11", "666666"]
# Variant c:
p text.gsub(/(\d)\1+/).to_a  # => ["11", "666666"]

See this Ruby demo.

Wiktor Stribiżew answered Sep 26 '22 11:09


([+-]?\d+\.\d+)

This assumes there is at least one digit before the decimal point and that every number has a fractional part, so integers such as 7 are not matched.

see demo at Rubular
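Because this pattern wraps the whole match in one capturing group, scan returns an array of one-element arrays. A minimal sketch of how that result could be flattened and converted, using an assumed sample string:

```ruby
# Sample input (an assumption for illustration); the trailing 7 has no
# fractional part, so this pattern skips it.
text = "-45.124, 1124.325, 7"

# One capturing group: each result is a one-element array.
nested = text.scan(/([+-]?\d+\.\d+)/)
p nested  # => [["-45.124"], ["1124.325"]]

# flatten removes the nesting; to_f converts each string to a Float.
floats = nested.flatten.map(&:to_f)
p floats  # => [-45.124, 1124.325]
```

Dropping the outer parentheses would make scan return the flat array of strings directly, so flatten is only needed when the group is kept.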

garyh answered Sep 25 '22 11:09