
Ruby String#scan equivalent to return MatchData

As the title says, is there a method on Ruby strings equivalent to String#scan that, instead of returning just a list of the matched strings, returns an array of MatchData objects? For example:

# Matches a set of characters between underscore pairs
"foo _bar_ _baz_ hashbang".some_method(/_[^_]+_/) #=> [#<MatchData "_bar_">, #<MatchData "_baz_">]

Any other way to get the same or a similar result would also be fine. I want this in order to find the positions and extents of "strings" within Ruby strings, e.g. "goodbye" and "world" inside "'goodbye' cruel 'world'".

Jwosty asked Mar 02 '12 04:03


3 Answers

You could easily build your own by exploiting MatchData#end and the pos parameter of String#match. Something like this:

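def matches(s, re)
  start_at = 0
  matches  = []
  while (m = s.match(re, start_at))
    matches.push(m)
    # Resume after this match; step one extra character after an empty
    # match so a pattern like /x*/ cannot loop forever at one position.
    start_at = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
  matches
end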

And then:

>> matches("foo _bar_ _baz_ hashbang", /_[^_]+_/)
=> [#<MatchData "_bar_">, #<MatchData "_baz_">]
>> matches("_a_b_c_", /_[^_]+_/)
=> [#<MatchData "_a_">, #<MatchData "_c_">]
>> matches("_a_b_c_", /_([^_]+)_/)
=> [#<MatchData "_a_" 1:"a">, #<MatchData "_c_" 1:"c">]
>> matches("pancakes", /_[^_]+_/)
=> []

You could monkey patch that into String if you really wanted to.
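
For illustration, such a monkey patch might look like the sketch below. The method name match_all is hypothetical (it is not part of Ruby's String API), and reopening a core class affects every String in the program, so a refinement may be the safer choice in shared code.

```ruby
# Hypothetical monkey patch: String#match_all is an illustrative name,
# not a method that Ruby itself provides.
class String
  def match_all(re)
    found = []
    pos = 0
    # String#match accepts a start position as its second argument.
    while pos <= length && (m = match(re, pos))
      found << m
      # Step one extra character after an empty match to avoid stalling.
      pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
    end
    found
  end
end

result = "foo _bar_ _baz_ hashbang".match_all(/_[^_]+_/)
p result.map { |m| m[0] }
```

Each element of result is a full MatchData, so m.begin(0) and m.end(0) are available for positions.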

mu is too short answered Nov 14 '22 10:11


If you don't need to get MatchData objects back, here's a way using StringScanner.

require 'strscan'

rxp = /_[^_]+_/
scanner = StringScanner.new "foo _barrrr_ _baz_ hashbang"
match_infos = []
until scanner.eos?
  scanner.scan_until rxp
  if scanner.matched?
    match_infos << {
      pos: scanner.pre_match.size,
      length: scanner.matched_size,
      match: scanner.matched
    }
  else
    break
  end
end

p match_infos
# [{:pos=>4, :length=>8, :match=>"_barrrr_"}, {:pos=>13, :length=>5, :match=>"_baz_"}]

Kelvin answered Nov 14 '22 11:11


memo = []
"foo _bar_ _baz_ hashbang".scan(/_[^_]+_/) { memo << Regexp.last_match }
 => "foo _bar_ _baz_ hashbang"
memo
 => [#<MatchData "_bar_">, #<MatchData "_baz_">]
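
The block form works because String#scan updates the last-match data before each yield. As a sketch of the same idea applied to the quoted-string example from the question, the MatchData objects can be collected through an enumerator and their positions read with MatchData#begin and MatchData#end. The variable names and the regex here are illustrative assumptions.

```ruby
text = "'goodbye' cruel 'world'"
quoted = /'([^']*)'/ # assumed pattern: text between single quotes

# enum_for(:scan, ...) yields once per match; inside the block,
# Regexp.last_match is the MatchData for the match just yielded.
matches = text.enum_for(:scan, quoted).map { Regexp.last_match }

matches.each do |m|
  # begin(0)/end(0) are character offsets of the whole match;
  # begin(1)/end(1) are those of the first capture group.
  puts "#{m[1]}: match #{m.begin(0)}...#{m.end(0)}, inner #{m.begin(1)}...#{m.end(1)}"
end
# goodbye: match 0...9, inner 1...8
# world: match 16...23, inner 17...22
```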

Nash Bridges answered Nov 14 '22 11:11