 

Ruby scan Regular Expression

Tags:

regex

ruby

I'm trying to split the string:

"[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"

into the following array:

[
  ["test", "blah"],
  ["foo", "bar bar bar"],
  ["test", "abc", "123", "456 789"]
]

I tried the following, but it isn't quite right:

"[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
.scan(/\[(.*?)\s*\|\s*(.*?)\]/)
# =>
# [
#   ["test", "blah"],
#   ["foo", "bar bar bar"],
#   ["test", "abc |123 | 456 789"]
# ]

I need to split at every pipe, not just the first one. What would be the correct regular expression to achieve this?

asked Dec 02 '22 23:12 by Ryan King


2 Answers

s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
arr = s.scan(/\[(.*?)\]/).map {|m| m[0].split(/ *\| */)}
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
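The scan-then-split idea above can be wrapped in a small helper. This is only a sketch: the method name parse_bracket_groups is hypothetical, and it swaps the / *\| */ split for a plain "|" split followed by strip, on the assumption that you also want whitespace next to the brackets removed.

```ruby
# Hypothetical helper name; the technique is the two-step scan-then-split above.
def parse_bracket_groups(str)
  # Step 1: capture the text between each "[" and the next "]".
  # Step 2: split each captured chunk on "|" and trim whitespace from every piece.
  str.scan(/\[(.*?)\]/).map { |(inner)| inner.split("|").map(&:strip) }
end

s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
p parse_bracket_groups(s)
# prints [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]

p parse_bracket_groups("[ padded | item ]")
# prints [["padded", "item"]]
```

Calling strip after the split also trims spaces that sit directly inside the brackets, which / *\| */ alone leaves in place (it would return " padded" and "item " for the second input).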
answered Dec 31 '22 02:12 by matt


Two alternatives:

s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"

s.split(/\s*\n\s*/).map{ |p| p.scan(/[^|\[\]]+/).map(&:strip) }
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]

s.split(/\s*\n\s*/).map do |line|
  line.sub(/^\s*\[\s*/,'').sub(/\s*\]\s*$/,'').split(/\s*\|\s*/)
end
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]

Both of them start by splitting on newlines, discarding the whitespace around each newline.

The first then scans each line for runs of characters that are not [, |, or ], and removes the leftover whitespace from each piece (calling strip on each).

The second removes the leading [ and trailing ] (along with any adjacent whitespace) and then splits on |, again absorbing the whitespace around each pipe.
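To make those steps concrete, here is a sketch that prints the intermediate values for the sample string. The variable names lines, first and second are illustrative only; the regexes are the ones from the two alternatives.

```ruby
s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"

# Shared first step: split on newlines, swallowing the whitespace around them.
lines = s.split(/\s*\n\s*/)
p lines
# prints ["[test| blah]", "[foo |bar bar bar]", "[test| abc |123 | 456 789]"]

# First alternative, before strip: runs of characters other than [, | and ].
first = lines.map { |line| line.scan(/[^|\[\]]+/) }
p first.last
# prints ["test", " abc ", "123 ", " 456 789"]

# Second alternative: peel off the brackets, then split on the pipes.
second = lines.map { |line| line.sub(/^\s*\[\s*/, '').sub(/\s*\]\s*$/, '').split(/\s*\|\s*/) }
p second.last
# prints ["test", "abc", "123", "456 789"]
```

The first alternative only needs the final map(&:strip) because its scan keeps the spaces that surround each pipe.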


You cannot get the final result you want with a single scan. About the closest you can get is this:

s.scan(/\[(?:([^|\]]+)\|)*([^|\]]+)\]/)
#=> [["test", " blah"], ["foo ", "bar bar bar"], ["123 ", " 456 789"]]

…which drops information (in the third row, "test" and " abc " are lost), or this:

s.scan(/\[((?:[^|\]]+\|)*[^|\]]+)\]/)
#=> [["test| blah"], ["foo |bar bar bar"], ["test| abc |123 | 456 789"]]

…which returns the contents of each bracketed "array" as one string that still has to be split, or this:

s.scan(/\[(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?([^|\]]+)\]/)
#=> [["test", nil, nil, " blah"], ["foo ", nil, nil, "bar bar bar"], ["test", " abc ", "123 ", " 456 789"]]

…which is hardcoded to a maximum of four items and inserts nil entries that you would need to remove with .compact (the captured items also still carry their surrounding whitespace).
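As a sketch of what that post-processing looks like, the four-slot regex can be cleaned up with compact and strip. The constant name FOUR_SLOTS is hypothetical; the second call shows the cost of the hardcoded limit.

```ruby
s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"

# Three optional "item|" groups plus one mandatory final item: at most four items.
FOUR_SLOTS = /\[(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?([^|\]]+)\]/

# Drop the nil placeholders, then trim the whitespace the captures still contain.
rows = s.scan(FOUR_SLOTS).map { |caps| caps.compact.map(&:strip) }
p rows
# prints [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]

# A bracket with five items cannot match at all, so it is silently skipped.
p "[a|b|c|d|e]".scan(FOUR_SLOTS)
# prints []
```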

There is no way to use Ruby's scan with a regex like /(?:(aaa)b)+/ and get a separate capture for each time the repetition matches: a capture group inside a repeated group keeps only the text from its last iteration.
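A quick way to see this is to compare a repeated group against moving the repetition into scan itself. The sample string "one;two;three;" and the [a-z]+ pattern are illustrative stand-ins for the /(?:(aaa)b)+/ example, chosen so each iteration captures different text.

```ruby
text = "one;two;three;"

# The capture sits inside a repeated group, so each iteration overwrites it.
md = text.match(/(?:([a-z]+);)+/)
p md[0]        # prints "one;two;three;"  (the whole repetition matched)
p md.captures  # prints ["three"]         (only the last iteration survives)

# Letting scan do the repeating yields one capture per match instead.
p text.scan(/([a-z]+);/)
# prints [["one"], ["two"], ["three"]]
```

This is why the working answers use two levels: one regex (or split) to find each bracketed group, and a second pass to break each group into items.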

answered Dec 31 '22 03:12 by Phrogz