I'm trying to split the string:
"[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
into the following array:
[
["test","blah"],
["foo","bar bar bar"],
["test","abc","123","456 789"]
]
I tried the following, but it isn't quite right:
"[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
.scan(/\[(.*?)\s*\|\s*(.*?)\]/)
# =>
# [
# ["test", "blah"],
# ["foo", "bar bar bar"],
# ["test", "abc |123 | 456 789"]
# ]
I need to split at every pipe instead of the first pipe. What would be the correct regular expression to achieve this?
s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
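# Grab the contents of each [...] first, then split those contents on pipes.
arr = s.scan(/\[(.*?)\]/).map { |m| m[0].split(/ *\| */) }
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]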
Two alternatives:
s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
s.split(/\s*\n\s*/).map{ |p| p.scan(/[^|\[\]]+/).map(&:strip) }
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
s.split(/\s*\n\s*/).map do |line|
  line.sub(/^\s*\[\s*/,'').sub(/\s*\]\s*$/,'').split(/\s*\|\s*/)
end
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
Both of them start by splitting on newlines (throwing away surrounding whitespace).
The first one then splits each chunk by looking for anything that is not a [, |, or ] and then throws away extra whitespace (calling strip on each).
The second one then throws away leading [ and trailing ] (with whitespace) and then splits on | (with whitespace).
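To make the stages easier to follow, here is a sketch of the first alternative unrolled into its intermediate values. The variable names (lines, pieces, result) are only illustrative and are not part of the original one-liner.

```ruby
s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"

# Step 1: split on newlines, discarding any whitespace around each newline.
lines = s.split(/\s*\n\s*/)
# lines == ["[test| blah]", "[foo |bar bar bar]", "[test| abc |123 | 456 789]"]

# Step 2: in each line, collect every run of characters that is not [, | or ].
pieces = lines.map { |line| line.scan(/[^|\[\]]+/) }
# pieces.first == ["test", " blah"]   (whitespace is still attached)

# Step 3: strip the leftover whitespace from every field.
result = pieces.map { |fields| fields.map(&:strip) }
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
```

One caveat with this approach: whitespace sitting outside the brackets at the very start or end of the string is also "not a bracket or pipe", so it would come through as an empty string after strip. The sample input has none, so it does not show up here.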
You cannot get the final result you want with a single scan. About the closest you can get is this:
s.scan /\[(?:([^|\]]+)\|)*([^|\]]+)\]/
#=> [["test", " blah"], ["foo ", "bar bar bar"], ["123 ", " 456 789"]]
…which drops information, or this:
s.scan /\[((?:[^|\]]+\|)*[^|\]]+)\]/
#=> [["test| blah"], ["foo |bar bar bar"], ["test| abc |123 | 456 789"]]
…which captures the contents of each "array" as a single capture, or this:
s.scan /\[(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?([^|\]]+)\]/
#=> [["test", nil, nil, " blah"], ["foo ", nil, nil, "bar bar bar"], ["test", " abc ", "123 ", " 456 789"]]
…which is hardcoded to a maximum of four items, and inserts nil entries that you would need to .compact away.
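For completeness, a sketch of what that cleanup could look like if you did go with the four-slot regex. The names four_max and rows are made up for this example, and the strip call is an extra step so the output matches what the question asked for.

```ruby
s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"

# Up to three optional "item|" groups followed by one required final item.
four_max = /\[(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?([^|\]]+)\]/

# Drop the nil placeholders left by unused optional groups, then trim whitespace.
rows = s.scan(four_max).map { |captures| captures.compact.map(&:strip) }
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]

# A bracket with more than four items does not match this pattern at all,
# so it is skipped rather than truncated.
too_many = "[a|b|c|d|e]".scan(four_max)
#=> []
```

Because of that hard limit, this is more of a curiosity than a recommendation; the two-step scan/split versions do not care how many items each bracket holds.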
There is no way to use Ruby's scan to take a regex like /(?:(aaa)b)+/ and get multiple captures for each time the repetition is matched.
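A small sketch of that limitation, using a made-up input ("x-y-z-") so the repetitions are easy to tell apart: a capture group inside a repeated group only remembers its last iteration, so a second pass over the matched text is needed to recover every item.

```ruby
repeated = /(?:([a-z])-)+/

m = "x-y-z-".match(repeated)
whole    = m[0]        # the entire match: "x-y-z-"
captures = m.captures  # only the last repetition survives: ["z"]

# scan behaves the same way: one array per overall match, one slot per group.
scanned = "x-y-z-".scan(repeated)
#=> [["z"]]

# Workaround: take the whole match, then scan it again with the inner pattern.
all_items = whole.scan(/([a-z])-/).flatten
#=> ["x", "y", "z"]
```

This is the same two-pass idea as the scan-then-split answers above: one pattern finds each bracketed group, a second one breaks it into items.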