I need to split a string into a list of parts in Ruby, but I need to ignore anything inside parentheses. For example:
A +4, B +6, C (hello, goodbye) +5, D +3
I'd like the resulting list to be:
[0]A +4
[1]B +6
[2]C (hello, goodbye) +5
[3]D +3
But I can't simply split on commas, because that would also split the contents of the parentheses. Is there a way to split the string without first rewriting the commas inside the parentheses into something else?
Thanks.
Try this:
s = 'A +4, B +6, C (hello, goodbye) +5, D +3'
tokens = s.scan(/(?:\(.*?\)|[^,])+/)
tokens.each {|t| puts t.strip}
Output:
A +4
B +6
C (hello, goodbye) +5
D +3
A short explanation:
(?: # open non-capturing group 1
\( # match '('
.*? # reluctantly match zero or more characters other than line breaks
\) # match ')'
| # OR
[^,] # match something other than a comma
)+ # close non-capturing group 1 and repeat it one or more times
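To make the behaviour of that pattern easier to poke at, here is a minimal sketch that wraps it in a helper. The method name split_with_scan is made up for illustration and is not part of the answer above. It also shows two things worth knowing before relying on it: the lazy \(.*?\) stops at the first ')', so nested parentheses are not handled, and empty fields are silently dropped.

```ruby
# Hypothetical helper (the name is an assumption, not from the answer above).
# Uses the same pattern as the answer and strips each token.
def split_with_scan(str)
  str.scan(/(?:\(.*?\)|[^,])+/).map(&:strip)
end

# The example from the question: four parts, parentheses kept intact.
p split_with_scan('A +4, B +6, C (hello, goodbye) +5, D +3')

# Nested parentheses: the lazy group ends at the first ')',
# so the comma after "(b)" is treated as a separator.
p split_with_scan('A (a, (b), c), D')

# Empty fields: scan only returns non-empty matches.
p split_with_scan('A,,B')
```

So this works well for a single level of parentheses, which is what the question asks about, but it should not be assumed to cope with nesting.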
Another option is to split on a comma followed by optional whitespace, but only when the first parenthesis seen when looking ahead is an opening parenthesis (or there is no parenthesis at all before the end of the string):
s = 'A +4, B +6, C (hello, goodbye) +5, D +3'
tokens = s.split(/,\s*(?=[^()]*(?:\(|$))/)
tokens.each {|t| puts t}
will produce the same output, but I find the scan method cleaner.
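For anyone choosing between the two, here is a small side-by-side sketch. The constant and method names (SCAN_RE, SPLIT_RE, via_scan, via_split) are made up for this comparison. On the example string both give the same parts, but they differ on empty fields: scan skips them, while split keeps them as empty strings.

```ruby
# Names below are illustrative assumptions; the patterns are the two from this answer.
SCAN_RE  = /(?:\(.*?\)|[^,])+/
SPLIT_RE = /,\s*(?=[^()]*(?:\(|$))/

def via_scan(str)
  str.scan(SCAN_RE).map(&:strip)
end

def via_split(str)
  str.split(SPLIT_RE)
end

sample = 'A +4, B +6, C (hello, goodbye) +5, D +3'
p via_scan(sample)
p via_split(sample)

# Empty field between two commas:
p via_scan('A,,B')   # scan only returns non-empty matches
p via_split('A,,B')  # split keeps the empty field
```

Which behaviour is preferable depends on whether empty entries are meaningful in your input.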
string = "A +4, B +6, C (hello, goodbye) +5, D +3"
string.split(/ *, *(?=[^\)]*?(?:\(|$))/)
# => ["A +4", "B +6", "C (hello, goodbye) +5", "D +3"]
How this regex works:
/
 *, * # match a comma plus any spaces around it (the spaces are consumed with the delimiter)
(?= # (Pattern in here is matched against but is not returned as part of the match.)
[^\)]*? # lazily match zero or more characters that are not ')'
(?: # <non-capturing parentheses group>
\( # left paren '('
| # - OR -
$ # (end of string)
)
)
/
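The annotated breakdown above can be written as real Ruby using the x (extended) flag, which ignores whitespace and allows # comments inside the pattern. This is only a sketch: the constant name SPLIT_OUTSIDE_PARENS is made up, and because plain spaces are ignored in extended mode, the literal spaces are written as [ ] here.

```ruby
# Hypothetical constant name; the pattern is meant to be equivalent to
# / *, *(?=[^\)]*?(?:\(|$))/ from the answer above.
SPLIT_OUTSIDE_PARENS = /
  [ ]* , [ ]*      # a comma plus any spaces around it
  (?=              # lookahead: checked, but not part of the match
    [^\)]*?        # lazily skip characters that are not a closing paren
    (?: \( | $ )   # until an opening paren or the end of the line
  )
/x

string = "A +4, B +6, C (hello, goodbye) +5, D +3"
parts = string.split(SPLIT_OUTSIDE_PARENS)
p parts
```

Keeping the comments inside the regex means the explanation cannot drift away from the pattern it describes.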