I'm trying to split a string like Presentation about "Test Driven Development" into an array like this:
[ 'Presentation',
'about',
'"Test Driven Development"' ]
I have tried CSV::parse_line(string, col_sep: ' '), but this results in
[ 'Presentation',
'about',
'Test Driven Development' ] # I'm missing the quotes here
I also tried some regexp magic, but I'm still a beginner and didn't succeed. I guess this is quite simple for a pro, so maybe someone could point me in the right direction? Thanks.
You may use the following regular expression split:
str = 'Presentation about "Test Driven Development"'
p str.split(/\s(?=(?:[^"]|"[^"]*")*$)/)
# => ["Presentation", "about", "\"Test Driven Development\""]
It splits on a whitespace character, but only if the text from that point to the end of the string contains an even number of " characters, which means the space is not inside a quoted section. Be aware that this version only works if every quote in your string is properly closed.
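To make the caveat about unbalanced quotes concrete, here is a small sketch that wraps the same regular expression in a helper. The names QUOTE_AWARE_SPACE and split_outside_quotes are made up for illustration and are not part of the answer above.

```ruby
# Hypothetical helper names; the regex is the one from the answer above.
# It matches a whitespace character only when the rest of the string
# consists of unquoted characters and complete "..." pairs.
QUOTE_AWARE_SPACE = /\s(?=(?:[^"]|"[^"]*")*$)/

def split_outside_quotes(str)
  str.split(QUOTE_AWARE_SPACE)
end

# Balanced quotes: the quoted phrase stays in one piece.
p split_outside_quotes('Presentation about "Test Driven Development"')
# => ["Presentation", "about", "\"Test Driven Development\""]

# Unbalanced quote: the lookahead fails at the first space (the rest of
# the string holds an unclosed quote), but succeeds at the second one.
p split_outside_quotes('a "b c')
# => ["a \"b", "c"]
```

With the unclosed quote, the split happens in a place you probably did not intend, so it may be worth validating the input first if it comes from users.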
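An alternative solution uses scan to read the parts of the string instead of splitting on the spaces. Note that \w matches only word characters (letters, digits and underscore), so any punctuation outside the quotes is skipped: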
p str.scan(/(?:\w|"[^"]*")+/)
# => ["Presentation", "about", "\"Test Driven Development\""]
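Since \w only covers word characters, the scan version drops punctuation that sits outside the quotes. The sketch below compares it with a variant that accepts any non-space, non-quote character instead. The variant and both method names are my own assumptions, not something from the original answer.

```ruby
# The pattern from the answer: word characters or complete "..." pairs.
def scan_word_tokens(str)
  str.scan(/(?:\w|"[^"]*")+/)
end

# Assumed variant: any character that is neither whitespace nor a
# double quote, or a complete "..." pair.
def scan_nonspace_tokens(str)
  str.scan(/(?:[^\s"]|"[^"]*")+/)
end

input = 'Talk on "Test Driven Development" (part 2)'

p scan_word_tokens(input)
# => ["Talk", "on", "\"Test Driven Development\"", "part", "2"]

p scan_nonspace_tokens(input)
# => ["Talk", "on", "\"Test Driven Development\"", "(part", "2)"]
```

Which one is preferable depends on whether punctuation should count as part of a token in your data.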
Just to extend the previous answer from Howard, you can add this method:
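class String
  # Split on whitespace outside single or double quotes, drop empty
  # pieces, then strip surrounding spaces and quote characters.
  def tokenize
    split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/)
      .reject(&:empty?)
      .map { |s| s.gsub(/(^ +)|( +$)|(^["']+)|(["']+$)/, '') }
  end
end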
And the result:
> 'Presentation about "Test Driven Development" '.tokenize
=> ["Presentation", "about", "Test Driven Development"]
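If stripping the quotes is acceptable, as in the tokenize result above, Ruby's standard library offers Shellwords.split, which produces the same tokens for this input without reopening String. This is a different technique from the regex in the answers, shown only as a sketch. The wrapper name tokenize_words is hypothetical.

```ruby
require 'shellwords'

# Hypothetical wrapper around the standard library's Shellwords.split,
# which splits a string the way a POSIX shell splits a command line.
def tokenize_words(str)
  Shellwords.split(str)
end

p tokenize_words('Presentation about "Test Driven Development" ')
# => ["Presentation", "about", "Test Driven Development"]
```

Keep in mind that Shellwords follows shell rules: it also interprets backslash escapes, and it raises ArgumentError on an unmatched quote (for example an apostrophe in "don't"), so it is not a drop-in replacement for free-form text.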