I'm trying to split a string like Presentation about "Test Driven Development"
into an array like this:
[ 'Presentation',
'about',
'"Test Driven Development"' ]
I have tried CSV::parse_line(string, col_sep: ' '), but this results in:
[ 'Presentation',
'about',
'Test Driven Development' ] # I'm missing the quotes here
I also tried some regexp magic, but I'm still a beginner and didn't succeed. I guess this is quite simple for a pro, so maybe someone could point me in the right direction? Thanks.
You may use split with the following regular expression:
str = 'Presentation about "Test Driven Development"'
p str.split(/\s(?=(?:[^"]|"[^"]*")*$)/)
# => ["Presentation", "about", "\"Test Driven Development\""]
It splits on a whitespace character, but only if the text from that point to the end of the string contains an even number of " characters. Be aware that this version only works if every quote in your string is properly closed.
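To make the caveat about properly quoted strings concrete, here is a minimal sketch that runs the same expression on a balanced and an unbalanced input. The names QUOTE_AWARE_SPACE and split_keeping_quotes are made up for this illustration and are not part of Ruby.

```ruby
# Same lookahead as in the answer: a whitespace character counts as a
# separator only if the rest of the string consists of unquoted characters
# and complete "..." pairs.
QUOTE_AWARE_SPACE = /\s(?=(?:[^"]|"[^"]*")*$)/

# Hypothetical helper name, used only for this example.
def split_keeping_quotes(str)
  str.split(QUOTE_AWARE_SPACE)
end

p split_keeping_quotes('Presentation about "Test Driven Development"')
# => ["Presentation", "about", "\"Test Driven Development\""]

# Unbalanced input: the space before the stray quote is no longer a
# separator, while the space after it is.
p split_keeping_quotes('say "hello world')
# => ["say \"hello", "world"]
```

With a missing closing quote the behaviour flips for everything in front of the stray quote, so it may be worth validating the input first if it comes from users.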
An alternative solution uses scan to read the parts of the string (skipping the spaces):
p str.scan(/(?:\w|"[^"]*")+/)
# => ["Presentation", "about", "\"Test Driven Development\""]
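One thing to watch with the scan version: \w only matches letters, digits and underscores, so punctuation attached to a word is silently dropped. The sketch below compares it with a variant that accepts any character that is neither whitespace nor a double quote. That variant and the constant names WORD_TOKENS and NON_SPACE_TOKENS are my own assumptions, not part of the original answer.

```ruby
# Pattern from the answer: tokens are built from word characters or
# complete "..." groups.
WORD_TOKENS = /(?:\w|"[^"]*")+/

# Assumed variant: any character that is neither whitespace nor a double
# quote may be part of a token, so punctuation is kept.
NON_SPACE_TOKENS = /(?:[^\s"]|"[^"]*")+/

str = 'Hello, world! "Test Driven Development"'

p str.scan(WORD_TOKENS)
# => ["Hello", "world", "\"Test Driven Development\""]

p str.scan(NON_SPACE_TOKENS)
# => ["Hello,", "world!", "\"Test Driven Development\""]
```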
Just to extend the previous answer from Howard, you can add this method, which also handles single quotes and strips the surrounding quotes from each token:
class String
  def tokenize
    self.
      split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/).
      select { |s| not s.empty? }.
      map { |s| s.gsub(/(^ +)|( +$)|(^["']+)|(["']+$)/, '') }
  end
end
And the result:
> 'Presentation about "Test Driven Development" '.tokenize
=> ["Presentation", "about", "Test Driven Development"]
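Since this version also treats ' as a quote character, an apostrophe inside a word counts as an unclosed quote. Here is a small sketch of that edge case, written as a standalone method (a hypothetical rewrite, so the example does not reopen String) with the same two regular expressions:

```ruby
# Hypothetical standalone version of the tokenize method above.
def tokenize(str)
  str
    .split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/)              # split outside quotes
    .reject(&:empty?)                                         # drop empty pieces
    .map { |s| s.gsub(/(^ +)|( +$)|(^["']+)|(["']+$)/, '') }  # trim spaces and quotes
end

p tokenize("Presentation about 'Test Driven Development'")
# => ["Presentation", "about", "Test Driven Development"]

# The apostrophe in "don't" is read as an opening single quote, so the
# quotes are unbalanced and the first space is not treated as a separator.
p tokenize("I don't know")
# => ["I don't", "know"]
```

If your input can contain apostrophes, sticking to double quotes only (as in the first answer) is probably the safer choice.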