
Trying to split string into single words or "quoted words", and want to keep the quotes in the resulting array

Tags: regex, ruby, csv

I'm trying to split a string like 'Presentation about "Test Driven Development"' into an array like this:

[ 'Presentation',
  'about',
  '"Test Driven Development"' ]

I have tried CSV::parse_line(string, col_sep: ' '), but this results in

[ 'Presentation',
  'about',
  'Test Driven Development' ] # I'm missing the quotes here

I also tried some regexp magic, but I'm still a beginner and didn't succeed. I guess this is quite simple for a pro, so maybe someone could point me in the right direction? Thanks.

Joshua Muheim asked Jul 19 '12 17:07


2 Answers

You may use the following regular expression split:

str = 'Presentation about "Test Driven Development"'
p str.split(/\s(?=(?:[^"]|"[^"]*")*$)/)
# => ["Presentation", "about", "\"Test Driven Development\""]

It splits on a whitespace character only if the rest of the string, up to the end, contains an even number of " characters, that is, only if the whitespace lies outside a quoted section. Be aware that this version only works if every quote in your strings is properly closed.
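
To make the "properly quoted" caveat concrete, here is a minimal sketch that wraps the same split in a helper and runs it on a balanced and an unbalanced input. The names QUOTE_AWARE_SPLIT and split_keeping_quotes are made up for this illustration and are not part of the answer.

```ruby
# Same pattern as above: split on whitespace that is followed by an
# even number of double quotes up to the end of the string.
# (Constant and method names are hypothetical, for illustration only.)
QUOTE_AWARE_SPLIT = /\s(?=(?:[^"]|"[^"]*")*$)/

def split_keeping_quotes(str)
  str.split(QUOTE_AWARE_SPLIT)
end

# Balanced quotes: the quoted phrase stays together, quotes included.
p split_keeping_quotes('Presentation about "Test Driven Development"')
# => ["Presentation", "about", "\"Test Driven Development\""]

# Unbalanced quote: whitespace before the stray quote is followed by an
# odd number of quotes, so it is no longer treated as a split point.
p split_keeping_quotes('a "b c')
# => ["a \"b", "c"]
```

So with a missing closing quote the split does not raise an error; it quietly produces tokens that are probably not what you wanted.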

An alternative solution uses scan to collect the tokens themselves (everything besides the spaces) instead of splitting on the separators:

p str.scan(/(?:\w|"[^"]*")+/)
# => ["Presentation", "about", "\"Test Driven Development\""]
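
One thing to keep in mind with the scan version is that \w only matches letters, digits and underscores, so punctuation outside quotes is dropped from the tokens. The sketch below compares the pattern from the answer with a looser variant; LOOSE_TOKENS and the helper name scan_tokens are assumptions added for this illustration, not something the answer proposes.

```ruby
# Pattern from the answer: a token is a run of word characters
# and/or double-quoted phrases.
WORD_TOKENS = /(?:\w|"[^"]*")+/

# Hypothetical variant: any character that is neither whitespace nor a
# double quote may be part of a token, so punctuation is kept.
LOOSE_TOKENS = /(?:[^\s"]|"[^"]*")+/

def scan_tokens(str, pattern)
  # The groups are non-capturing, so scan returns the full matches.
  str.scan(pattern)
end

p scan_tokens('Hello, world! "a b"', WORD_TOKENS)
# => ["Hello", "world", "\"a b\""]

p scan_tokens('Hello, world! "a b"', LOOSE_TOKENS)
# => ["Hello,", "world!", "\"a b\""]
```

Which of the two fits better depends on whether punctuation should survive in the tokens.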
Howard answered Nov 10 '22 00:11


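To extend Howard's answer, you can add this method to String. It also treats single quotes as delimiters and, unlike the split above, strips the surrounding quotes and spaces from each token:

class String
  # Splits on whitespace outside of single- or double-quoted sections,
  # drops empty tokens, and strips surrounding spaces and quotes.
  def tokenize
    split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/)
      .reject(&:empty?)
      .map { |s| s.gsub(/(^ +)|( +$)|(^["']+)|(["']+$)/, '') }
  end
end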

And the result:

> 'Presentation      about "Test Driven Development"  '.tokenize
=> ["Presentation", "about", "Test Driven Development"]
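
Since the question asks to keep the quotes, a variant that makes the stripping optional may be closer to what was wanted. This is only a sketch: the method name tokenize_words and the keep_quotes keyword are invented for this example, and it is written as a plain method instead of reopening String.

```ruby
# Hypothetical helper (not from the answers above): same split pattern,
# but stripping the quotes is optional and String is left untouched.
def tokenize_words(str, keep_quotes: true)
  tokens = str
    .split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/)
    .reject(&:empty?) # repeated spaces produce empty fields
  return tokens if keep_quotes

  # Remove leading and trailing quote characters from each token.
  tokens.map { |t| t.gsub(/\A["']+|["']+\z/, '') }
end

p tokenize_words('Presentation      about "Test Driven Development"  ')
# => ["Presentation", "about", "\"Test Driven Development\""]

p tokenize_words("say 'hello world'", keep_quotes: false)
# => ["say", "hello world"]
```

Note that because single quotes are treated as delimiters, an apostrophe (as in don't) counts as an unclosed quote, so text containing one may not split where you expect.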
Keymon answered Nov 09 '22 23:11