
Split a string by whitespace, keeping quoted segments, allowing escaped quotes

I currently have this regular expression to split strings by all whitespace, unless it's in a quoted segment:

let keywords = 'pop rock "hard rock"';
keywords = keywords.match(/\w+|"[^"]+"/g);
console.log(keywords); // [pop, rock, "hard rock"]

However, I also want it to be possible to have quotes in keywords, like this:

keywords = 'pop rock "hard rock" "\"dream\" pop"'; 

This should return

[pop, rock, "hard rock", "\"dream\" pop"] 

What's the easiest way to achieve this?

Blaise asked Oct 27 '10 09:10


2 Answers

You can change your regex to:

keywords = keywords.match(/\w+|"(?:\\"|[^"])+"/g); 

Instead of [^"]+ the pattern uses (?:\\"|[^"])+, which accepts either an escaped quote (\") or any character other than a quote, so an unescaped quote still ends the segment.

One important note: inside a JavaScript string literal, \" is just a quote. If you want the string to contain a literal backslash, the backslash itself must be escaped:

keywords = 'pop rock "hard rock" "\\"dream\\" pop"'; // note the escaped backslashes
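To make the difference concrete, here is a small sketch comparing the two literals. The variable names are only illustrative; the pattern is the one given above.

```javascript
// Illustrative variable names, not part of the original answer.
const unescaped = '"\"dream\" pop"';  // \" collapses to a plain quote
const escaped = '"\\"dream\\" pop"';  // \\ produces one literal backslash

console.log(unescaped.includes('\\')); // false
console.log(escaped.includes('\\'));   // true

const pattern = /\w+|"(?:\\"|[^"])+"/g;

// Without real backslashes the quotes pair up differently and the segment breaks apart.
const fromUnescaped = unescaped.match(pattern);
// With real backslashes the whole quoted segment is kept as one keyword.
const fromEscaped = escaped.match(pattern);

console.log(fromUnescaped);
console.log(fromEscaped);
```

Only the second literal gives the regex an escaped quote to recognize, which is why the question's example needs the doubled backslashes.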

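Also, there's a slight inconsistency between \w+ and [^"]+: the pattern matches the quoted keyword "ab*d" as a whole, but splits the unquoted ab*d into ab and d, because \w does not match *. Consider using [^"\s]+ instead of \w+, which matches any run of characters that are neither whitespace nor quotes.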
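Putting both suggestions together might look like the sketch below. The function name splitKeywords is hypothetical, and String.raw is used here only so the sample input contains real backslashes without doubling them.

```javascript
// Hypothetical helper name; a sketch, not the answer's exact code.
function splitKeywords(input) {
  // Either a quoted segment (allowing \" inside it),
  // or a run of characters that are neither whitespace nor quotes.
  const pattern = /"(?:\\"|[^"])+"|[^"\s]+/g;
  return input.match(pattern) || []; // match() returns null when nothing matches
}

// String.raw keeps the backslashes literal, so the input really contains \".
const sample = String.raw`pop rock "hard rock" "\"dream\" pop" ab*d`;
console.log(splitKeywords(sample));
```

With this pattern the unquoted ab*d stays in one piece, and the quoted segments keep their surrounding quotes, as in the question's expected result.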

Kobi answered Sep 22 '22 04:09


ES6 solution supporting:

  • Splits on spaces, except inside quotes
  • Removes the surrounding quotes, but keeps backslash-escaped quotes
  • Escaped quotes become plain quotes
  • Quotes can appear anywhere in a keyword

Code:

keywords.match(/\\?.|^$/g).reduce((p, c) => {
    if (c === '"') {
        p.quote ^= 1;
    } else if (!p.quote && c === ' ') {
        p.a.push('');
    } else {
        p.a[p.a.length - 1] += c.replace(/\\(.)/, "$1");
    }
    return p;
}, {a: ['']}).a

Output:

[ 'pop', 'rock', 'hard rock', '"dream" pop' ] 
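That output assumes the input string really contains the backslashes. A self-contained sketch of the same reducer, wrapped in a function, could look like this. The name parseKeywords and the final filter step are assumptions added for illustration, not part of the answer.

```javascript
// Hypothetical wrapper name; the reducer follows the logic of the answer above.
function parseKeywords(input) {
  // Tokenize into single characters, keeping a backslash together with the character it escapes.
  const tokens = input.match(/\\?.|^$/g) || [];
  const state = tokens.reduce((p, c) => {
    if (c === '"') {
      p.quote = !p.quote;            // an unescaped quote toggles quoted mode
    } else if (!p.quote && c === ' ') {
      p.a.push('');                  // a space outside quotes starts a new keyword
    } else {
      p.a[p.a.length - 1] += c.replace(/\\(.)/, '$1'); // drop the escaping backslash
    }
    return p;
  }, { a: [''], quote: false });
  // Assumption: discard empty entries produced by repeated spaces.
  return state.a.filter((word) => word !== '');
}

const raw = String.raw`pop rock "hard rock" "\"dream\" pop"`;
console.log(parseKeywords(raw));
console.log(parseKeywords('foo"bar baz"qux'));
```

Note that the filter also drops a deliberately empty quoted keyword (""), so leave it out if empty keywords are meaningful.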
Tsuneo Yoshioka answered Sep 26 '22 04:09