
A JavaScript regular expression to tokenize a query

Hi, I've stumbled upon a problem related to regular expressions that I cannot resolve.

I need to tokenize a query (split it into parts). Take the following one as an example:

These are the separate query elements "These are compound composite terms"

What I eventually need is to have an array of 7 tokens:

1) These
2) are
3) the
4) separate
5) query
6) elements
7) These are compound composite terms

The seventh token consists of several words because it was inside double quotation marks.

My question is: is it possible to tokenize the input string as described above using a single regular expression?

Edit

I was curious whether RegExp.exec or similar code could be used instead of split to achieve the same thing, so I did some investigation, which led to another question here. As another answer to this question, the following regex can be used:

(?:")(?:\w+\W*)+(?:")|\w+

With the following one-liner usage scenario:

var tokens = query.match(/(?:")(?:\w+\W*)+(?:")|\w+/g);
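Note that the match-based one-liner keeps the surrounding quotation marks on the quoted token. As a minimal sketch of the exec-style loop mentioned in this edit, the variant below uses capture groups so that the quoted phrase comes back without its quotes. The function name tokenizeWithExec and the pattern "([^"]*)"|(\w+) are illustrative assumptions, not taken from the original post; unlike the regex above, this pattern accepts any non-quote characters between the quotes.

```javascript
// Hypothetical helper (not from the original post): tokenize with
// RegExp.prototype.exec, returning quoted phrases without their quotes.
function tokenizeWithExec(query) {
  // Group 1: text between a pair of double quotes.
  // Group 2: a bare word outside quotes.
  const pattern = /"([^"]*)"|(\w+)/g;
  const tokens = [];
  let m;
  // With the g flag, exec resumes from pattern.lastIndex on each call
  // and returns null once no further match exists.
  while ((m = pattern.exec(query)) !== null) {
    tokens.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return tokens;
}

const query =
  'These are the separate query elements "These are compound composite terms"';
console.log(tokenizeWithExec(query));
```

The loop could also be written with String.prototype.matchAll; exec is used here only because it is the method the edit asks about.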

Hope it will be useful...

Lu4 asked May 19 '15 14:05


1 Answer

You can use this regex:

var s = 'These are the separate query elements "These are compound composite term"';

var arr = s.split(/(?=(?:(?:[^"]*"){2})*[^"]*$)\s+/g);
//=> ["These", "are", "the", "separate", "query", "elements", "\"These are compound composite term\""]

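This regex splits on whitespace only when that whitespace is outside double quotes. It does so with a lookahead that checks that an even number of quotes follows the whitespace up to the end of the string. Note that the quoted token keeps its surrounding quotation marks.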
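To get exactly the seven tokens asked for in the question, the quotes can be stripped after splitting. The following is a minimal sketch under that assumption; the helper name splitOutsideQuotes and the trim, filter, and quote-stripping steps are additions for illustration and are not part of the regex above.

```javascript
// Hypothetical wrapper around the split regex from this answer.
function splitOutsideQuotes(query) {
  // Whitespace preceded by a lookahead that requires an even number of
  // double quotes between the current position and the end of the string.
  const outsideQuotes = /(?=(?:(?:[^"]*"){2})*[^"]*$)\s+/;
  return query
    .trim() // avoid empty tokens from leading/trailing whitespace
    .split(outsideQuotes)
    .filter((token) => token.length > 0) // drop the '' produced by empty input
    .map((token) => token.replace(/^"([\s\S]*)"$/, "$1")); // strip wrapping quotes
}

const s =
  'These are the separate query elements "These are compound composite term"';
console.log(splitOutsideQuotes(s));
```

Because the lookahead rescans the rest of the string at every whitespace run, this approach may be slower than a single matching pass on long inputs. It also assumes the quotes are balanced.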

anubhava answered Sep 30 '22 01:09