Hi, I've stumbled upon a problem related to regular expressions that I cannot resolve.
I need to tokenize a query (split it into parts). Take the following one as an example:
These are the separate query elements "These are compound composite term"
What I eventually need is an array of 7 tokens:
1) These
2) are
3) the
4) separate
5) query
6) elements
7) These are compound composite term
The seventh token consists of several words because it was inside double quotation marks.
My question is: is it possible to tokenize the input string as described above using a single regular expression?
I was curious whether RegExp.prototype.exec or similar code could be used instead of split to achieve the same thing, so I did some investigation, which led to another question here. As another answer to that question, the following regex can be used:
(?:")(?:\w+\W*)+(?:")|\w+
It can be used as a one-liner:
var tokens = query.match(/(?:")(?:\w+\W*)+(?:")|\w+/g);
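Since exec was the original goal, here is a minimal sketch of that approach. It assumes a simplified pattern, "([^"]*)"|(\w+), rather than the regex above. The capturing groups let the loop return the quoted phrase without its quotes (the match one-liner above keeps them), and [^"]* avoids the nested quantifier (?:\w+\W*)+, which can backtrack heavily when a quote is never closed. The helper name tokenize is hypothetical.

```javascript
// Sketch: tokenize with RegExp.prototype.exec instead of String.prototype.match.
// Assumption: "([^"]*)"|(\w+) is a simplified variant of the regex above;
// group 1 captures the text inside quotes, group 2 captures a bare word.
function tokenize(query) {
  const re = /"([^"]*)"|(\w+)/g;
  const tokens = [];
  let m;
  // With the g flag, each exec call resumes at re.lastIndex and
  // returns null once no further match exists.
  while ((m = re.exec(query)) !== null) {
    // Group 1 is undefined unless a quoted phrase matched; in that case
    // push the phrase without its surrounding quotes, otherwise the word.
    tokens.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return tokens;
}

const query = 'These are the separate query elements "These are compound composite term"';
console.log(JSON.stringify(tokenize(query)));
// → ["These","are","the","separate","query","elements","These are compound composite term"]
```

Both alternatives consume at least one character, so the loop cannot get stuck on a zero-length match.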
Hope it will be useful...
You can use this regex:
var s = 'These are the separate query elements "These are compound composite term"';
var arr = s.split(/(?=(?:(?:[^"]*"){2})*[^"]*$)\s+/g);
//=> ["These", "are", "the", "separate", "query", "elements", '"These are compound composite term"']
This regex splits on whitespace only when it lies outside double quotes: the lookahead makes sure that an even number of quotes follows the whitespace.
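The last token from the split above still carries its quotes, unlike token 7 in the question. A minimal sketch that wraps the same lookahead split in a helper (the name splitOutsideQuotes is hypothetical) and additionally strips one pair of enclosing quotes, a step the answer itself does not perform:

```javascript
// Sketch: the lookahead split from the answer, plus quote stripping
// (an added step, not part of the original answer).
function splitOutsideQuotes(s) {
  // Split on a run of whitespace only when an even number of '"' characters
  // follows it, i.e. when the whitespace is not inside a quoted phrase.
  // trim() avoids an empty first/last token caused by leading/trailing spaces.
  const parts = s.trim().split(/(?=(?:(?:[^"]*"){2})*[^"]*$)\s+/);
  // Drop one pair of enclosing quotes, e.g. '"a b"' becomes 'a b'.
  return parts.map((p) =>
    p.length >= 2 && p.startsWith('"') && p.endsWith('"') ? p.slice(1, -1) : p
  );
}

const input = 'These are the separate query elements "These are compound composite term"';
console.log(JSON.stringify(splitOutsideQuotes(input)));
// → ["These","are","the","separate","query","elements","These are compound composite term"]
```

Because the lookahead rescans the rest of the string at every whitespace position, the cost grows roughly quadratically with input length; for short queries this is negligible.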