I need to split a string on the space character (' '), but exclude any spaces that fall between two specific characters (say, single quotes).
Here is an example string:
This-is-first-token This-is-second-token 'This is third token'
The output array should look like this:
[0] = This-is-first-token
[1] = This-is-second-token
[2] = 'This is third token'
Question: Can this be done elegantly with a regular expression?
To match any character except a list of excluded characters, put the excluded characters between [^ and ]. The caret ^ must immediately follow the [, or else it stands for just itself. The character . (period) is a metacharacter (it sometimes has a special meaning).
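As a rough illustration of the negated character class described above, here is a minimal sketch. The sample strings and variable names are made up for this example.

```javascript
// [^'] matches any single character that is NOT a single quote.
const quoted = "'This is third token'";
const inner = quoted.match(/[^']+/g);
console.log(inner); // ["This is third token"]

// When the caret is not the first character in the class, it is a literal caret.
const literalCaret = "a^b".match(/[b^]/g);
console.log(literalCaret); // ["^", "b"]
```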
In JavaScript, the RegExp \D metacharacter matches non-digit characters, i.e. all characters except digits. It is the same as [^0-9].
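A small sketch of \D searching for non-digit runs across a whole string. The sample string is invented for this example.

```javascript
const sample = "Room 101, Floor 3";

// \D+ with the g flag collects every run of non-digit characters.
const nonDigits = sample.match(/\D+/g);
console.log(nonDigits); // ["Room ", ", Floor "]

// The explicit negated class gives the same result.
const sameResult = sample.match(/[^0-9]+/g);
console.log(sameResult); // ["Room ", ", Floor "]
```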
The Difference Between \s and \s+
For example, the expression X+ matches one or more X characters. Therefore, the regular expression \s matches a single whitespace character, while \s+ matches one or more whitespace characters.
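The difference shows up most clearly when splitting on consecutive spaces. A minimal sketch, with a made-up input string:

```javascript
const text = "a  b"; // two spaces between a and b

// \s splits at each whitespace character, leaving an empty string between them.
const single = text.split(/\s/);
console.log(single); // ["a", "", "b"]

// \s+ treats the whole run of whitespace as one separator.
const oneOrMore = text.split(/\s+/);
console.log(oneOrMore); // ["a", "b"]
```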
A simple regex for this purpose would be:
/'[^']+'|[^\s]+/g
var data = "This-is-first-token This-is-second-token 'This is third token'";
data.match(/'[^']+'|[^\s]+/g);
Result:
["This-is-first-token", "This-is-second-token", "'This is third token'"]
I think this is as simple as you can make it in just a regex.
The g at the end makes it a global match, so you get all three matches. Without it, you get only the first string.
\s matches all whitespace (basically spaces and tabs, in this instance). So it would work even if there was a tab between This-is-first-token and This-is-second-token.
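To illustrate both points, here is a quick sketch using the same regex on a variant of the example string with a tab between the first two tokens (the variable names are just for this example):

```javascript
// Same tokens as before, but with a tab (\t) between the first two.
const tabbed = "This-is-first-token\tThis-is-second-token 'This is third token'";

// With g: every match is returned.
const allTokens = tabbed.match(/'[^']+'|[^\s]+/g);
console.log(allTokens);
// ["This-is-first-token", "This-is-second-token", "'This is third token'"]

// Without g: only the first match is found (at index 0 of the result).
const firstOnly = tabbed.match(/'[^']+'|[^\s]+/);
console.log(firstOnly[0]); // "This-is-first-token"
```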
To match content in braces, use this:
data.match(/\{[^\}]+\}|[^\s]+/g);
Braces or single quotes:
data.match(/\{[^\}]+\}|'[^']+'|[^\s]+/g);
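For example, with an input that mixes both kinds of delimiters (this sample string is made up, not from the question), the combined pattern should behave like this:

```javascript
const mixed = "plain {in braces} 'in quotes'";

// Alternatives are tried left to right: braces, then single quotes,
// then any run of non-whitespace characters.
const parts = mixed.match(/\{[^\}]+\}|'[^']+'|[^\s]+/g);
console.log(parts); // ["plain", "{in braces}", "'in quotes'"]
```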
You can use this split:
var string = "This-is-first-token This-is-second-token 'This is third token'";
var arr = string.split(/(?=(?:(?:[^']*'){2})*[^']*$)\s+/);
//=> ["This-is-first-token", "This-is-second-token", "'This is third token'"]
This assumes quotes are all balanced.
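To see why the balanced-quotes assumption matters, here is a small sketch. The lookahead only allows a split where an even number of quotes remains to the right, so a stray quote changes where the splits happen. The helper hasBalancedQuotes is a hypothetical addition for this example, not part of the answer above.

```javascript
const splitter = /(?=(?:(?:[^']*'){2})*[^']*$)\s+/;

// Balanced quotes: the space inside the quotes is left alone.
const balanced = "one 'two three'".split(splitter);
console.log(balanced); // ["one", "'two three'"]

// Unbalanced quote: spaces before the stray quote see an odd number of
// quotes ahead and are skipped; only the last space splits.
const unbalanced = "a b 'c d".split(splitter);
console.log(unbalanced); // ["a b 'c", "d"]

// Hypothetical guard: count the quotes before trusting the split.
function hasBalancedQuotes(str) {
  return (str.match(/'/g) || []).length % 2 === 0;
}
console.log(hasBalancedQuotes("a b 'c d")); // false
```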