
Javascript Regex - ignoring certain characters between 2 chars

I need to split a string on the space character (' '), but without splitting on any spaces that fall between two specific characters (say, single quotes).

Here is an example string:

This-is-first-token This-is-second-token 'This is third token'

The output array should look like this:

[0] = This-is-first-token
[1] = This-is-second-token
[2] = 'This is third token'

Question: Can this be done elegantly with a regular expression?

AlvinfromDiaspar asked Nov 21 '13 06:11


2 Answers

Short Answer:

A simple regex for this purpose would be:

/'[^']+'|[^\s]+/g

Sample code:

var data = "This-is-first-token This-is-second-token 'This is third token'";
data.match(/'[^']+'|[^\s]+/g);

Result:

["This-is-first-token", "This-is-second-token", "'This is third token'"]

Explanation:

I think this is as simple as you can make it in just a regex.

The g flag at the end makes it a global match, so you get all three matches. Without it, match() returns only the first token.

\s matches any whitespace character (spaces, tabs, and line breaks). So it would work even if there were a tab between This-is-first-token and This-is-second-token.
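To make the two points above concrete, here is a small sketch. The sample string is made up for illustration (it is the original input with a tab swapped in for the first space); the regex is the one from this answer.

```javascript
// Hypothetical sample input: a tab separates the first two tokens.
const sample = "This-is-first-token\tThis-is-second-token 'This is third token'";

// With the g flag, match() returns every token.
const allTokens = sample.match(/'[^']+'|[^\s]+/g);
console.log(allTokens);
// ["This-is-first-token", "This-is-second-token", "'This is third token'"]

// Without the g flag, match() stops after the first token.
const firstOnly = sample.match(/'[^']+'|[^\s]+/);
console.log(firstOnly[0]);
// "This-is-first-token"
```

The quoted alternative comes first in the pattern, so a token that starts with a quote is tried as a whole quoted group before the non-whitespace fallback gets a chance.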

To keep content in braces together as one token instead, use this:

data.match(/\{[^\}]+\}|[^\s]+/g);


Braces or single quotes:

data.match(/\{[^\}]+\}|'[^']+'|[^\s]+/g);
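As a rough sketch of how the combined pattern could be wrapped for reuse: the function name tokenize and the sample string below are my own, not part of the answer. Note that match() returns null when nothing matches, so the sketch falls back to an empty array.

```javascript
// Hypothetical helper; the name "tokenize" is an assumption for illustration.
function tokenize(input) {
  // Try a {...} group first, then a '...' group, then any run of non-whitespace.
  // match() returns null when there is no match, so fall back to [].
  return input.match(/\{[^\}]+\}|'[^']+'|[^\s]+/g) || [];
}

// Made-up input mixing plain, braced, and quoted tokens.
const mixed = "plain {in braces here} 'in quotes here' last";
console.log(tokenize(mixed));
// ["plain", "{in braces here}", "'in quotes here'", "last"]

console.log(tokenize("   "));
// []
```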


elixenide answered Sep 17 '22 20:09


You can use this split:

var string = "This-is-first-token This-is-second-token 'This is third token'";
var arr = string.split(/(?=(?:(?:[^']*'){2})*[^']*$)\s+/);
//=> ["This-is-first-token", "This-is-second-token", "'This is third token'"]

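This assumes that all quotes are balanced: the lookahead allows a split only where an even number of single quotes remains between that point and the end of the string.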
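To illustrate what the balanced-quotes assumption means in practice, here is a small sketch with made-up inputs. The split regex is the one above; the match regex is the one from the other answer, shown only for comparison.

```javascript
// Split regex from this answer.
const splitter = /(?=(?:(?:[^']*'){2})*[^']*$)\s+/;

// Balanced quotes: the quoted group survives as one token.
const balanced = "one two 'three four'";
console.log(balanced.split(splitter));
// ["one", "two", "'three four'"]

// Unbalanced quote (hypothetical bad input): spaces before the stray quote
// are followed by an odd number of quotes, so no split happens there.
const unbalanced = "one two 'three four";
console.log(unbalanced.split(splitter));
// ["one two 'three", "four"]

// For comparison, the match-based regex treats the stray quote as an
// ordinary non-whitespace character.
console.log(unbalanced.match(/'[^']+'|[^\s]+/g));
// ["one", "two", "'three", "four"]
```

Neither result is "right" for malformed input; if unbalanced quotes are possible, it is probably worth validating the string before splitting.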

anubhava answered Sep 20 '22 20:09