
Regex needed to split a string by "."

I need a regex in JavaScript. I have a string:

'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'

I want to split this string by periods such that I get an array:

[
    '*window',
    'some1',
    'some\.2',   //ignore the . because it's escaped
    '(a.b + ")" ? cc\.c : d.n [a.b, cc\.c])',  //ignore everything inside ()
    'some\.3',
    '(this.o.p ? ".mike." [ff\.])',
    'some5'
]

What regex will do this?

user1031396 asked Nov 05 '11 20:11





1 Answer

// Backslashes are doubled so that the string itself contains a literal `\.`
var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
// With the g flag, match() returns a plain Array of all matches, or null if there are none
var result = string.match(pattern) || [];

Fiddle: http://jsfiddle.net/66Zfh/3/
Explanation of the RegExp. Match a consecutive set of characters, satisfying:

/             Start of RegExp literal
(?:            Create a group without reference (example: say, group A)
   \(          `(` character
   (?:         Create a group without reference (example: say, group B)
      (['"])     ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
      \)         `)` character
      \1         The character as matched at group 1, either `'` or `"`
     |          OR
      [^)]+?     Any non-`)` character, at least once (see below)
   )+          End of group (B). Let this group occur at least once
   \)+         `)` character, at least once
  |           OR
   \\\.        `\.` (escaped backslash and dot, because they're special chars)
  |           OR
   [^.]+?      Any non-`.` character, at least once (see below)
)+            End of group (A). Let this group occur at least once
/g           "End of RegExp, global flag"
/* Summary: match every run of characters that contains no splitting dot,
   i.e. a dot that is neither escaped nor inside parentheses, as specified by the OP */
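To check the pattern against the sample from the question, the snippet below runs it and prints each piece. It is a minimal sketch: the variable names `input` and `parts` are only illustrative, and the `|| []` fallback is an assumption added here so that an input with no match yields an empty array instead of null.

```javascript
// Sample input from the question. Backslashes are doubled inside the
// JavaScript string literal so the string itself contains `\.`.
var input = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;

// match() returns null when nothing matches, so fall back to an empty array.
var parts = input.match(pattern) || [];

// Print one piece per line, prefixed with its index.
parts.forEach(function (part, i) {
    console.log(i + ': ' + part);
});
// 0: *window
// 1: some1
// 2: some\.2
// 3: (a.b + ")" ? cc\.c : d.n [a.b, cc\.c])
// 4: some\.3
// 5: (this.o.p ? ".mike." [ff\.])
// 6: some5
```

Note that the pattern does not track nesting depth: a nested group such as `(a(b).c)` would be cut at the first closing parenthesis, so this approach only fits input whose parenthesised parts are not nested.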

There's a difference between + and +?. A single plus is greedy: it attempts to match as many characters as possible. A +? is lazy: it matches only as many characters as are necessary for the RegExp to match. Example: on the input 123, \d+? matches 1, while \d+ matches 123.
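The difference between the two quantifiers can be seen directly in a console. This is a small illustrative sketch; the variable names are made up for the example.

```javascript
// Greedy: + takes as many digits as it can.
var greedy = '123'.match(/\d+/)[0];
console.log(greedy); // 123

// Lazy: +? stops as soon as the overall pattern can succeed.
var lazy = '123'.match(/\d+?/)[0];
console.log(lazy); // 1

// A lazy quantifier still expands when the rest of the pattern requires it:
// here the digits must be followed by "a", so all three digits are consumed.
var lazyForced = '123abc'.match(/\d+?a/)[0];
console.log(lazyForced); // 123a
```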

The String.match method performs a global match here because of the g (global) flag. With the g flag, match returns an array consisting of all matched substrings; capture groups are not included.

When the g flag is omitted, only the first match will be selected. The array will then consist of the following elements:

Index 0: <Whole match>
Index 1: <Group 1>
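The effect of the g flag on the return value of match can be compared side by side. The sample text and variable names below are hypothetical and chosen only to keep the output short.

```javascript
var text = 'a=1, b=2';

// With g: every match, no capture groups.
var withG = text.match(/(\w)=\d/g);
console.log(withG); // [ 'a=1', 'b=2' ]

// Without g: only the first match, followed by its capture groups.
var withoutG = text.match(/(\w)=\d/);
console.log(withoutG[0]); // a=1  (whole match)
console.log(withoutG[1]); // a    (group 1)
console.log(withoutG.index); // 0 (position of the match in the input)
```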
Rob W answered Sep 22 '22 07:09
