JavaScript: undefined element appears in the result of a regex split

Each of the following calls produces an undefined element in the middle of the result:

"Hello World\n\nhello world".split(/\n(\n|\t|\s)*?\n/)
"Hello World\n\nhello world".split(/\n(\n|\t|\s)*\n/)

The output is

["Hello World", undefined, "hello world"]

I want to split wherever two newline characters occur with any number of newline, space, or tab characters between them, but no letters, symbols, or digits.

Mula Ko Saag asked Dec 15 '15 15:12


1 Answer

That's because when a split pattern contains a capture group, JavaScript includes the content of the capture group in the result. The group (\n|\t|\s) always consumes exactly one character, so it can never capture an empty string. In "\n\n" it is repeated zero times, so it takes no part in the match, and that's why you get undefined and not an empty string.
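As a minimal sketch of that behavior (the variable names are only illustrative), the snippet below shows a capture group contributing its text to the result, then the zero-repetition case from the question, then a case where the group does match once:

```javascript
// A capture group in the separator adds the captured text to the result.
const withCapture = "a-b".split(/(-)/);
console.log(withCapture); // ["a", "-", "b"]

// Here the group is repeated zero times, so it does not participate
// in the match and its slot in the result is undefined.
const parts = "Hello World\n\nhello world".split(/\n(\n|\t|\s)*\n/);
console.log(parts); // ["Hello World", undefined, "hello world"]

// When the group does match, its slot holds the captured character.
const partsWithSpace = "Hello World\n \nhello world".split(/\n(\n|\t|\s)*\n/);
console.log(partsWithSpace); // ["Hello World", " ", "hello world"]
```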

To prevent this, use a non-capturing group or a character class:

"Hello World\n\nhello world".split(/\n(?:\n|\t|\s)*\n/)
"Hello World\n\nhello world".split(/\n\s*\n/) // \t and \n are already included in \s
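A quick check of both variants, assuming the same input as in the question plus one made-up string with mixed whitespace between the newlines:

```javascript
const sample = "Hello World\n\nhello world";

// Non-capturing group: nothing extra is added to the result.
const nonCapturing = sample.split(/\n(?:\n|\t|\s)*\n/);
console.log(nonCapturing); // ["Hello World", "hello world"]

// Character class shorthand: \s already covers \n and \t.
const charClass = sample.split(/\n\s*\n/);
console.log(charClass); // ["Hello World", "hello world"]

// A single newline is not a separator; whitespace-only gaps are.
const mixed = "one\n \t\n\ntwo\nthree".split(/\n\s*\n/);
console.log(mixed); // ["one", "two\nthree"]
```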

Note: if you also want to remove the spaces and tabs just before the break and the whitespace just after it, you can use:

/(?:[^\S\n]*\n){2}\s*/
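To illustrate the difference, here is a small comparison on a made-up input that has spaces around the blank line. [^\S\n] matches any whitespace character except a newline, so the pattern consumes the spaces and tabs before each of the two newlines, and the final \s* consumes whatever whitespace follows:

```javascript
const input = "Hello World  \n \t \n   hello world";

// The pattern from the note also consumes the surrounding whitespace.
const trimmed = input.split(/(?:[^\S\n]*\n){2}\s*/);
console.log(trimmed); // ["Hello World", "hello world"]

// The simpler pattern leaves that whitespace attached to the pieces.
const untrimmed = input.split(/\n\s*\n/);
console.log(untrimmed); // ["Hello World  ", "   hello world"]
```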
Casimir et Hippolyte answered Oct 01 '22 21:10