How to split a string by a character not directly preceded by a character of the same type?

Let's say I have a string: "We.need..to...split.asap". What I would like to do is to split the string by the delimiter ., but I only wish to split by the first . and include any recurring .s in the succeeding token.

Expected output:

["We", "need", ".to", "..split", "asap"]

In other languages, I know that this is possible with a look-behind, /(?<!\.)\./, but JavaScript unfortunately does not support such a feature.
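For reference, here is what the look-behind version would look like. This is only a sketch and assumes an engine that implements the look-behind assertions added to the language in ES2018, which were not available when this was asked; the variable name `tokens` is just for illustration.

```javascript
var text = "We.need..to...split.asap";

// Split on a "." only when the character before it is not also a ".".
// Assumes ES2018 look-behind support in the engine running this.
var tokens = text.split(/(?<!\.)\./);

console.log(JSON.stringify(tokens));
// ["We","need",".to","..split","asap"]
```

On engines without look-behind support the regex literal is a syntax error, so the workarounds in the answers are still the portable route.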

I am curious to see your answers to this question. Perhaps there is a clever use of look-aheads that presently evades me?

I was considering reversing the string, splitting it, and then re-reversing the tokens, but that seems like too much work for what I am after... plus there is some controversy over how to do that properly: see "How do you reverse a string in place in JavaScript?"

Thanks for the help!

DRAB asked Jun 03 '15 02:06


3 Answers

Here's a variation of the answer by guest271314 that handles more than two consecutive delimiters:

var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;
var items = text.split(re).filter(function(val) { return val.length > 0; });

This relies on the fact that if the split expression includes a capture group, the captured text is included in the returned array. These captures are actually the only thing we are interested in; the tokens between the matches are all empty strings (apart from the final token, which is the last word), and we filter the empty ones out.
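To make that concrete, here is a small sketch that prints the array before and after filtering. The variable name `raw` is mine, not part of the answer's code.

```javascript
var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;

// Each match contributes an empty token followed by its captured group;
// the trailing "asap" is the remainder after the last match.
var raw = text.split(re);
console.log(JSON.stringify(raw));
// ["","We","","need","","to","","..split","asap"]

// Dropping the empty strings leaves just the pieces we want.
var items = raw.filter(function (val) { return val.length > 0; });
console.log(JSON.stringify(items));
// ["We","need","to","..split","asap"]
```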

EDIT: Unfortunately, there's one slight bug with this: if the text to be split starts with a delimiter, that delimiter is included in the first token. If that's an issue, it can be remedied with:

var re = /(?:^|(\.*[^.]+))\./;
var items = text.split(re).filter(function(val) { return !!val; });

(I think this regex is ugly and would welcome an improvement.)
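A quick comparison of the two expressions on an input that starts with a delimiter (the sample string and the helper names `splitOriginal` and `splitFixed` are made up for this illustration):

```javascript
var leading = ".We.need.to...split.asap";

// First version: the leading "." ends up inside the first token.
function splitOriginal(text) {
  return text.split(/(\.*[^.]+)\./)
    .filter(function (val) { return val.length > 0; });
}

// Second version: a leading "." is matched by the ^ branch, whose
// capture is undefined, so the filter has to test truthiness instead
// of reading .length.
function splitFixed(text) {
  return text.split(/(?:^|(\.*[^.]+))\./)
    .filter(function (val) { return !!val; });
}

console.log(JSON.stringify(splitOriginal(leading)));
// [".We","need","to","..split","asap"]
console.log(JSON.stringify(splitFixed(leading)));
// ["We","need","to","..split","asap"]
```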

Ted Hopp answered Nov 16 '22 13:11


You can do this without any lookaheads:

var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g;

var matches, output = [];

while(matches = regex.exec(subject)) {
    output.push(matches[1]);  
}

document.write(JSON.stringify(output));

It looked as if this would work in one line, as it does on https://regex101.com/r/cO1dP3/1, but it had to be expanded into the loop above because, with the /g flag, .match returns only the full matches and not the capturing groups (i.e. the correct data was in the capturing groups, but we couldn't access it directly without doing the above).

See: JavaScript Regex Global Match Groups
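Here is a short sketch of the difference. The `matchAll` part is my addition and assumes an engine with ES2020 support; it is not part of the original answer, and the names `whole` and `groups` are only for illustration.

```javascript
var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g;

// With /g, match() returns the whole matches only; group 1 is not exposed.
var whole = subject.match(regex);
console.log(JSON.stringify(whole));
// ["We",".need",".to","....split",".asap"]

// matchAll() yields one match object per hit, capture groups included,
// so group 1 can be picked out without a manual exec() loop.
var groups = Array.from(subject.matchAll(regex), function (m) { return m[1]; });
console.log(JSON.stringify(groups));
// ["We","need","to","...split","asap"]
```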

An alternative solution that keeps the original one-liner (plus one extra line to strip the leading delimiter from each match) is:

document.write(JSON.stringify(
    "We.need.to....split.asap".match(/\.?(\.*[^.]+)/g)
        .map(function(s) { return s.replace(/^\./, ''); })
));

Take your pick!

Bilal Akil answered Nov 16 '22 13:11


Note: This answer can't handle more than 2 consecutive delimiters, since it was written according to the example in revision 1 of the question, which was not very clear about such cases.


var text = "We.need.to..split.asap";
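// first pass: split on a "." that is immediately followed by another "."
var res = text.split(/\.(?=\.)/).map(function(val) {
  // a piece that does not start with "." is split on every "."
  // a piece that starts with "." is split only on its last "."
  // (the lookahead rejects any "." that has another "." later in the piece)
  return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/);
});
// concatenate the two resulting arrays, `res[0]` and `res[1]`
res = res[0].concat(res[1]);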

document.write(JSON.stringify(res));
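
To illustrate the limitation mentioned in the note, here is the same logic wrapped in a function and run on two inputs. The wrapper name `splitTwoMax` is hypothetical and only exists for this comparison.

```javascript
// Hypothetical wrapper around the code above, for comparison only.
function splitTwoMax(text) {
  var res = text.split(/\.(?=\.)/).map(function (val) {
    return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/);
  });
  // Only the first two pieces of the first pass are combined.
  return res[0].concat(res[1]);
}

console.log(JSON.stringify(splitTwoMax("We.need.to..split.asap")));
// ["We","need","to",".split","asap"]

// With three consecutive delimiters the first pass yields three pieces,
// the middle one empty, so everything after it is lost.
console.log(JSON.stringify(splitTwoMax("We.need.to...split.asap")));
// ["We","need","to",""]
```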

guest271314 answered Nov 16 '22 11:11