Really, pretty much what the title says.
Say you have this string:
var theString = "a=b=c=d";
Now, when you run theString.split("=") the result is ["a", "b", "c", "d"], as expected. And, of course, when you run theString.split("=", 2) you get ["a", "b"], which, after reading the MDN page for String#split(), makes sense to me.
However, the behavior I'm looking for is more like Java's String#split(): instead of building the array normally and then returning the first n elements, it builds an array of the first n-1 matches, then adds all the remaining characters as the last element of the array. See the relevant docs for a better description.
How can I get this effect in JavaScript?
I'm looking for the answer with the best performance that works like the Java implementation, though the actual way it works can be different.
I'd post my attempt, but I don't know how to go about writing this at all.
I'd use something like this:
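function JavaSplit(string, separator, n) {
    // Split on every occurrence of the separator first
    var split = string.split(separator);
    // Nothing to merge if there are already n pieces or fewer
    if (split.length <= n)
        return split;
    // Keep the first n-1 pieces as they are
    var out = split.slice(0, n - 1);
    // Re-join everything else into the last element
    out.push(split.slice(n - 1).join(separator));
    return out;
}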
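What we're doing here is:
1. Splitting the string entirely.
2. Returning that result as-is if it has n elements or fewer.
3. Taking the first n-1 elements.
4. Re-joining the remaining elements with the separator.
5. Appending that to the array and returning it.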
One might reasonably think you could chain all of those calls together, but .push() mutates the array and returns its new length rather than returning the array. It's also a bit easier for you to follow this way.
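Since the question asks about performance, here is a rough sketch of a variant that stops scanning once it has n-1 pieces instead of splitting the whole string and re-joining. It is not from either answer: the name splitLimit is made up for illustration, and it assumes a non-empty string separator (no RegExp) and a positive integer n. Whether it is actually faster for your inputs is something you would have to measure.

```javascript
// Hypothetical helper (not part of the answers): Java-style limited split
// for plain string separators, scanning only as far as needed.
// Assumptions: separator is a non-empty string, n is a positive integer.
function splitLimit(string, separator, n) {
  const out = [];
  let start = 0;
  // Collect at most n-1 pieces that end at a separator
  while (out.length < n - 1) {
    const index = string.indexOf(separator, start);
    if (index === -1) break; // no more separators in the string
    out.push(string.slice(start, index));
    start = index + separator.length;
  }
  // Whatever is left over becomes the last element
  out.push(string.slice(start));
  return out;
}

console.log(splitLimit("a=b=c=d", "=", 2));  // ["a", "b=c=d"]
console.log(splitLimit("a=b=c=d", "=", 10)); // ["a", "b", "c", "d"]
```

For the string from the question, this gives the same results as JavaSplit above.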
The answer from Asad is excellent as it allows for variable-length RegExp separators (e.g. /\s+/g, splitting along any length of whitespace, including newlines). However, there are a couple of issues with it.
exec can return null and cause it to break. This can happen if the separator does not appear in the input string.
The following addresses these issues while being just as performant:
/**
 * Split a string with a RegExp separator an optionally limited number of times.
 * @param {string} input
 * @param {RegExp} separator
 * @param {number} [limit] - If not included, splits the maximum times
 * @returns {string[]}
 */
function split(input, separator, limit) {
    // Ensure the separator is global, keeping any other flags (e.g. "i")
    separator = new RegExp(
        separator,
        separator.flags.includes('g') ? separator.flags : separator.flags + 'g'
    );
    // Allow the limit argument to be excluded
    limit = limit ?? -1;

    const output = [];
    let finalIndex = 0;

    // Each iteration performs one split; a limit of -1 never reaches 0,
    // so the loop runs until exec() finds no more matches
    while (limit--) {
        const lastIndex = separator.lastIndex;
        const search = separator.exec(input);
        if (search === null) {
            break;
        }
        finalIndex = separator.lastIndex;
        output.push(input.slice(lastIndex, search.index));
    }

    // The rest of the string is the last element
    output.push(input.slice(finalIndex));

    return output;
}
split("foo bar baz quux", /\s+/, 3)
// ["foo", "bar", "baz", "quux"]
split("foo bar baz quux", /\s+/, 2)
// ["foo", "bar", "baz quux"]
split("foo bar baz quux", /\s+/, 1)
// ["foo", "bar baz quux"]
split("foo bar baz quux", /\s+/, 0)
// ["foo bar baz quux"]
// A higher limit than possible splits
split("foo bar baz quux", /\s+/, 4)
// ["foo", "bar", "baz", "quux"]
// A split that doesn't exist
split("foo bar baz quux", /p/, 2)
// ["foo bar baz quux"]
// Not providing a limit finds the maximum splits
split("foo bar baz quux", /\s+/)
// ["foo", "bar", "baz", "quux"]
Notes:
In production code, it's recommended not to mutate function arguments. Both separator and limit are being mutated. You can choose to create new variables at the top of the function to avoid this if desired. I chose not to do this to keep the example code short. This is not production code.
I did not include any defensive code to check the function argument types. This would be a good thing to consider for production code, or consider TypeScript ;)
Originally I threw an Error if the provided separator did not have the global flag set. See the comments below for reasons why it might be desired to add the global flag for the user instead of throwing. Thank you for the suggestion @Stephen P.
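For anyone who wants to act on the first two notes, here is a minimal sketch of what that could look like. The name splitSafe and the exact checks are my own assumptions, not part of the answer above: it leaves its arguments alone, throws a TypeError on obviously wrong input, and works on a global copy of the separator so the caller's RegExp and its lastIndex are not touched. It also nudges lastIndex forward after an empty match, purely so the loop is guaranteed to terminate.

```javascript
// Hypothetical variant of split() above: no argument reassignment,
// basic type checks. The checks chosen here are illustrative assumptions.
function splitSafe(input, separator, limit) {
  if (typeof input !== "string") {
    throw new TypeError("input must be a string");
  }
  if (!(separator instanceof RegExp)) {
    throw new TypeError("separator must be a RegExp");
  }
  if (limit !== undefined && (!Number.isInteger(limit) || limit < 0)) {
    throw new TypeError("limit must be a non-negative integer");
  }

  // Global copy of the separator that keeps the caller's other flags
  const flags = separator.flags.includes("g") ? separator.flags : separator.flags + "g";
  const globalSeparator = new RegExp(separator.source, flags);
  // As in split() above, limit counts splits, not array elements
  const maxSplits = limit ?? Infinity;

  const output = [];
  let start = 0;
  for (let splits = 0; splits < maxSplits; splits++) {
    const match = globalSeparator.exec(input);
    if (match === null) break;
    output.push(input.slice(start, match.index));
    start = globalSeparator.lastIndex;
    // An empty match does not advance lastIndex; step past it to avoid looping forever
    if (match[0].length === 0) globalSeparator.lastIndex++;
  }
  output.push(input.slice(start));
  return output;
}

console.log(splitSafe("foo bar baz quux", /\s+/, 2)); // ["foo", "bar", "baz quux"]
console.log(splitSafe("aXbxc", /x/i, 1));             // ["a", "bxc"]
```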