 

Limiting the times that .split() splits, rather than truncating the resulting array

Really, pretty much what the title says.

Say you have this string:

var theString = "a=b=c=d";

Now, when you run theString.split("=") the result is ["a", "b", "c", "d"] as expected. And, of course, when you run theString.split("=", 2) you get ["a", "b"], which after reading the MDN page for String#split() makes sense to me.

However, the behavior I'm looking for is more like Java's String#split(): instead of building the array normally and then returning the first n elements, it splits only at the first n-1 matches and puts all the remaining characters into the last element of the array. See the relevant docs for a better description.
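To pin down the target with the string above: with a limit of 2, native split gives ["a", "b"], while Java-style splitting should give ["a", "b=c=d"]. For the single-split case only, the Java-style result can be sketched with indexOf and slice (the variable names are illustrative):

```javascript
const theString = "a=b=c=d";

// Native behavior: the limit just truncates the fully split result
const truncated = theString.split("=", 2); // → ["a", "b"]

// Java-style behavior with a limit of 2: split only at the first "="
// and keep everything after it as the last element
const i = theString.indexOf("=");
const javaStyle = i === -1
  ? [theString] // no separator at all: the whole string is the only element
  : [theString.slice(0, i), theString.slice(i + 1)]; // + 1 skips the one-character "="

console.log(truncated, javaStyle); // javaStyle → ["a", "b=c=d"]
```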

How can I get this effect in JavaScript?

I'm looking for the answer with the best performance that works like the Java implementation, though the actual way it works can be different.

I'd post my attempt, but I don't know how to go about writing this at all.

Nic asked May 02 '15 at 04:05





2 Answers

I'd use something like this:

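function JavaSplit(string, separator, n) {
    // Works with a string separator: re-joining with a RegExp would
    // insert its source text (e.g. "/\s+/") instead of the original match
    var split = string.split(separator);
    // No limit, a non-positive limit, or few enough pieces: return them all
    if (!(n > 0) || split.length <= n)
        return split;
    // Keep the first n-1 pieces as they are...
    var out = split.slice(0, n - 1);
    // ...and glue the rest back together as the final element
    out.push(split.slice(n - 1).join(separator));
    return out;
}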

What we're doing here is:

  1. Splitting the string entirely
  2. Taking the first n-1 elements, as described.
  3. Re-joining the remaining elements.
  4. Appending them to the array from step 2 and returning.
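Walking those four steps through the question's string, with n = 3 chosen just for illustration:

```javascript
const string = "a=b=c=d";
const separator = "=";
const n = 3;

// 1. Split the string entirely
const split = string.split(separator);           // → ["a", "b", "c", "d"]
// 2. Take the first n-1 elements
const out = split.slice(0, n - 1);               // → ["a", "b"]
// 3. Re-join the remaining elements with the same separator
const rest = split.slice(n - 1).join(separator); // → "c=d"
// 4. Append the re-joined tail
out.push(rest);                                  // out → ["a", "b", "c=d"]

console.log(out);
```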

One might reasonably think you could chain all of those calls together, but .push() mutates the array and returns its new length rather than returning a new array. It's also a bit easier to follow this way.
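If you do want a single chained expression, Array.prototype.concat returns a new array, so it can stand in for .push(). A sketch (javaSplitChained is a hypothetical name), assuming a string separator as before:

```javascript
// Same steps as JavaSplit, but chained: concat returns a new array,
// whereas push mutates in place and returns the new length
function javaSplitChained(string, separator, n) {
  const parts = string.split(separator);
  // No limit, a non-positive limit, or few enough pieces: return them all
  if (!(n > 0) || parts.length <= n) return parts;
  // concat appends a non-array argument (the joined tail) as one element
  return parts.slice(0, n - 1).concat(parts.slice(n - 1).join(separator));
}

javaSplitChained("a=b=c=d", "=", 2); // → ["a", "b=c=d"]
```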

S McCrohan answered Nov 09 '22 at 22:11



The answer from Asad is excellent, as it allows for variable-length RegExp separators (e.g. /\s+/g, splitting along any length of whitespace, including newlines). However, it has a few issues:

  1. If the separator does not use the global flag, it will break.
  2. The exec call can return null and cause it to break. This can happen if the separator does not appear in the input string.
  3. If the limit is larger than the number of separators in the string, you end up looping back over the string, with likely unintended results.
  4. The limit is required, so there's no easy way to split as many times as possible.
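The behavior behind issues 1 to 3 can be seen directly with RegExp.prototype.exec and its lastIndex property. A small illustration (the regexes and the string are arbitrary):

```javascript
const text = "foo bar baz";

// Without the g flag, exec ignores lastIndex and always returns the first match
const nonGlobalRe = /\s+/;
nonGlobalRe.exec(text).index; // → 3
nonGlobalRe.exec(text).index; // → 3 again (lastIndex stays 0)

// With the g flag, each exec resumes from lastIndex...
const globalRe = /\s+/g;
globalRe.exec(text).index; // → 3
globalRe.exec(text).index; // → 7
// ...returns null once the matches run out, resetting lastIndex to 0...
globalRe.exec(text);       // → null
// ...so a loop that keeps calling exec starts over from the beginning
globalRe.exec(text).index; // → 3
```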

The following addresses these issues while being just as performant:

/**
 * Split a string with a RegExp separator an optionally limited number of times.
 * @param {string} input
 * @param {RegExp} separator
 * @param {number} [limit] - Maximum number of splits (the result has at most limit + 1 elements). If omitted, splits as many times as possible
 * @returns {string[]}
 */
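function split(input, separator, limit) {
  // Ensure the separator is global without dropping its other flags
  // (new RegExp(re, 'g') alone would replace flags such as 'i')
  const flags = separator.flags.includes('g') ? separator.flags : separator.flags + 'g';
  separator = new RegExp(separator, flags);
  // Allow the limit argument to be excluded (-1 never counts down to 0)
  limit = limit ?? -1;

  const output = [];
  let finalIndex = 0;

  while (limit--) {
    const lastIndex = separator.lastIndex;
    const search = separator.exec(input);
    // Stop when no separator is left, or on an empty match (e.g. /\s*/),
    // which would never advance lastIndex and would loop forever
    if (search === null || search[0] === '') {
      break;
    }
    finalIndex = separator.lastIndex;
    output.push(input.slice(lastIndex, search.index));
  }

  output.push(input.slice(finalIndex));

  return output;
}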
split("foo bar baz quux", /\s+/, 3)
// ["foo", "bar", "baz", "quux"]
split("foo bar baz quux", /\s+/, 2)
// ["foo", "bar", "baz quux"]
split("foo bar baz quux", /\s+/, 1)
// ["foo", "bar baz quux"]
split("foo bar baz quux", /\s+/, 0)
// ["foo bar baz quux"]

// A higher limit than possible splits
split("foo bar baz quux", /\s+/, 4)
// ["foo", "bar", "baz", "quux"]

// A split that doesn't exist
split("foo bar baz quux", /p/, 2)
// ["foo bar baz quux"]

// Not providing a limit finds the maximum splits
split("foo bar baz quux", /\s+/)
// ["foo", "bar", "baz", "quux"]
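One thing to watch when using this as a drop-in for Java's String#split(): here limit counts splits, so the result has up to limit + 1 elements, whereas Java's limit counts the resulting elements. A sketch of a Java-style wrapper follows (splitRegex and javaStyleSplit are hypothetical names). splitRegex is essentially the loop above, written with fresh local variables, keeping any existing flags, and stopping on an empty match; that last choice is an assumption of this sketch to avoid an infinite loop. It also does not reproduce Java's removal of trailing empty strings when the limit is 0.

```javascript
// Essentially the same loop as split(), using fresh locals instead of
// reassigning the parameters
function splitRegex(input, separator, limit) {
  // Add g if missing while keeping any other flags (e.g. i)
  const flags = separator.flags.includes("g") ? separator.flags : separator.flags + "g";
  const re = new RegExp(separator, flags);
  let remaining = limit ?? -1; // -1 never counts down to 0: no cap
  const output = [];
  let finalIndex = 0;

  while (remaining--) {
    const start = re.lastIndex;
    const match = re.exec(input);
    // Stop when no separator is left, or on an empty match (assumption)
    if (match === null || match[0] === "") break;
    finalIndex = re.lastIndex;
    output.push(input.slice(start, match.index));
  }
  output.push(input.slice(finalIndex));
  return output;
}

// Java's limit counts resulting pieces, not splits, so subtract one;
// a missing or non-positive limit means "split as many times as possible"
function javaStyleSplit(input, separator, n) {
  return splitRegex(input, separator, n > 0 ? n - 1 : undefined);
}

javaStyleSplit("a=b=c=d", /=/, 2);         // → ["a", "b=c=d"]
javaStyleSplit("foo bar baz quux", /\s+/); // → ["foo", "bar", "baz", "quux"]
```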

Notes:

In production code, it's generally recommended not to reassign function arguments, and here both separator and limit are reassigned. You could instead create new variables at the top of the function; I chose not to, to keep the example code short. This is not production code.

I did not include any defensive code to check the function argument types. This would be a good thing to consider for production code, or consider TypeScript ;)

Originally I threw an Error if the provided separator did not have the global flag set. See the comments below for reasons why it might be better to add the global flag for the user instead of throwing. Thank you for the suggestion, @Stephen P.
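One detail when adding the global flag for the user: new RegExp(regex, flags) replaces the regex's flags rather than adding to them, so a bare new RegExp(separator, 'g') drops flags such as i. A short illustration (the variable names are arbitrary):

```javascript
const caseInsensitive = /b/i;

// Passing flags to the RegExp constructor replaces the originals entirely
const replaced = new RegExp(caseInsensitive, "g");
replaced.flags; // → "g" (the i flag is gone)

// Appending g to the existing flags (only if missing) keeps them
const kept = new RegExp(
  caseInsensitive,
  caseInsensitive.flags.includes("g") ? caseInsensitive.flags : caseInsensitive.flags + "g"
);
kept.flags; // → "gi"
```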

spex answered Nov 10 '22 at 00:11
