Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

JavaScript: Match string with all required characters

I'm looking for a regular expression pattern to match a string that contains all of a list of required characters.

For example, if my requirements are "abc":

  • Match: "abacus", "back", "cab".
  • Don't match: "abs", "car", "banana".

So far, I've come up with these (non-regex) methods:

function testA(requiredChars, checkString) {
  return requiredChars.split('').every(char => checkString.indexOf(char) !== -1)
}

function testB(requiredChars, checkString) {
  for (let char of requiredChars.split('')) {
    if (checkString.indexOf(char) == -1) return false
  }
  return true
}

tests = [ 'abacus', 'back', 'cab', 'abs', 'car', 'banana' ]
tests.forEach(word => {
	console.log(word, testA('abc', word), testB('abc', word))
})

// abacus true true
// back true true
// cab true true
// abs false false
// car false false
// banana false false

I like that the first one is smaller, but unfortunately the second one is faster. Can this be done faster with regex, or should I just quit while I'm ahead?

like image 847
therks Avatar asked Oct 24 '19 05:10

therks


2 Answers

It can be done with regex, but not faster - you'd have to lookahead for each character, which is ugly:

const dedupe = str => [...new Set(str)].join('');

const test = (requiredChars, checkString) => {
  const requiredPatterns = [...requiredChars].map(char => `(?=.*${char})`);
  const pattern = new RegExp(`^${requiredPatterns.join('')}`);
  return pattern.test(checkString);
};

tests = [ 'abacus', 'back', 'cab', 'abs', 'car', 'banana' ]
tests.forEach(word => {
    console.log(word, test('abc', word))
});

It's not very good. Use your current strategy, except with a Set instead of an array - Set.has is O(1), whereas Array.indexOf and Array.includes are O(n) (as shown in the other answer).

like image 40
CertainPerformance Avatar answered Oct 20 '22 00:10

CertainPerformance


A Set is the ideal structure for quickly testing membership:

const containsChars = (chars, s) => {
  const lookup = new Set(s);
  return [...chars].every(e => lookup.has(e));
};

tests = ['abacus', 'back', 'cab', 'abs', 'car', 'banana'];
tests.forEach(word => console.log(word, containsChars('abc', word)));

This will almost certainly be more efficient than a regex, which is not well-suited for this sort of task. Your existing solutions run in quadratic time O(checkString.length * requiredChars.length) because of nested indexOf calls, which loop over checkString repeatedly for each requiredChars. But constructing a set is a one-time expense, making the overall algorithm O(n).

However, if your input is always tiny, the overhead of allocating memory for the set object on every call will outweigh its benefits. If that's the case, stick to your existing solutions. If you're always comparing against the same requiredChars, you might try building the Set once and passing it in as a parameter.

But if this isn't hot code called often in a loop, avoid premature optimization and choose the solution that you think is readable (although they're all pretty much the same in this case). It's generally counterproductive to over-optimize functions until you've established through profiling that they're a bottleneck.

like image 159
ggorlen Avatar answered Oct 19 '22 22:10

ggorlen