 

Is using several '.filter' calls on a big array bad for performance?

I wrote this piece of code to filter an array of words. I wrote a filter function for every type of word I want to filter out and apply them sequentially to the array:

  const wordArray = rawArray.filter(removeNonDomainWords)
                            .filter(removeWordsWithDigits)
                            .filter(removeWordsWithInsideNonWordChars)
                            .filter(removeEmptyWords)
                            .filter(removeSearchTerm, term)
                            .map(word => replaceNonWordCharsFromStartAndEnd(word))

This code iterates over the whole array six times if I am not mistaken.

Wouldn't it be more efficient to write one (more complex, yet still easy in my scenario) filter function that logically combines the filter functions to achieve the same result?

I learned about filter in the context of Functional Programming which is supposed to make my code shorter and faster. That's why I probably didn't question what I was writing, thinking 'I am doing FP, this gotta be good'.

Thanks!

asked Aug 15 '17 by Flip




2 Answers

Well, it does iterate six times, but not necessarily over the whole initial array. Each time the array is filtered, it gets smaller. It would be more efficient to use a single filter, but the difference might not be as great as you expect.

If you still want to use this solution, you can improve performance by applying the most selective filter (that is, the one expected to filter out the most elements) first. That way, the subsequent arrays will be smaller and there will be fewer elements to iterate over.

As @Redu points out (in the comments), you can combine your filter predicates with the && operator inside a single .filter call. This ensures the array is only iterated once.
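Applied to the code in the question, a single-pass version could look roughly like this (assuming each predicate returns true for words that should be kept, since they are passed straight to .filter; note that the original passes term as the thisArg of removeSearchTerm):

const wordArray = rawArray
  .filter(word => removeNonDomainWords(word)
               && removeWordsWithDigits(word)
               && removeWordsWithInsideNonWordChars(word)
               && removeEmptyWords(word)
               && removeSearchTerm.call(term, word)) // term was the thisArg
  .map(word => replaceNonWordCharsFromStartAndEnd(word));

Since && short-circuits, putting the most selective predicate first still pays off: the earlier a word fails, the fewer predicates run.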


The reason behind this is that Array.prototype.filter returns a new array. Compare this with the Java Stream API, which returns a stream and can therefore go "depth first" through the call chain. The downside is that you need a terminal operation at the end to "collect" your result.

In JavaScript,

rawArray.filter(x)

iterates over rawArray and returns a new, filtered array, which can in turn be filtered or used as it is. This results in a call to x for each element of rawArray.
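For illustration, you can make this eagerness visible by counting the predicate calls:

// Wrap a predicate so that every call is counted
let calls = 0;
const counted = pred => x => { calls++; return pred(x); };

[1, 2, 3, 4].filter(counted(n => n % 2 === 0))  // 4 calls, keeps [2, 4]
            .filter(counted(n => n > 2));       // 2 more calls, on the survivors only

console.log(calls); // 6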

In Java the equivalent would be

rawArray.stream().filter(x)

which would actually not do anything at all at this point. No calls to x would be done. The return value would be a Stream, that can be used later. It can be further filtered, but it is not until the values are collected in some way - with a terminal operation - that calls are made.

Let's compare JavaScript

rawArray.filter(x).filter(y).length

to Java

rawArray.stream().filter(x).filter(y).count()

In JavaScript, this would first iterate over all of the elements of rawArray, calling x for each of them, and store the results in an intermediate array. The engine would then iterate over all of the elements of that intermediate array, calling y for each element, store the results in a second intermediate array, and finally read its length.

In Java, the snippet would make the VM iterate over the elements of rawArray once, calling x on each element and, only if x returns true, calling y, and, only if y also returns true, incrementing the counter. There are no intermediate arrays and only one pass over the dataset.
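A rough JavaScript equivalent of that single pass, written with a plain loop (x and y standing in for the two predicates):

// One pass, no intermediate arrays
let count = 0;
for (const element of rawArray) {
  // y is only evaluated when x(element) is true, mirroring the stream
  if (x(element) && y(element)) count++;
}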

Functional programming is interesting: used properly, it yields shorter, less complex, and ideally even more readable code. But it hands a lot of responsibility over to the framework (or engine, or VM), and it is important to realize that seemingly similar code, while behaving the same, can perform very differently in different environments.

answered Nov 03 '22 by Bex


As others have answered: your chain will loop over the (filtered) data for every .filter call, which can hurt performance, but probably won't, unless you're modifying/filtering thousands of strings.

If you don't want to compromise on either performance or readability, you can create your own filter wrapper that supports both chaining and lazy evaluation.

The example below shows a wrapper that "remembers" the filter predicates you pass to it, but only calls them once you tell it you're done.

const ArrayFilter = (arr, pred) => ({
  // Chain: fold the new predicate into the one collected so far
  filter: pred
    ? (newPred) => ArrayFilter(arr, x => pred(x) && newPred(x))
    : (newPred) => ArrayFilter(arr, newPred),
  // Run the combined predicate in a single pass; map only if a mapper is given
  execute: (mapper) => mapper
    ? arr.filter(pred).map(mapper)
    : arr.filter(pred)
});


// Some test data: the integers 0 through 49
const oneToFifty = Array.from(Array(50), (_, i) => i);

// ArrayFilter allows for a chained syntax, but only loops over the data
// once and only runs filters when needed
console.log(
  ArrayFilter(oneToFifty)
    .filter(x => x % 3 === 0)
    .filter(x => x < 25)
    .filter(x => x > 10)
    .execute(n => `Result: ${n}`)
);

Edit: to be completely honest, it's actually quite hard to get a noticeable performance gain out of the lazy filter shown above... In the snippet below, I tested two sets of filters against an array of 1,000,000 random numbers. In almost all cases, the "naive" approach of simply chaining Array#filter beats my custom wrapper.

General rule: the lazy execute approach only makes sense for filter collections that remove only a few elements from large sets of data.

If anybody can think of other test cases to throw into the mix, I'd be very interested to try them out!

const ArrayFilter = (arr, pred) => ({
  // Chain: fold the new predicate into the one collected so far
  filter: pred
    ? (newPred) => ArrayFilter(arr, x => pred(x) && newPred(x))
    : (newPred) => ArrayFilter(arr, newPred),
  // Run the combined predicate in a single pass; map only if a mapper is given
  execute: (mapper) => mapper
    ? arr.filter(pred).map(mapper)
    : arr.filter(pred)
});


// Some test data, with at least one 0
const million = [0].concat(
  Array.from(Array(999999), Math.random)
);

// Filters that keep removing values in
// roughly equal slices until only the 0 remains
const minusNFilters = [
  x => x <= 0.9,
  x => x <= 0.7,
  x => x <= 0.6,
  x => x <= 0.5,
  x => x <= 0.4,
  x => x <= 0.3,
  x => x <= 0.2,
  x => x <= 0.1,
  x => x <= 0.0
];

// Filters that keep removing 50% of the remaining
// items
const fiftyPFilters = [
  x => x <= 1 / 2,
  x => x <= 1 / 4,
  x => x <= 1 / 8,
  x => x <= 1 / 16,
  x => x <= 1 / 32,
  x => x <= 1 / 64,
  x => x <= 1 / 128,
  x => x <= 1 / 256,
  // Final state
  x => x <= 0
];

const map = x => `Result: ${x}`;

const timed = (label, fn) => {
  console.time(label);
  fn();
  console.timeEnd(label);
};

// Some tests
// For these filters, the ArrayFilter tends to
// be just a bit faster
timed(
  "method   : Regular\nsize     : 1000000 items\nfilters  : -n\nms       ",
  () => minusNFilters
    .reduce((xs, f) => xs.filter(f), million)
    .map(map)
);

timed(
  "method   : ArrayFilter\nsize     : 1000000 items\nfilters  : -n\nms       ",
  () => minusNFilters
    .reduce(
      (af, f) => af.filter(f), 
      ArrayFilter(million))
    .execute(map)
);

// For these filters, the ArrayFilter is usually
// slower
timed(
  "method   : Regular\nsize     : 1000000 items\nfilters  : -50%\nms       ",
  () => fiftyPFilters
    .reduce((xs, f) => xs.filter(f), million)
    .map(map)
);

timed(
  "method   : ArrayFilter\nsize     : 1000000 items\nfilters  : -50%\nms       ",
  () => fiftyPFilters
    .reduce(
      (af, f) => af.filter(f), 
      ArrayFilter(million))
    .execute(map)
);
answered Nov 03 '22 by user3297291