 

Checking whether the number of unique numbers within array exceeds n

Just as title reads, I need to check whether the number of unique entries within array exceeds n.

Array.prototype.some() seems to fit perfectly here, as it stops cycling through the array the moment a positive answer is found. So please don't suggest methods that filter out the non-unique records and measure the length of the resulting dataset, as performance matters here.

So far I use the following code to check whether there are more than n=2 unique numbers:

const res = [1,1,2,1,1,3,1,1,4,1].some((e,_,s,n=2) => s.indexOf(e) != s.lastIndexOf(e) ? false : n-- ? false : true);

console.log(res);

And it returns false, while there are obviously 3 unique numbers (2, 3, 4).

Any help figuring out my (stupid) mistake here is much appreciated.

p.s. I'm looking for a pure JS solution

Yevgen Gorbunkov asked Sep 20 '19 14:09

2 Answers

You can use a Map() with the array values as map keys and their counts as map values. Then iterate over the map values to find the count of unique numbers. If that count exceeds the limit, return true; otherwise return false.

Time complexity is O(n). It can't get better than O(n) because every number in the array must be visited to find the count of unique numbers.

const data = [1, 1, 2, 1, 1, 3, 1, 1, 4, 1];

function exceedsUniqueLimit(data, limit) {
  // Count the occurrences of each value.
  const map = new Map();
  for (const value of data) {
    map.set(value, (map.get(value) || 0) + 1);
  }

  // Count values that occur exactly once, and stop as soon
  // as that count exceeds the limit.
  let uniqueNumbers = 0;
  for (const count of map.values()) {
    if (count === 1) {
      uniqueNumbers++;
      if (uniqueNumbers > limit) {
        return true;
      }
    }
  }

  return false;
}

console.log(exceedsUniqueLimit(data, 2)); // true
Nikhil answered Sep 22 '22 16:09


To know whether a value is unique or a duplicate, the whole array needs to be scanned at least once. (On a very large array there could be a test to see how many elements are left to scan, but the overhead of such a test would make it slower.)

This version uses two Sets:

function uniqueLimit(data,limit) {
  let
    dup = new Set(),
    unique = new Set(),
    value = null;
  for (let i = 0, len = data.length; i < len; ++i) {
    value = data[i];
    if ( dup.has(value) ) continue;
    if ( unique.has(value) ) {
      dup.add(value);
      unique.delete(value);
      continue;
    }
    unique.add(value);
  }
  return unique.size > limit;
}
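To sanity-check this against the array from the question (the definition is repeated below only so the snippet runs on its own): the three values that occur exactly once (2, 3 and 4) exceed a limit of 2 but not a limit of 3.

```javascript
// Two-Set approach: `unique` holds values seen exactly once (so far),
// `dup` holds values seen more than once.
function uniqueLimit(data, limit) {
  const dup = new Set();
  const unique = new Set();
  for (const value of data) {
    if (dup.has(value)) continue;
    if (unique.has(value)) {
      // Second occurrence: the value is no longer unique.
      dup.add(value);
      unique.delete(value);
      continue;
    }
    unique.add(value);
  }
  return unique.size > limit;
}

console.log(uniqueLimit([1, 1, 2, 1, 1, 3, 1, 1, 4, 1], 2)); // true  (3 singletons > 2)
console.log(uniqueLimit([1, 1, 2, 1, 1, 3, 1, 1, 4, 1], 3)); // false (3 singletons, not > 3)
```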

I also tried this version, using arrays:

function uniqueLimit(data, limit) {
  let unique=[], dup = [];
  for (let idx = 0, len = data.length; idx < len; ++idx) {
    const value = data[idx];
    if ( dup.indexOf(value) >= 0 ) continue;
    const pos = unique.indexOf(value); // get position of value
    if ( pos >= 0 ) {
      unique.splice(pos,1); // remove value
      dup.push(value);
      continue;
    }
    unique.push(value);
  }
  return unique.length > limit;
}

I tested several of the solutions in this thread, and you can find the result here. If there are only a few unique values, the array method is the fastest, but if there are many unique values it quickly becomes the slowest, and on large arrays it is slower by several orders of magnitude.
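The gap is easy to reproduce with a rough micro-benchmark sketch (timings are illustrative and vary by engine and machine; `N` is an arbitrary size chosen for the demo). Set.prototype.has is O(1) on average, while Array.prototype.indexOf scans linearly, which is why the array version degrades as the number of unique values grows:

```javascript
// Build N all-unique values: the worst case for the array version.
const N = 10000; // arbitrary demo size
const values = Array.from({ length: N }, (_, i) => i);

// Set-based de-duplication: has() is O(1) on average.
let t0 = Date.now();
const seenSet = new Set();
for (const v of values) {
  if (!seenSet.has(v)) seenSet.add(v);
}
const setMs = Date.now() - t0;

// Array-based de-duplication: indexOf() is O(n) per lookup.
t0 = Date.now();
const seenArr = [];
for (const v of values) {
  if (seenArr.indexOf(v) < 0) seenArr.push(v);
}
const arrMs = Date.now() - t0;

console.log(seenSet.size, seenArr.length); // both 10000
console.log(`Set: ${setMs} ms, Array: ${arrMs} ms`);
```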

More profiling

I did some more tests with node v12.10.0. The results are normalized against the fastest method for each test.

Worst case scenario: 1000000 entries, all unique:

Set     1.00     // See this answer
Map     1.26     // See answer by Nikhil
Reduce  1.44     // See answer by Bali Balo
Array   Infinity // See this answer

Best case scenario: 1000000 entries, all the same:

Array   1.00
Set     1.16
Map     2.60
Reduce  3.43

Question test case: [1, 1, 2, 1, 1, 3, 1, 1, 4, 1]

Array    1.00
Map      1.29
Set      1.47
Reduce   4.25

Another test case: [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1, 1,1,1,1,1,1,1,3,4,1,1,1,1,1,1,1,2,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,5 ]

Array    1.00
Set      1.13
Map      2.24
Reduce   2.39

Conclusion

The method that uses Set works for both small and large arrays, and performs well regardless of whether there are many unique values or not. The version that uses arrays can be faster when there are few unique values, but quickly becomes very slow if there are many.

some answered Sep 22 '22 16:09