Just as the title reads, I need to check whether the number of unique entries within an array exceeds n.
Array.prototype.some() seems to fit perfectly here, as it stops cycling through the array the moment a positive answer is found. So please do not suggest methods that filter out non-unique records and measure the length of the resulting dataset, as performance matters here.
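Just to illustrate the short-circuiting I mean (a minimal sketch with made-up data, unrelated to the actual task):

// some() stops iterating as soon as the callback returns a truthy value
const hit = [1, 2, 3, 4, 5].some(x => {
  console.log('visiting', x);
  return x > 2;     // truthy for 3, so 4 and 5 are never visited
});
console.log(hit);   // true, after logging only 1, 2, 3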
So far, I use the following code to check whether there are more than n=2 unique numbers:
const res = [1,1,2,1,1,3,1,1,4,1].some((e,_,s,n=2) => s.indexOf(e) != s.lastIndexOf(e) ? false : n-- ? false : true);
console.log(res);
And it returns false, while there are, obviously, 3 unique numbers (2, 3, 4).
Your help in figuring out my (stupid) mistake here is much appreciated.
p.s. I'm looking for a pure JS solution
You can use a Map() with the array values as keys and their counts as values. Then iterate over the map values to find the count of unique numbers. If that count exceeds the limit, return true; if not, return false.
Time complexity is O(n). It can't get better than O(n), because every number in the array must be visited to find the count of unique numbers.
var data = [1, 1, 2, 1, 1, 3, 1, 1, 4, 1];

function exceedsUniqueLimit(limit) {
  // Count the occurrences of every value
  var map = new Map();
  for (let value of data) {
    const count = map.get(value);
    if (count) {
      map.set(value, count + 1);
    } else {
      map.set(value, 1);
    }
  }

  // Count the values that occur exactly once,
  // returning as soon as the limit is exceeded
  var uniqueNumbers = 0;
  for (let count of map.values()) {
    if (count === 1) {
      uniqueNumbers++;
    }
    if (uniqueNumbers > limit) {
      return true;
    }
  }
  return false;
}

console.log(exceedsUniqueLimit(2)); // true
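As a side note, the counting loop inside the function can be condensed into a single statement per value; this is just a sketch of the same behaviour, not a different algorithm:

// Drop-in replacement for the first loop above
for (let value of data) {
  map.set(value, (map.get(value) || 0) + 1);
}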
To know whether a value is unique or a duplicate, the whole array needs to be scanned at least once. (Well, on a very large array there could be a test to see how many elements are left to scan, but the overhead of such a test would make it slower.)
This version uses two Sets:
function uniqueLimit(data, limit) {
  let
    dup = new Set(),    // values seen more than once
    unique = new Set(), // values seen exactly once so far
    value = null;
  for (let i = 0, len = data.length; i < len; ++i) {
    value = data[i];
    if ( dup.has(value) ) continue;  // already a known duplicate
    if ( unique.has(value) ) {       // second occurrence: move it to dup
      dup.add(value);
      unique.delete(value);
      continue;
    }
    unique.add(value);               // first occurrence
  }
  return unique.size > limit;
}
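For example, calling it with the data from the question (just a quick sanity check):

console.log(uniqueLimit([1, 1, 2, 1, 1, 3, 1, 1, 4, 1], 2)); // true – 2, 3 and 4 each occur once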
I also tried this version, using arrays:
function uniqueLimit(data, limit) {
  let unique = [], dup = [];
  for (let idx = 0, len = data.length; idx < len; ++idx) {
    const value = data[idx];
    if ( dup.indexOf(value) >= 0 ) continue;
    const pos = unique.indexOf(value); // get position of value
    if ( pos >= 0 ) {
      unique.splice(pos, 1); // remove value
      dup.push(value);
      continue;
    }
    unique.push(value);
  }
  return unique.length > limit;
}
I tested several of the solutions in this thread, and you can find the results here. If there are only a few unique values, the method using arrays is the fastest, but if there are many unique values it quickly becomes the slowest, and on large arrays it is slower by several orders of magnitude.
More profiling
I did some more tests with node v12.10.0. The results are normalized relative to the fastest method for each test (the timing harness is sketched after the results).
Worst case scenario: 1000000 entries, all unique:
Set 1.00 // See this answer
Map 1.26 // See answer by Nikhil
Reduce 1.44 // See answer by Bali Balo
Array Infinity // See this answer
Best case scenario: 1000000 entries, all the same:
Array 1.00
Set 1.16
Map 2.60
Reduce 3.43
Question test case: [1, 1, 2, 1, 1, 3, 1, 1, 4, 1]
Array 1.00
Map 1.29
Set 1.47
Reduce 4.25
Another test case: [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1, 1,1,1,1,1,1,1,3,4,1,1,1,1,1,1,1,2,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,5 ]
Array 1.00
Set 1.13
Map 2.24
Reduce 2.39
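For reference, the numbers above were gathered with a harness roughly like the following. This is only a sketch: the timing code and the generated test array are assumptions, not the original benchmark script.

// Minimal benchmark sketch (assumed setup)
function benchmark(label, fn, data, limit, runs = 10) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; ++i) fn(data, limit);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(label, (ms / runs).toFixed(3), 'ms per run');
}

// Example: worst case, 1000000 entries, all unique
const worstCase = Array.from({ length: 1000000 }, (_, i) => i);
benchmark('Set', uniqueLimit, worstCase, 2);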
Conclusion
The method that uses Set works for both small and large arrays, and performs well regardless of whether there are many unique values or not. The version that uses arrays can be faster if there are few unique values, but quickly becomes very slow if there are many.