I'm looking for an algorithm for intersection of two small, unsorted array in very specific condition.
Both brute-force/sort-and-intersect solutions are not that bad, but I don't think that they're fast enough. Is there more optimal solution?
As the arrays are of primitive types, and short enough to be in cache lines, a fast implementation would focus on the tactical mechanics of comparisons rather than the big O complexity e.g. avoid hash-tables as these would generally involve hashing and indirection and would always involve a lot of management overhead.
If you have two sorted arrays, then intersection is O(n+m). You say that sort-then-intersect is 'brute-force' but you can't do it quicker.
If the arrays are stored sorted, of course, you gain further as you say you are calling the intersection often.
The intersection itself can be done with SSE.
Here's a potential optimization: check whether both arrays have max element <=32 (or 64, or maybe even 16). If they do, then fill two bitmaps of that size (of type uint32_t, etc.) and intersect using a binary AND, &. If they're not, resort to sorting.
Or, instead of sorting, use the highly efficient integer set representation due to Briggs and Torczon that allows linear time intersect with O(m + n) construction and O(min(m, n)) intersect. That should be much faster than a hash table with better bounds than sorting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With