Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Number of pairs that share at least one digit

Tags:

algorithm

You are given n numbers and you have to find the number of pairs such that at least one digit is common in between them.

Eg. For 5 numbers :

2837 2818 654 35 931

Answer : 6

The pairs here are (2837,2818), (2837,35), (2837,931), (2818,931), (654,35), (35,931)

My Attempt : I took a structure which stores the number in decimal, the number in form of its digits in array and number of digits in that number.

Now for each number I hashed that number in array conatining index 0-9 and the checked with all following numbers wether any of their digit is already present.

My attempt is O(n^2), which is slow. Is there another algorithm that will work faster?

like image 668
L.ppt Avatar asked Jul 16 '12 06:07

L.ppt


People also ask

How do you find the number of pairs?

= n(n-1) / 2 which is our formula for the number of pairs needed in at least n statements.

How do you find the number of pairs in an array?

Count pairs in an array that hold i+j= arr[i]+arr[j] Given an array of integers arr[], the task is to count all the pairs (arr[i], arr[j]) such that i + j = arr[i] + arr[j] for all 0 ≤ i < j < n. Note: Pairs (x, y) and (y, x) are considered a single pair.


1 Answers

It is crucial to realize what are the variables and what are the constants here.

Number of digits is a constant (10). So is the number of all sets of digits (1024). So is the number of all pairs of such sets (220, or approximately one million). Let's take advantage of that.

Let's try and preprocess the input into an "equivalent" representation in a data structure whose size is constant (independent of input size). Whatever we then do with that constant size structure is by definition a constant time operation, so the total running time is asymptotically determined by the preprocessing phase only.

Data structure

Create an array of 1024 integers, each bucket (index) corresponding to a set of digits; we want to store the number of input numbers that have exactly that set of digits in each bucket.

For example, 3606 has digits 0, 3, and 6, and so it would be counted in bucket 20 + 23 + 26 = 73.

Algorithm

Preprocessing is obvious. Take the next digit (such as '3'), convert it to its value (such as 3), now compute the corresponding bit (such as 1 << 3) and OR it into a (tentative) bucket index variable; different digits inhabit different bits so every digit combination gets a unique bucket, but we got rid of any repeating digits. Loop like this until encountering the number separator; at that point, the bucket index is final and we can just increment the bucket, reset the bucket index and skip till the next digit.

That's it. All that remains is to count our sheep. Oops. Pairs of sheep.

Compare each buckets with each other bucket (but not itself). Whenever the two indexes share a digit (this can be determined using the & operator), multiply the content of those two buckets together, and add the product to a globally maintained sum.

Compare each bucket also to itself, and add only x * (x - 1) / 2 to the globally maintained sum, where x is the content of the bucket.

That sum is your result.

Performance

Worst case: O(n) where n is input size.

The constant factors are favourable as well. We needed a handful instructions (and a RAM access) per digit or separator; and the constant phase examines one million bucket pairs, spending something like another handful instructions on each pair (no RAM access needed, the data structure is very compact). This is lightning fast.

A theorist will say that this is cheating. We are assuming that there is no upper bound on input length (or we couldn't talk asymptotic complexity at all), and yet we also assume that we can put up to the total input length into an integer variable. Oh, well.

A more practical programmer will note that the algorithm is exponential in alphabet size. We've been lucky; if our words weren't composed of digits but of arbitrary characters except the separator, ours would still be asymptotically a linear time algorithm, but it would be unusably slow for any input whatsoever, compared to a naive algorithm which could easily crunch up to megabytes of input at a time.

like image 83
Jirka Hanika Avatar answered Oct 04 '22 09:10

Jirka Hanika