Bucket sort to find nearby almost duplicates

Question

I am working on problem Contains Duplicate III - LeetCode

Given an array of integers, find out whether there are two distinct indices i and j in the array such that the absolute difference between nums[i] and nums[j]**is at most t and the **absolute difference between i and j is at most k.

Example 1:
Input: nums = [1,2,3,1], k = 3, t = 0
Output: true
Example 2:
Input: nums = [1,0,1,1], k = 1, t = 2
Output: true
Example 3:
Input: nums = [1,5,9,1,5,9], k = 2, t = 3
Output: false

Read a bucket sort solution from the discussion area

class Solution2:
    def containsNearbyAlmostDuplicate(self, nums, k, t):
        if t < 0: return False
        lookup = {}
        for i in range(len(nums)):
            b_idx = nums[i] // (t+1) 
            if b_idx in lookup:
                return True
            if b_idx - 1 in lookup and abs(nums[i] - lookup[b_idx - 1]) < t+1:
                return True
            if b_idx + 1 in lookup and abs(nums[i] - lookup[b_idx + 1]) < t+1:
                return True
            lookup[b_idx] = nums[i] 
            if i >= k: del lookup[nums[i-k] // (t+1)]
        return False

The explaination

The idea is like the bucket sort algorithm. Suppose we have consecutive buckets covering the range of nums with each bucket a width of (t+1). If there are two item with difference <= t, one of the two will happen:
(1) the two in the same bucket
(2) the two in neighbor buckets

I know the logic of bucket sort, but have no ideas how this solution works.

I think, it is necessary to compare between all the values with range of width, but the solution only compare the same bucket and the adjacent bucket,

del lookup[num[i-k], I cannot figure out what the purpose of this manipulation.

recnac · Accepted Answer

First Question: why only compare the same bucket and the adjacent bucket?

As author said, there are two situations if (a, b) is a valid pair:

(1) the two in the same bucket
(2) the two in neighbor buckets

If b - a <= t then only two situations as above said, you can understand it by bucket examples here:

<--a-- t+1 ---b-><----- t+1 -----> in same bucket
<----- t+1 --a--><---b- t+1 -----> in neighbor buckets

Bucket is used here is because we want to divide the range into pariticular width, and decrease the compare times. This is a method of trading space for time.

Second Question: why del lookup[num[i-k]?

Because the second restrict is the difference index i, j should at most k.

So in for i in range(len(nums)): we should remove the previous index j from bucket, if i - j == k. And difference equal k is included, so we should remove after logic.

If you don't do that, you will find the pairs which abs(nums[i]-nums[j])<=t, but abs(i-j)>t

I hope I make it clear, and comment if you have further questions. : )

By the way, an advice: if you are confused or stucked, you can go through a example by print or debug, you will be more clear, especially for corner case.

Bucket sort to find nearby almost duplicates

Tags:

Alice

1 Answers

recnac

Recent Activity

Donate For Us

Bucket sort to find nearby almost duplicates

Tags:

Alice

1 Answers

recnac

Related questions

Recent Activity

Donate For Us