Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently counting the number of substrings of a digit string that are divisible by k?

We are given a string which consists of digits 0-9. We have to count number of sub-strings divisible by a number k. One way is to generate all the sub-strings and check if it is divisible by k but this will take O(n^2) time. I want to solve this problem in O(n*k) time.

1 <= n <= 100000 and 2 <= k <= 1000.

I saw a similar question here. But k was fixed as 4 in that question. So, I used the property of divisibility by 4 to solve the problem. Here is my solution to that problem:

int main()
{
    string s;
    vector<int> v[5];
    int i;
    int x;
    long long int cnt = 0;

    cin>>s;
    x = 0;
    for(i = 0; i < s.size(); i++) {
        if((s[i]-'0') % 4 == 0) {
            cnt++;
        }
    }
    for(i = 1; i < s.size(); i++) {
        int f = s[i-1]-'0';
        int s1 = s[i] - '0';
        if((10*f+s1)%4 == 0) {
            cnt = cnt + (long long)(i);
        }
    }
    cout<<cnt;

}

But I wanted a general algorithm for any value of k.

like image 285
Shivam Mitra Avatar asked Feb 22 '16 14:02

Shivam Mitra


1 Answers

This is a really interesting problem. Rather than jumping into the final overall algorithm, I thought I'd start with a reasonable algorithm that doesn't quite cut it, then make a series of modifications to it to end up with the final, O(nk)-time algorithm.

This approach combines together a number of different techniques. The major technique is the idea of computing a rolling remainder over the digits. For example, let's suppose we want to find all prefixes of the string that are multiples of k. We could do this by listing off all the prefixes and checking whether each one is a multiple of k, but that would take time at least Θ(n2) since there are Θ(n2) different prefixes. However, we can do this in time Θ(n) by being a bit more clever. Suppose we know that we've read the first h characters of the string and we know the remainder of the number formed that way. We can use this to say something about the remainder of the first h+1 characters of the string as well, since by appending that digit we're taking the existing number, multiplying it by ten, and then adding in the next digit. This means that if we had a remainder of r, then our new remainder is (10r + d) mod k, where d is the digit that we uncovered.

Here's quick pseudocode to count up the number of prefixes of a string that are multiples of k. It runs in time Θ(n):

remainder = 0
numMultiples = 0
for i = 1 to n: // n is the length of the string
    remainder = (10 * remainder + str[i]) % k
    if remainder == 0
        numMultiples++
return numMultiples

We're going to use this initial approach as a building block for the overall algorithm.

So right now we have an algorithm that can find the number of prefixes of our string that are multiples of k. How might we convert this into an algorithm that finds the number of substrings that are multiples of k? Let's start with an approach that doesn't quite work. What if we count all the prefixes of the original string that are multiples of k, then drop off the first character of the string and count the prefixes of what's left, then drop off the second character and count the prefixes of what's left, etc? This will eventually find every substring, since each substring of the original string is a prefix of some suffix of the string.

Here's some rough pseudocode:

numMultiples = 0
for i = 1 to n:
    remainder = 0
    for j = i to n:
        remainder = (10 * remainder + str[j]) % k
        if remainder == 0
            numMultiples++
return numMultiples

For example, running this approach on the string 14917 looking for multiples of 7 will turn up these strings:

  • String 14917: Finds 14, 1491, 14917
  • String 4917: Finds 49,
  • String 917: Finds 91, 917
  • String 17: Finds nothing
  • String 7: Finds 7

The good news about this approach is that it will find all the substrings that work. The bad news is that it runs in time Θ(n2).

But let's take a look at the strings we're seeing in this example. Look, for example, at the substrings found by searching for prefixes of the entire string. We found three of them: 14, 1491, and 14917. Now, look at the "differences" between those strings:

  • The difference between 14 and 14917 is 917.
  • The difference between 14 and 1491 is 91
  • The difference between 1491 and 14917 is 7.

Notice that the difference of each of these strings is itself a substring of 14917 that's a multiple of 7, and indeed if you look at the other strings that we've matched later on in the run of the algorithm we'll find these other strings as well.

This isn't a coincidence. If you have two numbers with a common prefix that are multiples of the same number k, then the "difference" between them will also be a multiple of k. (It's a good exercise to check the math on this.)

So this suggests another route we can take. Suppose that we find all prefixes of the original string that are multiples of k. If we can find all of them, we can then figure out how many pairwise differences there are among those prefixes and potentially avoid rescanning things multiple times. This won't find everything, necessarily, but it will find all substrings that can be formed by computing the difference of two prefixes. Repeating this over all suffixes - and being careful not to double-count things - could really speed things up.

First, let's imagine that we find r different prefixes of the string that are multiples of k. How many total substrings did we just find if we include differences? Well, we've found k strings, plus one extra string for each (unordered) pair of elements, which works out to k + k(k-1)/2 = k(k+1)/2 total substrings discovered. We still need to make sure we don't double-count things, though.

To see whether we're double-counting something, we can use the following technique. As we compute the rolling remainders along the string, we'll store the remainders we find after each entry. If in the course of computing a rolling remainder we rediscover a remainder we've already computed at some point, we know that the work we're doing is redundant; some previous scan over the string will have already computed this remainder and anything we've discovered from this point forward will have already been found.

Putting these ideas together gives us this pseudocode:

numMultiples = 0
seenRemainders = array of n sets, all initially empty
for i = 1 to n:
    remainder = 0
    prefixesFound = 0
    for j = i to n:
        remainder = (10 * remainder + str[j]) % k
        if seenRemainders[j] contains remainder:
            break
        add remainder to seenRemainders[j]
        if remainder == 0
            prefixesFound++
    numMultiples += prefixesFound * (prefixesFound + 1) / 2
return numMultiples

So how efficient is this? At first glance, this looks like it runs in time O(n2) because of the outer loops, but that's not a tight bound. Notice that each element can only be passed over in the inner loop at most k times, since after that there aren't any remainders that are still free. Therefore, since each element is visited at most O(k) times and there are n total elements, the runtime is O(nk), which meets your runtime requirements.

like image 194
templatetypedef Avatar answered Oct 20 '22 23:10

templatetypedef