Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement sliding window or reduce these nested loops?

I am trying to cut these loops down to optimize some code. I was suggested the sliding window technique, but I cannot seem to make it fit my example.

Everything in the parenthesis I added just to show what type they are. The file.get(..) method returns a byte from the given index in the file. The outer loop can (does usually) iterate over a giant range since these files are quite big. The asciiCombo ranges from 2-8 elements.

Here is a nested loop which I am unsure how to cut down:

for (long i = offsetInBytes; i < (long) file.length; ++i) {
        int match = 0;
        for (int j = 0; j < (int[]) asciiCombo.length; ++j) {
            if (file.get(i + j) == asciiCombo[j]) {
                match++;
            } else {
                break;
            }
         }
}

Replacing the inner loop with if statement(s) or some collection to look up is essentially the same as the nested loop so that won't do. I have been unable to implement sliding window (not sure we can even).

I have been hitting a stuck place here and would appreciate any help. Thanks!

like image 991
korvin Avatar asked Nov 22 '19 22:11

korvin


1 Answers

This is a string search problem.

There is a sliding window technique that works, called the Rabin-Karp algorithm: https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm

It requires a "special" hash function that allows you to update the hash of a sliding window quickly when it advances. I put "special" in quotes, because the most common string hashing method -- polynomial hashing -- actually works.

There are many alternatives, howevever. I like Knuth-Morris-Pratt: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

That one calculates a state machine for the string you're searching for that allows the to examine each byte in the file exactly once.

Boyer-Moore is also popular: https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm

Note, however, that from what I see of your code, and your statement that the string you're looking for is usually only 2-8 bytes long, I wouldn't guess that your choice of search algrorithm is the problem. It looks more likely to me that your implementation of file.get(index) is slow.

You probably want to use a BufferedInputStream(FileInputStream(...)) instead. This will give you the bytes one at a time in order. Use a string search that works with this restriction. Knuth-Morris-Pratt would be fine, or if the string really is limited to 8 byes, then the whole thing fits into a long. Use that to directly compare all 8 chars with the preceding 8 from the file with one ==.

like image 171
Matt Timmermans Avatar answered Sep 19 '22 15:09

Matt Timmermans