I am trying to cut these loops down to optimize some code. I was suggested the sliding window technique, but I cannot seem to make it fit my example.
Everything in the parenthesis I added just to show what type they are. The file.get(..) method returns a byte from the given index in the file. The outer loop can (does usually) iterate over a giant range since these files are quite big. The asciiCombo ranges from 2-8 elements.
Here is a nested loop which I am unsure how to cut down:
for (long i = offsetInBytes; i < (long) file.length; ++i) {
int match = 0;
for (int j = 0; j < (int[]) asciiCombo.length; ++j) {
if (file.get(i + j) == asciiCombo[j]) {
match++;
} else {
break;
}
}
}
Replacing the inner loop with if statement(s) or some collection to look up is essentially the same as the nested loop so that won't do. I have been unable to implement sliding window (not sure we can even).
I have been hitting a stuck place here and would appreciate any help. Thanks!
This is a string search problem.
There is a sliding window technique that works, called the Rabin-Karp algorithm: https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
It requires a "special" hash function that allows you to update the hash of a sliding window quickly when it advances. I put "special" in quotes, because the most common string hashing method -- polynomial hashing -- actually works.
There are many alternatives, howevever. I like Knuth-Morris-Pratt: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
That one calculates a state machine for the string you're searching for that allows the to examine each byte in the file exactly once.
Boyer-Moore is also popular: https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm
Note, however, that from what I see of your code, and your statement that the string you're looking for is usually only 2-8 bytes long, I wouldn't guess that your choice of search algrorithm is the problem. It looks more likely to me that your implementation of file.get(index)
is slow.
You probably want to use a BufferedInputStream(FileInputStream(...))
instead. This will give you the bytes one at a time in order. Use a string search that works with this restriction. Knuth-Morris-Pratt would be fine, or if the string really is limited to 8 byes, then the whole thing fits into a long
. Use that to directly compare all 8 chars with the preceding 8 from the file with one ==
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With