String Comparison vs. Hashing

Tags:

javascript

hash

runtime

I recently learned about the rolling hash data structure, and one of its prime uses is searching for a substring within a string. Here are some advantages that I noticed:

Comparing two strings can be expensive, so it should be avoided where possible.
Hashing the strings and comparing the hashes is generally much faster than comparing the strings themselves; however, rehashing each new substring from scratch takes linear time.
A rolling hash can rehash the new substring in constant time, making it much quicker and more efficient for this task.
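The constant-time rehash in the last point can be sketched as follows. This is a minimal polynomial rolling hash, not the implementation from my benchmark; the names `BASE`, `MOD`, `hashFromScratch`, `leadingWeight`, and `roll` are all assumptions made for illustration.

```javascript
// Illustrative constants (assumptions): a base larger than typical character
// codes and a prime modulus small enough that products stay below 2^53.
const BASE = 256;
const MOD = 1000003;

// Hash a string from scratch in O(length) time.
function hashFromScratch(str) {
  let h = 0;
  for (let i = 0; i < str.length; i++) {
    h = (h * BASE + str.charCodeAt(i)) % MOD;
  }
  return h;
}

// BASE^(windowLength - 1) mod MOD: the weight of the window's leading character.
function leadingWeight(windowLength) {
  let w = 1;
  for (let i = 1; i < windowLength; i++) {
    w = (w * BASE) % MOD;
  }
  return w;
}

// Slide the window one character to the right in O(1) time:
// remove the outgoing character's contribution, then append the incoming one.
function roll(hash, outgoingCode, incomingCode, weight) {
  const withoutOutgoing = (hash + MOD - (outgoingCode * weight) % MOD) % MOD;
  return (withoutOutgoing * BASE + incomingCode) % MOD;
}

const text = "abracadabra";
const m = 4;
const weight = leadingWeight(m);
let h = hashFromScratch(text.slice(0, m));                   // hash of "abra"
h = roll(h, text.charCodeAt(0), text.charCodeAt(m), weight); // hash of "brac"
console.log(h === hashFromScratch("brac")); // true
```

Each call to `roll` does a fixed amount of arithmetic regardless of the window length, which is where the constant-time claim comes from.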

I went ahead and implemented a rolling hash in JavaScript and began to compare the speed of three approaches: a rolling hash, traditional rehashing, and simply comparing the substrings against each other.

In my findings, the larger the substring, the longer the traditional rehashing approach took to run (as expected), whereas the rolling hash ran incredibly fast (as expected). However, comparing the substrings directly ran much faster than the rolling hash. How could this be?

For the sake of perspective, let's say the running times for the functions searching through a ~2.4 million character string for a 100 character substring were the following:

Rolling Hash - 0.809 seconds
Traditional Rehashing - 71.009 seconds
Just comparing the strings (no hashing) - 0.089 seconds
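For reference, the three approaches being timed might look roughly like the sketch below. It is a reconstruction under assumptions, not my actual benchmark code: the function names, the constants `HASH_BASE` and `HASH_MOD`, the `timeIt` helper, and the sample input are all hypothetical, and the timings it prints will differ by machine and engine.

```javascript
// Illustrative constants (assumptions, not taken from the original benchmark).
const HASH_BASE = 256;
const HASH_MOD = 1000003;

// Polynomial hash of str[start .. start + length), computed from scratch.
function hashOf(str, start, length) {
  let h = 0;
  for (let i = start; i < start + length; i++) {
    h = (h * HASH_BASE + str.charCodeAt(i)) % HASH_MOD;
  }
  return h;
}

// Strategy 1: compare each window directly against the pattern.
function searchByComparison(text, pattern) {
  const m = pattern.length;
  for (let i = 0; i + m <= text.length; i++) {
    if (text.slice(i, i + m) === pattern) return i;
  }
  return -1;
}

// Strategy 2: rehash every window from scratch, O(m) work per shift.
function searchByRehashing(text, pattern) {
  const m = pattern.length;
  const target = hashOf(pattern, 0, m);
  for (let i = 0; i + m <= text.length; i++) {
    // Confirm with a real comparison, since different strings can share a hash.
    if (hashOf(text, i, m) === target && text.slice(i, i + m) === pattern) return i;
  }
  return -1;
}

// Strategy 3: rolling hash, O(1) work per shift.
function searchByRollingHash(text, pattern) {
  const m = pattern.length;
  if (m === 0) return 0;
  if (m > text.length) return -1;
  let weight = 1; // HASH_BASE^(m - 1) mod HASH_MOD
  for (let i = 1; i < m; i++) weight = (weight * HASH_BASE) % HASH_MOD;
  const target = hashOf(pattern, 0, m);
  let h = hashOf(text, 0, m);
  for (let i = 0; ; i++) {
    if (h === target && text.slice(i, i + m) === pattern) return i;
    if (i + m >= text.length) return -1;
    const out = (text.charCodeAt(i) * weight) % HASH_MOD;
    h = ((h + HASH_MOD - out) * HASH_BASE + text.charCodeAt(i + m)) % HASH_MOD;
  }
}

// Small timing helper; logs the index found and the elapsed milliseconds.
function timeIt(label, fn) {
  const start = Date.now();
  const result = fn();
  console.log(`${label}: index ${result}, ${Date.now() - start} ms`);
  return result;
}

// Arbitrary sample input: the pattern sits in the middle of filler text.
const haystack = "ab".repeat(50000) + "needle" + "ab".repeat(50000);
timeIt("comparison", () => searchByComparison(haystack, "needle"));
timeIt("rehashing", () => searchByRehashing(haystack, "needle"));
timeIt("rolling hash", () => searchByRollingHash(haystack, "needle"));
```

All three functions return the same index; only the amount of work done per window differs.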

How could the string comparison be so much faster than the rolling hash? Could it just have something to do with JavaScript in particular? Strings are a primitive type in JavaScript; would this cause string comparisons to run in constant time?

My main confusion is how and why string comparisons are so fast in JavaScript, when I was under the impression that they were supposed to be relatively slow.

Note: By string comparisons I'm referring to something like stringA === stringB

Note: I asked this question over on the Computer Science Community and was informed that I should ask it here as well because this is most likely JavaScript specific.

asked Jan 16 '16 18:01

Nick Zuber

1 Answer

After some testing and analysis, I've come to the conclusion that there were a few reasons why my rolling hash approach was running slower than simply comparing the two strings.

If the rolling hash claims to run in constant time, how can it be slower than comparing strings?

Functions are relatively slow - calling a function is slightly slower than executing the same code inline. In my particular case, a function had to be called on my object every time the rolling hash rehashed its internal window, so each shift took slightly longer than the string comparison, which was simply inline. Because my benchmark has the rolling hash "shift" over 2 million times, this function-call slowdown shows up more clearly.
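
The call overhead described above could be measured with a sketch like the following, which runs the same rolling update once through a method and once inline. The `RollingHash` class, both function names, the constants, and the sample data are hypothetical stand-ins for my real code. Modern engines may inline the method themselves, so the gap can be small or disappear entirely.

```javascript
// Hypothetical class for illustration; not the original implementation.
class RollingHash {
  constructor(base, mod, windowLength) {
    this.base = base;
    this.mod = mod;
    this.hash = 0;
    this.weight = 1; // base^(windowLength - 1) mod mod
    for (let i = 1; i < windowLength; i++) {
      this.weight = (this.weight * base) % mod;
    }
  }
  // Append a character code while the first window is still being filled.
  push(code) {
    this.hash = (this.hash * this.base + code) % this.mod;
  }
  // Drop the outgoing code and append the incoming one.
  shift(outgoing, incoming) {
    const out = (outgoing * this.weight) % this.mod;
    this.hash = ((this.hash + this.mod - out) * this.base + incoming) % this.mod;
  }
}

// Variant A: every shift goes through a method call.
function finalHashViaMethod(codes, windowLength) {
  const rh = new RollingHash(256, 1000003, windowLength);
  for (let i = 0; i < windowLength; i++) rh.push(codes[i]);
  for (let i = windowLength; i < codes.length; i++) {
    rh.shift(codes[i - windowLength], codes[i]);
  }
  return rh.hash;
}

// Variant B: the same arithmetic written inline in the loop.
function finalHashInline(codes, windowLength) {
  const base = 256;
  const mod = 1000003;
  let weight = 1;
  for (let i = 1; i < windowLength; i++) weight = (weight * base) % mod;
  let hash = 0;
  for (let i = 0; i < windowLength; i++) hash = (hash * base + codes[i]) % mod;
  for (let i = windowLength; i < codes.length; i++) {
    const out = (codes[i - windowLength] * weight) % mod;
    hash = ((hash + mod - out) * base + codes[i]) % mod;
  }
  return hash;
}

// Arbitrary sample data: character codes of a repeated sentence.
const sample = Array.from("the quick brown fox ".repeat(10000), (ch) => ch.charCodeAt(0));

console.time("method calls");
const viaMethod = finalHashViaMethod(sample, 100);
console.timeEnd("method calls");

console.time("inline");
const inline = finalHashInline(sample, 100);
console.timeEnd("inline");

console.log(viaMethod === inline); // true: both variants compute the same hash
```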

But why is the string comparison so fast?

Strings are primitive - Because strings are a primitive type in JavaScript, comparing two strings will most likely invoke a routine that is coded directly within the interpreter. This low-level evaluation can run about as fast as the architecture allows (similar to comparing numbers). Such a routine can also stop at the first character that differs, so a comparison that fails early costs far less than the full length of the string.
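
One way to picture why such a comparison can be cheap is a character-by-character check that stops at the first difference. The function below is a hand-written stand-in that counts how many characters it inspects; `countingEquals` and the sample strings are assumptions for illustration, and the engine's real native routine is not shown here and may work differently.

```javascript
// Hand-written stand-in for an equality check (an assumption for illustration;
// real engines use native code). Returns the result and the number of
// character positions inspected before the answer was known.
function countingEquals(a, b) {
  if (a.length !== b.length) return { equal: false, inspected: 0 };
  let inspected = 0;
  for (let i = 0; i < a.length; i++) {
    inspected++;
    if (a.charCodeAt(i) !== b.charCodeAt(i)) return { equal: false, inspected };
  }
  return { equal: true, inspected };
}

const pattern = "x".repeat(100);
const earlyMismatch = "y" + "x".repeat(99); // differs at the first character
const lateMismatch = "x".repeat(99) + "y";  // differs at the last character

console.log(countingEquals(pattern, earlyMismatch).inspected); // 1
console.log(countingEquals(pattern, lateMismatch).inspected);  // 100
console.log(countingEquals(pattern, pattern).inspected);       // 100
```

In a substring search over ordinary text, most windows differ from the pattern within the first few characters, so a check of this shape rarely does anything close to 100 comparisons per window.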

In Conclusion

Comparing strings in JavaScript ends up being faster than a rolling hash in this scenario because strings are primitive, which lets the interpreter work with them very quickly, and because each function call adds a slight overhead that slows the process down on a very small scale.

answered Oct 01 '22 18:10

Nick Zuber
