Which comparison would take longer?
a = "helloworldhelloworldhelloworld"
b = "https://www.somerandomurls.com/directory/anotherdirectory/helloworld.html"
if a != b:
    doThis()
versus
a = "one"
b = "two"
if a != b:
    doThis()
I often need to run checks like this against my database, which has thousands of rows. I am not looking for any specific programming language; I just want to know which comparison is faster. As you can see, the value of b is a longer string in the first example and a shorter one in the second, so I wonder whether that makes any difference to the comparison.
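For what it's worth, here is roughly what I would time to check it myself; this is just a Python sketch for illustration, using the two examples above as sample values.

import timeit

# Sample values mirroring the two cases in the question.
a1 = "helloworldhelloworldhelloworld"
b1 = "https://www.somerandomurls.com/directory/anotherdirectory/helloworld.html"
a2 = "one"
b2 = "two"

# Time each comparison a few million times and print the totals.
print(timeit.timeit(lambda: a1 != b1, number=1_000_000))
print(timeit.timeit(lambda: a2 != b2, number=1_000_000))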
Time for string comparison is O(n), n being the length of the string.
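To see where the O(n) comes from, here is a minimal sketch of the character-by-character scan an equality check boils down to; Python is used only for illustration, and the function name is my own.

def strings_equal(a: str, b: str) -> bool:
    # Walk both strings character by character; in the worst case every
    # character is inspected, so the cost is linear in the string length.
    for x, y in zip(a, b):
        if x != y:          # first mismatch: stop early
            return False
    # All compared characters matched; the strings are equal only if
    # neither one has leftover characters.
    return len(a) == len(b)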
However, depending on your data, you can optimize the comparison by hand. A few such optimizations are outlined below.
Optimization 1:
Check the sizes of both strings first; if they are unequal, return false immediately. This skips the O(n) character comparison altogether and saves time. String data structures generally store the size in memory rather than recomputing it each time, so the size can be read in O(1).
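A sketch of that check, again in Python purely for illustration; note that most built-in string equality implementations (CPython included, as far as I know) already perform this length check internally.

def strings_equal_fast(a: str, b: str) -> bool:
    # O(1) pre-check: strings of different lengths can never be equal,
    # so we skip the O(n) character scan entirely.
    if len(a) != len(b):
        return False
    return a == b   # only equal-length pairs reach the O(n) comparison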
In practice this is a big win, which can be shown by calculating the amortized time complexity.
If your string data structure can have a string of max size x, then there can be a total of (x + 1) possible string sizes (0, 1, 2, ... , x).
Counting unordered pairs of sizes (repeats allowed), there are (x + 1) * (x + 2) / 2 possible size pairs.
With optimization 1 you only have to compare the full strings when the two lengths are equal. There are only x + 1 such cases, and together they need at most 0 + 1 + 2 + ... + x = x * (x + 1) / 2 character comparisons.
The remaining x * (x + 1) / 2 unequal-length cases each finish in O(1) time.
Hence the total work is x * (x + 1) / 2 + x * (x + 1) / 2 = x * (x + 1), which is O(x^2). Since roughly x^2 / 2 comparisons are performed in total, the amortized cost per comparison is O(1).
Whereas without the length check, a pair of lengths i and j can cost up to min(i, j) character comparisons before a mismatch is found. Summing min(i, j) over all size pairs gives roughly x^3 / 6 operations, which is O(x^3) in total, so the amortized cost per comparison is O(x), i.e., linear in the string length.
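To make the counting concrete, here is a small Python sketch that tallies worst-case character comparisons for every pair of lengths up to a small x, with and without the length pre-check. It is a toy model of the argument above, not a benchmark, and the variable names are my own.

x = 10  # maximum string length in this toy model

pairs = [(i, j) for i in range(x + 1) for j in range(i, x + 1)]

# Without the length pre-check: a mismatching pair can cost up to
# min(i, j) character comparisons before the difference is found.
naive = sum(min(i, j) for i, j in pairs)

# With optimization 1: unequal lengths cost 1 unit (the length check);
# only equal-length pairs may need a full scan of length i.
optimized = sum(i if i == j else 1 for i, j in pairs)

print(f"size pairs: {len(pairs)}")
print(f"worst-case character comparisons without pre-check: {naive}")
print(f"worst-case character comparisons with pre-check:    {optimized}")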
Optimization 2:
Since your database contains web links, many of them probably belong to the same website, so their first few characters will often be identical and comparing from the front wastes CPU time on the shared prefix. In that case it is better to compare from the end, since such links usually differ only near the end.
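A sketch of that idea in Python: scanning from the tail means URLs that share a long common prefix (for example, the same domain and directory) are told apart after only a few character comparisons. Whether this actually helps depends on your data, so it is worth measuring. The function name is hypothetical.

def urls_equal(a: str, b: str) -> bool:
    # Unequal lengths can never match (optimization 1).
    if len(a) != len(b):
        return False
    # Scan from the end: links on the same site usually share a long
    # prefix but differ near the end, so a mismatch is found quickly.
    for i in range(len(a) - 1, -1, -1):
        if a[i] != b[i]:
            return False
    return True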
Note: theoretically speaking, none of this changes the worst-case time complexity of a single comparison, which is still O(n); it only reduces the work done in the common cases.