
Is sequential strpos() faster than a function with one preg_match()?

I need to test whether any of the strings 'hello', 'i am', or 'dumb' occurs in a longer string called $ohreally. As soon as one of them is found, my test is over, and I know that if one of them occurs, neither of the others will.

Under these conditions, what is the most efficient way to write this search?

Calling strpos() three times, like this?

if (strpos ($ohreally, 'hello')){return false;}  
   else if (strpos ($ohreally, 'i am')){return false;}  
   else if (strpos ($ohreally, 'dumb')){return false;}  
   else {return true;}

Or one preg_match()?

if (preg_match('hello'||'i am'||'dumb', $ohreally)) {return false}   
   else {return true};

I know the preg_match() code is wrong; I would really appreciate it if someone could offer the correct version.

Thank You!


Answer

Please read what cletus said and the test middaparka did below. I also ran a microtime test on various strings, long and short, with these results.

If you know the probability of each string occurring, order the checks from most probable to least. (I did not notice a measurable difference from reordering the regex itself, i.e. between /hello|i am|dumb/ and /i am|dumb|hello/.)

With sequential strpos() calls, on the other hand, the ordering makes all the difference. For example, if 'hello' occurs 90% of the time, 'i am' 7%, and 'dumb' 3%, you would want to check for 'hello' first and exit the function as soon as possible.
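The ordering idea could be sketched as a small helper. This is only an illustration: the function name `containsAnyOrdered` is hypothetical, the needle order is taken from the 90% / 7% / 3% example, and the check uses a strict `!== false` comparison so that a match at position 0 still counts. Note that it returns true on a hit, the opposite of the return values in the question's snippet.

```php
<?php
// Hypothetical helper: returns true as soon as any needle is found.
// $needles is assumed to be ordered from most to least probable.
function containsAnyOrdered(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        // strpos() returns an int position or false, so compare strictly.
        if (strpos($haystack, $needle) !== false) {
            return true; // early exit: the remaining needles are never scanned
        }
    }
    return false;
}

// Assumed ordering: most probable needle first.
$needles = ['hello', 'i am', 'dumb'];
var_dump(containsAnyOrdered('well hello there', $needles)); // bool(true)
var_dump(containsAnyOrdered('nothing to see', $needles));   // bool(false)
```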

My microtime tests show this.

For haystacks A, B, and C, in which the needle is found on the first, second, and third strpos() call respectively, the times are as follows:

strpos:
A: 0.00450 seconds // 1 strpos()
B: 0.00911 seconds // 2 strpos()
C: 0.00833 seconds // 3 strpos()
C: 0.01180 seconds // 4 strpos() added one extra

and for preg_match:
A: 0.01919 seconds // 1 preg_match()
B: 0.02252 seconds // 1 preg_match()
C: 0.01060 seconds // 1 preg_match()

As the numbers show, strpos() is faster up to the fourth call, so I will be using it, since I have only 3 substrings to check for. :)
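For anyone who wants to repeat this kind of measurement, a timing harness might look roughly like the sketch below. It is not the script that produced the numbers above: the function names `viaStrpos`, `viaPregMatch`, and `timeIt`, the haystack, and the iteration count are all assumptions, and results will vary with PHP version and hardware.

```php
<?php
// Sketch of a microtime() comparison; the names, haystack and iteration
// count are assumptions, not the original benchmark.
function viaStrpos(string $h): bool
{
    // Strict comparisons so a match at position 0 is still counted.
    return strpos($h, 'hello') !== false
        || strpos($h, 'i am') !== false
        || strpos($h, 'dumb') !== false;
}

function viaPregMatch(string $h): bool
{
    return preg_match('/hello|i am|dumb/', $h) === 1;
}

// Calls $fn on $haystack $iterations times and returns the elapsed seconds.
function timeIt(callable $fn, string $haystack, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($haystack);
    }
    return microtime(true) - $start;
}

// The needle is only found by the third strpos() call (haystack "C" style).
$haystack = str_repeat('lorem ipsum ', 50) . 'dumb';
$iterations = 10000; // assumed; raise it for steadier numbers

printf("strpos:     %.5f seconds\n", timeIt('viaStrpos', $haystack, $iterations));
printf("preg_match: %.5f seconds\n", timeIt('viaPregMatch', $haystack, $iterations));
```

Running each variant several times and comparing the spread is advisable, since single runs at this scale are noisy.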

Mohammad asked Dec 18 '22 03:12


1 Answer

The correct syntax is:

preg_match('/hello|i am|dumb/', $ohreally);

I doubt there's much in it either way, but it wouldn't surprise me if the strpos() method is faster, depending on the number of strings you're searching for. The performance of strpos() will degrade as the number of search terms increases. The regex probably will too, but not as quickly.
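If the list of search terms does grow, one way to keep the regex version manageable is to build the alternation from an array. A minimal sketch, assuming the terms are literal strings; the helper name `buildAlternation` is hypothetical:

```php
<?php
// Hypothetical helper: turns a list of literal terms into one alternation pattern.
function buildAlternation(array $terms): string
{
    $escaped = [];
    foreach ($terms as $term) {
        // preg_quote() escapes regex metacharacters; '/' is passed as the
        // second argument because it is the delimiter used below.
        $escaped[] = preg_quote($term, '/');
    }
    return '/' . implode('|', $escaped) . '/';
}

$pattern = buildAlternation(['hello', 'i am', 'dumb']);
// $pattern is now '/hello|i am|dumb/'
var_dump(preg_match($pattern, 'who says i am late?')); // int(1)
```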

Obviously regular expressions are more powerful. For example, if you wanted to match the word "dumb" but not "dumber", that's easily done with:

preg_match('/\b(hello|i am|dumb)\b/', $ohreally);

which is a lot harder to do with strpos().

Note: technically \b is a zero-width word boundary. "Zero-width" means it doesn't consume any part of the input string. "Word boundary" means it matches at a transition from word characters (letters, digits or underscore) to non-word characters, at a transition from non-word to word characters, and at the start or end of the string when the adjacent character is a word character. Very useful.
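A quick way to see the effect of \b is to run both patterns against the same input. The sample strings below are made up for illustration:

```php
<?php
$bounded = '/\b(hello|i am|dumb)\b/';
$plain   = '/hello|i am|dumb/';

// "dumb" is followed by ".", a non-word character, so the boundary matches.
var_dump(preg_match($bounded, 'that was dumb.')); // int(1)

// "dumb" is followed by "e", a word character, so there is no boundary.
var_dump(preg_match($bounded, 'even dumber'));    // int(0)

// Without \b the plain alternation happily matches inside "dumber".
var_dump(preg_match($plain, 'even dumber'));      // int(1)
```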

Edit: it's also worth noting that your usage of strpos() is incorrect (but lots of people make this same mistake). Namely:

if (strpos ($ohreally, 'hello')) {
  ...
}

will not enter the condition block if the needle is at position 0 in the string. The correct usage is:

if (strpos ($ohreally, 'hello') !== false) {
  ...
}

This is needed because of type juggling: without the strict comparison, a return value of 0 is converted to false.
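The difference is easy to reproduce with a needle at the very start of the string. The sample value of $ohreally here is an assumption chosen to trigger the problem:

```php
<?php
$ohreally = 'hello world'; // assumed sample value: needle at position 0

$pos = strpos($ohreally, 'hello');

var_dump($pos);           // int(0): found at the very start
var_dump((bool) $pos);    // bool(false): a loose check treats 0 as "not found"
var_dump($pos !== false); // bool(true): strict comparison tells 0 apart from false

// A needle that is really absent returns false, not 0.
var_dump(strpos($ohreally, 'dumb')); // bool(false)
```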

cletus answered Feb 16 '23 00:02