php - Is strpos the fastest way to search for a string in a large body of text?

if (strpos(htmlentities($storage->getMessage($i)),'chocolate'))  

Hi, I'm using Gmail OAuth access to find specific text strings in email addresses. Is there a quicker and more efficient way to find text instances than using strpos in the above code? Should I be using a hash technique?

Bob Cavezza asked Oct 06 '10 15:10




1 Answer

According to the PHP manual, yes: strpos() is the quickest way to determine whether one string contains another.

Note:

If you only want to determine if a particular needle occurs within haystack, use the faster and less memory intensive function strpos() instead.

This note is repeated time and again on the php.net pages for other string search functions (I pulled this one from strstr()).
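To make the difference between the two functions concrete, here is a small sketch. The `$haystack` text is made up for illustration; the point is only that strstr() builds and returns a new string, while strpos() returns an integer offset.

```php
<?php
// Sample text, invented for this illustration.
$haystack = 'I like chocolate cake';

// strstr() returns the rest of the haystack starting at the match,
// so a new string has to be created.
$tail = strstr($haystack, 'chocolate');

// strpos() returns only the offset of the match (or false if absent).
$pos = strpos($haystack, 'chocolate');

var_dump($tail); // string(14) "chocolate cake"
var_dump($pos);  // int(7)
```

If all you need is a yes/no answer, the integer from strpos() is enough and the substring from strstr() is wasted work.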

There are, however, two changes that should be made to your statement.

if (strpos($storage->getMessage($i),'chocolate') !== FALSE) 

The strict comparison is needed because if(0) evaluates to false (and therefore doesn't run), yet strpos() returns 0 when the needle is at the very beginning (position 0) of the haystack. Also, removing htmlentities() will make your code run faster. All that htmlentities() does is replace certain characters with their HTML entity equivalents. For instance, it replaces every & with &amp;.
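The position-0 pitfall is easy to reproduce. A minimal sketch, where `$message` is a made-up string standing in for whatever `$storage->getMessage($i)` returns:

```php
<?php
// Sample text; stands in for the result of $storage->getMessage($i).
$message = 'chocolate chip cookies';

$pos = strpos($message, 'chocolate');
var_dump($pos);          // int(0): the needle is at the very start

// Loose check: 0 is falsy, so the match is missed.
$looseMatch = $pos ? true : false;

// Strict check: only a real boolean false means "not found".
$strictMatch = ($pos !== false);

var_dump($looseMatch);   // bool(false)
var_dump($strictMatch);  // bool(true)
```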

As you can imagine, checking every character in a string individually and replacing many of them, as htmlentities() does, takes extra memory and processor power. Not only that, but it's unnecessary if you just plan on doing a text comparison. For instance, compare the following statements:

strpos('Billy & Sally', '&'); // 6
strpos('Billy &amp; Sally', '&'); // 6
strpos('Billy & Sally', 'S'); // 8
strpos('Billy &amp; Sally', 'S'); // 12

Or, in the worst case, you may even cause something true to evaluate to false.

strpos('<img src...', '<'); // 0
strpos('&lt;img src...', '<'); // FALSE

In order to circumvent this you'd end up using even more HTML entities.

strpos('&lt;img src...', '&lt;'); // 0 

But this, as you can imagine, is not only annoying to code but also redundant. You're better off leaving HTML entities out entirely. HTML entities are usually only needed when you're outputting text, not when you're comparing it.
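Putting the two changes together might look like the sketch below. The `messageContains()` helper and the `$message` sample are hypothetical names introduced here for illustration, not part of the original code; the idea is simply to search the raw text and call htmlentities() only at the point of output.

```php
<?php
// Hypothetical helper, not part of the original question's code.
function messageContains(string $message, string $needle): bool
{
    // Strict comparison so a match at position 0 still counts.
    return strpos($message, $needle) !== false;
}

// Sample data; stands in for the result of $storage->getMessage($i).
$message = '<b>chocolate</b> & vanilla';

if (messageContains($message, 'chocolate')) {
    // Escape only when the text is about to be displayed.
    $escaped = htmlentities($message);
    echo $escaped, PHP_EOL;
}
```

This way the search runs against the original characters, so a needle such as '<' or '&' matches exactly what is in the message.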

stevendesu answered Nov 07 '22 23:11