Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Wildcard Search - Efficiency?

There has been a debate at work recently at the most efficient way to search a MS SQL database using LIKE and wildcards. We are comparing using %abc%, %abc, and abc%. One person has said that you should always have the wildcard at the end of the term (abc%). So, according to them, if we wanted to find something that ended in "abc" it'd be most efficient to use `reverse(column) LIKE reverse('%abc').

I set up a test using SQL Server 2008 (R2) to compare each of the following statements:

select * from CLMASTER where ADDRESS like '%STREET'
select * from CLMASTER where ADDRESS like '%STREET%'   
select * from CLMASTER where ADDRESS like reverse('TEERTS%')  
select * from CLMASTER where reverse(ADDRESS) like reverse('%STREET')

CLMASTER holds about 500,000 records, there are about 7,400 addresses that end "Street", and about 8,500 addresses that have "Street" in it, but not necessarily at the end. Each test run took 2 seconds and they all returned the same amount of rows except for %STREET%, which found an extra 900 or so results because it picked up addresses that had an apartment number on the end.

Since the SQL Server test didn't show any difference in execution time I moved into PHP where I used the following code, switching in each statement, to run multiple tests quickly:

<?php

    require_once("config.php");
    $connection = odbc_connect( $connection_string, $U, $P );

    for ($i = 0; $i < 500; $i++) {
    $m_time = explode(" ",microtime());
    $m_time = $m_time[0] + $m_time[1];

    $starttime = $m_time;

    $Message=odbc_exec($connection,"select * from CLMASTER where ADDRESS like '%STREET%'");
    $Message=odbc_result($Message,1);

    $m_time = explode(" ",microtime());
    $m_time = $m_time[0] + $m_time[1];

    $endtime = $m_time;

    $totaltime[] = ($endtime - $starttime);

}

odbc_close($connection);

echo "<b>Test took and average of:</b> ".round(array_sum($totaltime)/count($totaltime),8)." seconds per run.<br>";
echo "<b>Test took a total of:</b> ".round(array_sum($totaltime),8)." seconds to run.<br>";

?>

The results of this test was about as ambiguous as the results when testing in SQL Server.

%STREET completed in 166.5823 seconds (.3331 average per query), and averaged 500 results found in .0228.

%STREET% completed in 149.4500 seconds (.2989 average per query), and averaged 500 results found in .0177. (Faster time per result because it finds more results than the others, in similar time.)

reverse(ADDRESS) like reverse('%STREET') completed in 134.0115 seconds (.2680 average per query), and averaged 500 results found in .0183 seconds.

reverse('TREETS%') completed in 167.6960 seconds (.3354 average per query), and averaged 500 results found in .0229.

We expected this test to show that %STREET% would be the slowest overall, while it was actually the fastest to run, and had the best average time to return 500 results. While the suggested reverse('%STREET') was the fastest to run overall, but was a little slower in time to return 500 results.

Extra fun: A coworker ran profiler on the server while we were running the tests and found that the use of the double wildcard produced a significant increase CPU usage, while the other tests were within 1-2% of each other.

Are there any SQL Efficiency experts out that that can explain why having the wildcard at the end of the search string would be better practice than the beginning, and perhaps why searching with wildcards at the beginning and end of the string was faster than having the wildcard just at the beginning?

like image 500
WhoaItsAFactorial Avatar asked Aug 03 '12 12:08

WhoaItsAFactorial


People also ask

Do wildcards take longer to run in SQL?

SQL Server Wildcard Searches Using %The first query is faster because the WHERE condition is Sargable.

How does it make search faster in SQL?

Indexing makes columns faster to query by creating pointers to where data is stored within a database. Imagine you want to find a piece of information that is within a large database. To get this information out of the database the computer will look through every row until it finds it.

How do I make SQL Server search faster?

If you search the beginning of each word though with your search parameter then FTS can give you the speed you need. Example: @search with value Win should match address "123 Window lane" but not "123 LikWin".

Is SQL wildcard case sensitive?

SQL keywords are by default set to case insensitive, which means that the keywords are allowed to be used in lower or upper case. The names of the tables and columns specification are set to case insensitive on the SQL database server; however, it can be enabled and disabled by configuring the settings in SQL.


1 Answers

Having the wildcard at the end of the string, like 'abc%', would help if that column were indexed, as it would be able to seek directly to the records which start with 'abc' and ignore everything else. Having the wild card at the beginning means it has to look at every row, regardless of indexing.

Good article here with more explanation.

like image 50
Bort Avatar answered Sep 22 '22 09:09

Bort