
Performance of string comparison vs int join in SQL

It's generally accepted that searching a table on an int column is faster than searching on a string column (say, varchar).

However, suppose I have a Shirt table with a Color column. Would it be more performant to create a separate Color table and store its primary key as a foreign key on Shirt? Or, when searching for green shirts, would the join negate the advantage of the Color column on Shirt holding an int instead of a string value such as "Green"?

RobertMGlynn asked Sep 14 '12 19:09


2 Answers

If I understand correctly, you are asking which of these two queries would be faster:

SELECT * FROM shirt WHERE color = 'Green'

vs

SELECT s.* FROM shirt s INNER JOIN colors c 
       ON s.colorid = c.colorid 
       WHERE c.color = 'Green'

It depends somewhat on the database, and possibly a lot on whether its optimizer handles the query well (most, if not all, should). The lookup in the colors table should be negligible, after which the rest of the execution can use the integer key and should be faster. The bulk of the processing is ultimately equivalent to SELECT * FROM shirt WHERE colorid = N. However, I suspect you would not notice a difference in speed unless the table is quite large. The decision should probably rest on which design makes the most sense, and that is probably the normalized one.
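To make the equivalence concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite, the shirtid and name columns, the index name, and the sample rows are all assumptions for illustration; the original question does not name a database or a full schema. It only checks that the join returns the same shirts as resolving the color id once and then filtering shirt on the integer column. It does not measure speed.

```python
import sqlite3

# In-memory database with a hypothetical normalized schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE colors (
        colorid INTEGER PRIMARY KEY,
        color   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE shirt (
        shirtid INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        colorid INTEGER NOT NULL REFERENCES colors(colorid)
    );
    CREATE INDEX idx_shirt_colorid ON shirt(colorid);

    INSERT INTO colors (colorid, color) VALUES (1, 'Green'), (2, 'Red');
    INSERT INTO shirt (shirtid, name, colorid) VALUES
        (1, 'Polo', 1), (2, 'Tee', 2), (3, 'Henley', 1);
""")

# The join from the answer: filter on the color name via the lookup table.
join_sql = """
    SELECT s.shirtid, s.name
    FROM shirt s
    INNER JOIN colors c ON s.colorid = c.colorid
    WHERE c.color = 'Green'
    ORDER BY s.shirtid
"""
join_rows = conn.execute(join_sql).fetchall()

# The two-step equivalent: resolve the id once, then filter on the integer.
(green_id,) = conn.execute(
    "SELECT colorid FROM colors WHERE color = 'Green'"
).fetchone()
direct_rows = conn.execute(
    "SELECT shirtid, name FROM shirt WHERE colorid = ? ORDER BY shirtid",
    (green_id,),
).fetchall()

print(join_rows == direct_rows)

# Show how this particular engine plans the join; output varies by version.
for row in conn.execute("EXPLAIN QUERY PLAN " + join_sql):
    print(row[-1])
```

The plan output is the part worth reading on a real database: it shows whether the engine resolves the color first and then uses the index on colorid, which is the behavior the answer relies on.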

Mark Wilkins answered Oct 24 '22 17:10


Beyond performance, creating a separate Color table makes your design better normalized. So, some day in the future, when someone decides that "Dark Blue" should now be called "Navy Blue", you update 1 row in your Color table vs. updating many rows in your Shirt table.
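The rename scenario can be sketched the same way, again with Python's sqlite3 module. The table shirt_flat (a denormalized variant that stores the color name directly), the column names, and the sample rows are hypothetical. The sketch compares how many rows each design has to touch when "Dark Blue" becomes "Navy Blue".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: shirts reference a color by integer key.
    CREATE TABLE colors (
        colorid INTEGER PRIMARY KEY,
        color   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE shirt (
        shirtid INTEGER PRIMARY KEY,
        colorid INTEGER NOT NULL REFERENCES colors(colorid)
    );

    -- Denormalized design: each shirt stores the color name itself.
    CREATE TABLE shirt_flat (
        shirtid INTEGER PRIMARY KEY,
        color   TEXT NOT NULL
    );

    INSERT INTO colors VALUES (1, 'Dark Blue'), (2, 'Green');
    INSERT INTO shirt VALUES (1, 1), (2, 1), (3, 2), (4, 1);
    INSERT INTO shirt_flat VALUES
        (1, 'Dark Blue'), (2, 'Dark Blue'), (3, 'Green'), (4, 'Dark Blue');
""")

# Normalized: the name lives in one place, so one row changes.
normalized_changed = conn.execute(
    "UPDATE colors SET color = 'Navy Blue' WHERE color = 'Dark Blue'"
).rowcount

# Denormalized: every shirt carrying the old name must be rewritten.
flat_changed = conn.execute(
    "UPDATE shirt_flat SET color = 'Navy Blue' WHERE color = 'Dark Blue'"
).rowcount

print(normalized_changed, flat_changed)  # prints: 1 3
```

With three dark blue shirts the gap is trivial, but the denormalized count grows with the table while the normalized count stays at one.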

Joe Stefanelli answered Oct 24 '22 19:10
