
INT vs VARCHAR in search

Which one of the following queries will be faster and more optimal (and why):

  1. SELECT * FROM items WHERE w = 320 AND h = 200 (w and h are INT)

  2. SELECT * FROM items WHERE dimensions = '320x200' (dimensions is VARCHAR)

Vincent asked Jan 05 '11 18:01



2 Answers

Here are some actual measurements. (Using SQLite; may try it with MySQL later.)

Data = All 1,000,000 combinations of w, h ∈ {1...1000}, in randomized order.

CREATE TABLE items (id INTEGER PRIMARY KEY, w INTEGER, h INTEGER)

Average time (of 20 runs) to execute SELECT * FROM items WHERE w = 320 and h = 200 was 5.39±0.29 µs.

CREATE TABLE items (id INTEGER PRIMARY KEY, dimensions TEXT)

Average time to execute SELECT * FROM items WHERE dimensions = '320x200' was 5.69±0.23 µs.

There is no significant difference, efficiency-wise.
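A comparison along these lines can be sketched with Python's built-in sqlite3 module. This is not the script behind the numbers above. The table names items_int and items_txt, the reduced data size (100 x 100 instead of 1000 x 1000), the in-memory database, and the absence of any secondary index are all assumptions made for illustration, so the timings it prints will differ from the ones quoted.

```python
import random
import sqlite3
import timeit

N = 100  # assumption: 100 x 100 = 10,000 rows, far fewer than the 1,000,000 above

# All combinations of w, h in 1..N, in randomized (but reproducible) order.
pairs = [(w, h) for w in range(1, N + 1) for h in range(1, N + 1)]
random.Random(0).shuffle(pairs)

conn = sqlite3.connect(":memory:")
# Hypothetical table names, so both layouts can live in one database.
conn.execute("CREATE TABLE items_int (id INTEGER PRIMARY KEY, w INTEGER, h INTEGER)")
conn.execute("CREATE TABLE items_txt (id INTEGER PRIMARY KEY, dimensions TEXT)")
conn.executemany("INSERT INTO items_int (w, h) VALUES (?, ?)", pairs)
conn.executemany(
    "INSERT INTO items_txt (dimensions) VALUES (?)",
    [(f"{w}x{h}",) for w, h in pairs],
)
conn.commit()


def find_int(w, h):
    """Look up a row by two integer columns."""
    return conn.execute(
        "SELECT * FROM items_int WHERE w = ? AND h = ?", (w, h)
    ).fetchall()


def find_txt(dimensions):
    """Look up a row by a single text column."""
    return conn.execute(
        "SELECT * FROM items_txt WHERE dimensions = ?", (dimensions,)
    ).fetchall()


RUNS = 20  # same number of runs as in the answer
t_int = timeit.timeit(lambda: find_int(32, 20), number=RUNS) / RUNS
t_txt = timeit.timeit(lambda: find_txt("32x20"), number=RUNS) / RUNS
print(f"two INTEGER columns: {t_int * 1e6:.1f} us per query")
print(f"one TEXT column:     {t_txt * 1e6:.1f} us per query")
```

Timings from a sketch like this depend heavily on table size, caching, and whether the searched columns are indexed, so treat them as a rough comparison only.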

But

There is a huge difference in terms of usability. For example, if you want to calculate the area and perimeter of the rectangles, the two-column approach is easy:

SELECT w * h, 2 * (w + h) FROM items

Try to write the corresponding query for the other way.
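For comparison, here is one way the single-column version might look in SQLite, run through Python's sqlite3 module. It assumes every dimensions value has exactly the form '<w>x<h>' with a lowercase x, and the two sample rows are made up for the demonstration. It relies on SQLite's substr(), instr(), and CAST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, dimensions TEXT)")
# Made-up sample rows, assumed to always look like '<w>x<h>'.
conn.executemany(
    "INSERT INTO items (dimensions) VALUES (?)",
    [("320x200",), ("640x480",)],
)

# The string has to be split at the 'x' and both halves cast back to
# integers before any arithmetic is possible.
AREA_PERIMETER_SQL = """
SELECT w * h AS area, 2 * (w + h) AS perimeter
FROM (
    SELECT
        id,
        CAST(substr(dimensions, 1, instr(dimensions, 'x') - 1) AS INTEGER) AS w,
        CAST(substr(dimensions, instr(dimensions, 'x') + 1) AS INTEGER) AS h
    FROM items
)
ORDER BY id
"""

rows = conn.execute(AREA_PERIMETER_SQL).fetchall()
for area, perimeter in rows:
    print(area, perimeter)
```

The parsing expression is specific to SQLite; other databases would need their own string functions, and none of them can use the parsed values efficiently in a WHERE clause without extra work.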

dan04 answered Oct 11 '22 15:10


Intuitively, if you do not create indexes on those columns, integer comparison should be faster.

An integer comparison checks two 32-bit values for equality directly, in a single operation.

Strings, on the other hand, are character arrays and have to be compared character by character, which takes more work.

However, another point is that the 2nd query compares one field, while the 1st query compares two. If you have 1,000,000 records and no indexes on the columns, the 2nd query may need 1,000,000 string comparisons in the worst case (the row you are looking for happens to be the last one, or is not found at all).

On the other hand, if you have 1,000,000 records and all of them have w = 320, each one must also be compared on h, which means 2,000,000 comparisons. If you create indexes on those fields, however, IMHO the two queries will be almost identical: an index lookup takes O(log n) key comparisons in a B-tree (or O(1) on average in a hash index) whether the key is a VARCHAR or an INT, so the cost of each individual comparison matters much less.

Conclusion: it depends. Prefer ints, and create indexes on the columns you search.
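Whether an index is actually used can be checked with SQLite's EXPLAIN QUERY PLAN, shown here through Python's sqlite3 module as a rough sketch. The table names items_int and items_txt and the index names idx_wh and idx_dim are hypothetical, and other databases such as MySQL have their own EXPLAIN output and may plan the queries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names, one per layout from the question.
conn.execute("CREATE TABLE items_int (id INTEGER PRIMARY KEY, w INTEGER, h INTEGER)")
conn.execute("CREATE TABLE items_txt (id INTEGER PRIMARY KEY, dimensions TEXT)")


def plan(sql):
    """Return SQLite's query plan for sql as a single string."""
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


Q_INT = "SELECT * FROM items_int WHERE w = 320 AND h = 200"
Q_TXT = "SELECT * FROM items_txt WHERE dimensions = '320x200'"

# Without indexes, both queries have to scan the whole table.
plan_int_before = plan(Q_INT)
plan_txt_before = plan(Q_TXT)

# Hypothetical index names: one composite index and one single-column index.
conn.execute("CREATE INDEX idx_wh ON items_int (w, h)")
conn.execute("CREATE INDEX idx_dim ON items_txt (dimensions)")

# With indexes, both queries become index searches.
plan_int_after = plan(Q_INT)
plan_txt_after = plan(Q_TXT)

for label, text in [
    ("int, no index", plan_int_before),
    ("txt, no index", plan_txt_before),
    ("int, indexed", plan_int_after),
    ("txt, indexed", plan_txt_after),
]:
    print(f"{label}: {text}")
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the index name in the output than to match the whole string.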

ahmet alp balkan answered Oct 11 '22 16:10