Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL ENUM VS INT

Tags:

php

mysql

I have few tables that have columns that can either be ENUM type or INT type. I tend to always use integer type assuming that it will be faster to perform search based on it.

For example one of my table has a column: StatusType which can have only 4 possible values: Completed, In Progress, Failed, Todo.

Instead of storing above as ENUM strings I store them as:

1, 2, 3, 4 respectively. And then in my PHP code I have constant variables that define these values like this:

define('COMPLETED', 1);
define('IN_PROGRESS', 2);
define('FAILED', 3);
define('TODO', 4);

Now my question is, am I doing it right way or I should just change it to be ENUM type and use strings to compare in queries? I have many other columns that can only have set of max 4-5 possible values.

like image 311
GGio Avatar asked Oct 22 '14 14:10

GGio


2 Answers

Enum values look really cool in MySQL, yet I am not a fan of them. They are limited to 255 values, so if you decide to add more values, then you might run into a limit. Also, as you describe, you need to synchronize the values in your application code with the values in the database -- something that seems potentially dangerous.

In addition, they make certain future changes more difficult. For instance, other databases do not support enums. And, if you want to add multi-lingual support, having codes embedded in data type definitions in the database is a bit hard to deal with.

The more standard method is one or more reference tables, where you use join to get the values. You can use a hybrid approach where you use a reference table in the database. Then you can load the reference table into the application to get the mapping from numbers to strings so you can avoid the joins in your code.

like image 55
Gordon Linoff Avatar answered Sep 28 '22 09:09

Gordon Linoff


You are half-correct. Enum is very bad from a performance perspective: MySQL Enum performance advantage?

That said, binding the definitions of the INTs to your code is also not a great thing. Ideally, if you were to follow the correct Data Normalization patterns, you would define the values of the INTs in the Database as well, in another table, and use the index of the definition as the value for the assignment.

See: http://en.wikipedia.org/wiki/Database_normalization#Normal_forms

The reason for this is so the data is portable, and useful without requiring the Codebase to read it (you can easily dump a CSV for Excel by executing a join).

God Speed.

Example SQL:

SELECT *, state.name AS state FROM students
JOIN states ON student.state_id = states.id

Just to get state names.

Or to filter:

SELECT * FROM students
JOIN states ON student.state_id = states.id
WHERE state.name = 'Maine' OR state.code = 'ME'

Yeah, strange example, but the idea is that INTs are TINY, and VARCHAR are... variable... Storing 'Maine' as opposed to '16' adds up over millions of rows. Further, the indexing on INT is MUCH faster than VARCHAR, so your look-ups are going to be much faster. Particularly if you inherently know the number ahead of time and build your query without the JOIN. This is not advisable as a common practice, but could be done if you wanted to make something even faster and you can ensure the validity of the assumed value.

like image 39
Mike Avatar answered Sep 28 '22 08:09

Mike