Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Indexing VARCHAR in MySQL

Is performance of index on a VARCHAR field compared to INT field same or worse?

I have a table with two fields(Both are of VARCHAR). I want to create an index on 1st column. I have two things in my mind. 1. Directly create index on first field. 2. Assign an integer to each VARCHAR in the first field and store them in a table. Then replace the VARCHARs in the first column with integers and create another table and index on the first colum(In this case it will be integer column). I will end up with two different tables here.

Also if I have 4 or 5 tables with same 1st field and have to create index on the 1st field in every table, will the second method give better performance?

Thanks in advance.

like image 667
Avinash142857 Avatar asked Aug 30 '17 16:08

Avinash142857


People also ask

Can we CREATE INDEX on VARCHAR data type in MySQL?

You can index a VARCHAR column. I mean if you want to search the whole column, not substrings within it. An normal index won't help if you search for substrings with a LIKE predicate with a leading wildcard. VARCHAR data may be longer than other columns.

Can we apply index on VARCHAR data type?

1) You can also create indexes on multiple fields combining int and varchar types, taking special care for filtering always on first field defined in the index.

Is indexing possible in MySQL?

Indexing is a powerful structure in MySQL which can be leveraged to get the fastest response times from common queries. MySQL queries achieve efficiency by generating a smaller table, called an index, from a specified column or set of columns. These columns, called a key, can be used to enforce uniqueness.

Can you index a text column in MySQL?

MySQL has support for full-text indexing and searching: A full-text index in MySQL is an index of type FULLTEXT . Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR , VARCHAR , or TEXT columns.


2 Answers

Indexes on VARCHAR() columns are indeed slightly less efficient than indexes on fixed-length fields like INT or BIGINT. But not significantly so.

The only conceivable situation where you would want to use a second table containing a numbered list of text strings is this: the number of distinct text strings in your application is much smaller than the number of rows in your tables. Why might that be true? For example, the text strings might be words in a so-called "controlled vocabulary." For example, music tracks have a genre like "rock", "classical", or "hiphop". It's pointless to allow arbitrary genres like "southern california alt-surf-rock" in such an application.

Don't overthink this. Keep in mind that database server developers have spent a great deal of time optimizing the performance of their indexes. It's almost impossible that you can do better than they have done, especially if you have to introduce extra tables and constraints to your system.

Put indexes on your VARCHAR() columns as needed.

(Another factor: collations get baked into indexes on VARCHAR() columns. If you build a custom indexing scheme like the one you propose, you have to deal with that complexity in your code. It's a notorious pain in the neck.)

Fun fact to know and tell: Systems in the olden days of computing (when all the cool kids had T1 lines) offered objects called "atoms." These were text strings referred to with id numbers. Atoms showed up in the X Window System (for example) in the xlib function call XInternAtom() and related functions. Why? partly to save memory and network bandwidth, which were scarcer then than now. Partly for the "controlled vocabulary" purpose mentioned earlier in this post.

like image 110
O. Jones Avatar answered Sep 21 '22 13:09

O. Jones


The difference in performance of an index on varchar vs. an index on int is small enough to be irrelevant, unless you get into large scale (e.g. on the order of hundreds of millions of rows).

My advice is to keep your design simple and straightforward. Just run searches on the indexed varchar.

If you reach a scale where you think this is a bottleneck, then consider redesigning at that time.

like image 33
Bill Karwin Avatar answered Sep 19 '22 13:09

Bill Karwin