Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL Performance - string vs integer

I have realized a very strange behavior of MySQL that I have no explanation for.

This is a not overly complex query:

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

The columns Enricher3State, Enricher4State, Enricher5State, Enricher9State do have an index and are of the datatype int(11).

Now I have tried to change these Enricher[x]State to a string:

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  ); 

Every common sense would say that the string variant should perform the same, or slower, because the datatype of the column is integer!

But apparently this is not the case!

Query with integer notation (first one): 7.23048825s

Query with string notation (last one): 5.22188450s

As you can see, there is a huge performance difference, even though the query cost is the same in both cases.

I have absolutely no clue how this difference could happen - and if this means, I should change all queries in my project using the string notation...

I am using MySQL version 5.7.10


According to your comments I have deactivated all services that write or read to the database and repeated the experiment.

A) The integer notation:

SET profiling=0;
SET profiling=1;

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

  SHOW PROFILES;

Execution time of each query:

  • 6.42429325
  • 5.95059900
  • 6.34392825
  • 6.53041775
  • 6.69593450

B) The string notation:

SET profiling=0;
SET profiling=1;

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  );

  SHOW PROFILES;

Execution time:

  • 5.07188875
  • 4.90356250
  • 4.86164300
  • 4.48403375
  • 5.06533725

As you can clearly see, the string notation is still faster!

The same behavior was also detected by other developers of my team, so I could exclude temporary stupidity of myself…

like image 898
Andreas Avatar asked Oct 19 '16 13:10

Andreas


2 Answers

Since the fields are indexed and you have OR condition and query does have integer constant as the condition, MySQL may spend time on cross index join calculation and then do table scan and with string constant MySQL does not do index considerations and just does table scan.

It is the case when having indices on many fields that are used in OR condition is not advantage but extra work for MySQL.

The OR condition does not grant required indexing of participating fields, quite often having indices on the "1,2,3,4" fields is bad for the table. These fields should be taken out to a separate table.

Added: Run EXPLAIN and if you see indices "1,2,3,4" fields listed for considered keys, that is what MySQL is spending the time on.

like image 101
Sergiy Tytarenko Avatar answered Sep 17 '22 21:09

Sergiy Tytarenko


Considering the answer from Sergiy Tytarenko, I have deleted the indices on the Enricher[x]State columns.

Execution time for integer notation:

  • 4.93739900
  • 5.01461550
  • 5.05932075
  • 5.02891175
  • 5.02525075

Execution time for string notation:

  • 5.04365650
  • 5.07545950
  • 5.12358825
  • 5.14665200
  • 5.15426525

The execution time is about the same now.

So indeed, when having multiple indices on columns connected with OR, care should be taken.

It seems I have accidentally discovered a nice workaround (except deleting the index), by making a string out of an integer...

like image 21
Andreas Avatar answered Sep 18 '22 21:09

Andreas