Performance of query on indexed Boolean column vs Datetime column

Is there a notable difference in query performance if the index is set on a datetime column instead of a boolean column (and querying is done on that column)?

In my current design I have two columns:

  • is_active TINYINT(1), indexed
  • deleted_at DATETIME

The query is SELECT * FROM table WHERE is_active = 1;

Would it be any slower if I made an index on the deleted_at column instead, and ran queries like this: SELECT * FROM table WHERE deleted_at IS NULL; ?
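A quick way to check this on your own schema, before running any benchmark, is to compare the execution plans of the two queries (a sketch; the table and column names assume the design described above):

-- "type", "key" and "rows" in the EXPLAIN output show which index
-- the optimizer picks and how many rows it expects to examine
EXPLAIN SELECT * FROM `table` WHERE is_active = 1;
EXPLAIN SELECT * FROM `table` WHERE deleted_at IS NULL;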

Asked Mar 18 '17 by Alex




1 Answer

Here is a MariaDB (10.0.19) benchmark with 10M rows (using the sequence plugin):

drop table if exists test;
CREATE TABLE `test` (
    `id` MEDIUMINT UNSIGNED NOT NULL,
    `is_active` TINYINT UNSIGNED NOT NULL,
    `deleted_at` TIMESTAMP NULL,
    PRIMARY KEY (`id`),
    INDEX `is_active` (`is_active`),
    INDEX `deleted_at` (`deleted_at`)
) ENGINE=InnoDB
    select seq id
        , rand(1)<0.5 as is_active
        , case when rand(1)<0.5 
            then null
            else '2017-03-18' - interval floor(rand(2)*1000000) second
        end as deleted_at
    from seq_1_to_10000000;

To measure the time I use set profiling=1 and run show profile after executing a query. From the profiling result I take the value of Sending data, since all other stages combined take less than one millisecond.
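For reference, the profiling workflow looks like this (a sketch; SHOW PROFILE is deprecated in recent MySQL releases but works in MariaDB 10.0):

set profiling = 1;
SELECT COUNT(*) FROM test WHERE is_active = 1;
show profiles; -- lists the profiled queries with their total duration
show profile;  -- stage-by-stage breakdown of the last query;
               -- "Sending data" dominates here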

TINYINT index:

SELECT COUNT(*) FROM test WHERE is_active = 1;

Runtime: ~ 738 msec

TIMESTAMP index:

SELECT COUNT(*) FROM test WHERE deleted_at is null;

Runtime: ~ 748 msec

Index size:

select database_name, table_name, index_name, stat_value*@@innodb_page_size
from mysql.innodb_index_stats 
where database_name = 'tmp'
  and table_name = 'test'
  and stat_name = 'size'

Result:

database_name | table_name | index_name | stat_value*@@innodb_page_size
-----------------------------------------------------------------------
tmp           | test       | PRIMARY    | 275513344 
tmp           | test       | deleted_at | 170639360 
tmp           | test       | is_active  |  97107968 

Note that while TIMESTAMP (4 bytes) is four times as long as TINYINT (1 byte), the index is not even twice as large. But the index size can be significant if it doesn't fit into memory. So when I change innodb_buffer_pool_size from 1G to 50M I get the following numbers:

  • TINYINT: ~ 960 msec
  • TIMESTAMP: ~ 1500 msec
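For completeness, this is how the buffer pool can be adjusted for such a test (a sketch; innodb_buffer_pool_size is only resizable at runtime since MariaDB 10.2 / MySQL 5.7, so on MariaDB 10.0 it has to be set in the config file followed by a server restart):

-- check the current size in bytes
select @@innodb_buffer_pool_size;

-- in my.cnf, then restart the server:
-- [mysqld]
-- innodb_buffer_pool_size = 50M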

Update

To address the question more directly, I made some changes to the data:

  • Instead of TIMESTAMP I use DATETIME
  • Since entries are usually only rarely deleted, I use rand(1)<0.99 (1% deleted) instead of rand(1)<0.5 (50% deleted)
  • Table size changed from 10M to 1M rows.
  • SELECT COUNT(*) changed to SELECT *

Index size:

index_name | stat_value*@@innodb_page_size
------------------------------------------
PRIMARY    | 25739264
deleted_at | 12075008
is_active  | 11026432

Since 99% of the deleted_at values are NULL, there is no significant difference in index size, even though a non-NULL DATETIME requires 8 bytes (in MariaDB).

SELECT * FROM test WHERE is_active = 1;      -- 782 msec
SELECT * FROM test WHERE deleted_at is null; -- 829 msec

With both indexes dropped, both queries execute in about 350 msec. After also dropping the is_active column, the deleted_at is null query executes in 280 msec.

Note that this is still not a realistic scenario: you are unlikely to want to select 990K rows out of 1M and deliver them to the user, and you will probably also have more columns (maybe including text) in the table. But it shows that you probably don't need the is_active column (if it doesn't add any additional information), and that any index is at best useless for selecting non-deleted entries.

However, an index can be useful for selecting deleted rows:

SELECT * FROM test WHERE is_active = 0;

Executes in 10 msec with index and in 170 msec without index.

SELECT * FROM test WHERE deleted_at is not null;

Executes in 11 msec with index and in 167 msec without index.

After dropping the is_active column, it executes in 4 msec with the index and in 150 msec without it.

So if this scenario fits your data, the conclusion would be: drop the is_active column, and don't create an index on the deleted_at column unless you frequently select deleted entries. Or adjust the benchmark to your needs and draw your own conclusions.
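If you decide to follow that conclusion, the schema change is straightforward (a sketch against the test table above; test on a copy of your data first):

-- deleted_at becomes the single source of truth for "active"
ALTER TABLE test DROP COLUMN is_active;
-- keep the deleted_at index only if you often query deleted rows,
-- otherwise drop it as well:
ALTER TABLE test DROP INDEX deleted_at;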

Answered Sep 20 '22 by Paul Spiegel