Is there a notable difference in query performance if the index is set on a datetime type column instead of a boolean type column (and querying is done on that column)?
In my current design I have 2 columns:
is_active TINYINT(1), indexed
deleted_at DATETIME
The query is SELECT * FROM table WHERE is_active = 1;
Would it be any slower if I made an index on the deleted_at column instead, and ran queries like this: SELECT * FROM table WHERE deleted_at is null; ?
Here is a MariaDB (10.0.19) benchmark with 10M rows (using the sequence plugin):
drop table if exists test;
CREATE TABLE `test` (
    `id` MEDIUMINT UNSIGNED NOT NULL,
    `is_active` TINYINT UNSIGNED NOT NULL,
    `deleted_at` TIMESTAMP NULL,
    PRIMARY KEY (`id`),
    INDEX `is_active` (`is_active`),
    INDEX `deleted_at` (`deleted_at`)
) ENGINE=InnoDB
    select seq id
        , rand(1)<0.5 as is_active
        , case when rand(1)<0.5
            then null
            else '2017-03-18' - interval floor(rand(2)*1000000) second
          end as deleted_at
    from seq_1_to_10000000;
To measure the time I use set profiling=1 and run show profile after executing a query. From the profiling result I take the value of Sending data, since everything else together adds up to less than one msec.
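A profiling session along these lines might look as follows. This is only a sketch that assumes the test table created above; the query ID passed to SHOW PROFILE FOR QUERY is a placeholder and should be whatever Query_ID SHOW PROFILES reports for the statement of interest.

```sql
-- Enable per-statement profiling for the current session.
SET profiling = 1;

-- Run the statement to be measured.
SELECT COUNT(*) FROM test WHERE is_active = 1;

-- List the profiled statements with their Query_ID and total duration.
SHOW PROFILES;

-- Break the most recently profiled statement down into stages
-- (the "Sending data" row is the one used for the timings here).
SHOW PROFILE;

-- Or pick a specific statement by its Query_ID (1 is a placeholder).
SHOW PROFILE FOR QUERY 1;
```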
TINYINT index:
SELECT COUNT(*) FROM test WHERE is_active = 1;
Runtime: ~ 738 msec
TIMESTAMP index:
SELECT COUNT(*) FROM test WHERE deleted_at is null;
Runtime: ~ 748 msec
Index size:
select database_name, table_name, index_name, stat_value*@@innodb_page_size
from mysql.innodb_index_stats
where database_name = 'tmp'
and table_name = 'test'
and stat_name = 'size'
Result:
database_name | table_name | index_name | stat_value*@@innodb_page_size
-----------------------------------------------------------------------
tmp | test | PRIMARY | 275513344
tmp | test | deleted_at | 170639360
tmp | test | is_active | 97107968
Note that while TIMESTAMP (4 bytes) is 4 times as long as TINYINT (1 byte), the index size is not even twice as large. But the index size can be significant if it doesn't fit into memory. So when I change innodb_buffer_pool_size from 1G to 50M I get the following numbers:
To address the question more directly I did some changes to the data:
rand(1)<0.99 (1% deleted) instead of rand(1)<0.5 (50% deleted)
SELECT COUNT(*) changed to SELECT *
Index size:
index_name | stat_value*@@innodb_page_size
------------------------------------------
PRIMARY | 25739264
deleted_at | 12075008
is_active | 11026432
Since 99% of deleted_at values are NULL, there is no significant difference in index size, even though a non-empty DATETIME requires 8 bytes (MariaDB).
SELECT * FROM test WHERE is_active = 1; -- 782 msec
SELECT * FROM test WHERE deleted_at is null; -- 829 msec
After dropping both indexes, both queries execute in about 350 msec. And after dropping the is_active column, the deleted_at is null query executes in 280 msec.
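The statements used for these two steps are not shown; assuming the index and column names from the CREATE TABLE statement above, they could look roughly like this:

```sql
-- Remove both secondary indexes so the queries fall back to a full table scan.
ALTER TABLE test
    DROP INDEX is_active,
    DROP INDEX deleted_at;

-- Remove the redundant flag column; deleted_at IS NULL now means "active".
ALTER TABLE test DROP COLUMN is_active;

-- Re-run the query against the slimmer table.
SELECT * FROM test WHERE deleted_at IS NULL;
```

If you would rather keep the indexes in place while comparing, an index hint such as SELECT * FROM test IGNORE INDEX (is_active) WHERE is_active = 1; asks the optimizer to skip that index for a single query, though the timings may not match a table where the index was really dropped.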
Note that this is still not a realistic scenario. You are unlikely to want to select 990K rows out of 1M and deliver them to the user. You will probably also have more columns (maybe including text) in the table. But it shows that you probably don't need the is_active column (if it doesn't add additional information), and that any index is at best useless for selecting non-deleted entries.
However, an index can be useful for selecting deleted rows:
SELECT * FROM test WHERE is_active = 0;
Executes in 10 msec with index and in 170 msec without index.
SELECT * FROM test WHERE deleted_at is not null;
Executes in 11 msec with index and in 167 msec without index.
After dropping the is_active column, it executes in 4 msec with the index and in 150 msec without it.
So if this scenario somehow fits your data, the conclusion would be: drop the is_active column, and don't create an index on the deleted_at column if you are rarely selecting deleted entries. Or adjust the benchmark to your needs and draw your own conclusion.
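When adapting the benchmark to your own data, it may also help to check which access path the optimizer picks before timing anything. A minimal sketch, assuming the test table from above with the index on deleted_at still in place:

```sql
-- Most rows match: the optimizer may prefer a full scan over the index.
EXPLAIN SELECT * FROM test WHERE deleted_at IS NULL;

-- Few rows match: this is where the index on deleted_at can pay off.
EXPLAIN SELECT * FROM test WHERE deleted_at IS NOT NULL;
```

In the output, the type, key and rows columns indicate whether an index is used and roughly how many rows are expected to be read. The actual plan depends on your data distribution and server version.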