We have an application with a table with 20+ columns that are all searchable. Building indexes for all these columns would make write queries very slow, and any really useful index would often have to span multiple columns, increasing the number of indexes needed.
However, 95% of these searches only need to look at a small subset of the rows: say 50,000 rows.
So, we have considered using MySQL partitioned tables: having a column that is basically isActive, which is what we divide the two partitions by. Most search queries would be run with isActive=1, so they would run against the small 50,000 row partition and be quick without other indexes.
The only issue is that the set of rows where isActive=1 is not fixed; i.e. it's not based on the date of the row or anything fixed like that. We will need to update isActive based on use of the data in that row. As I understand it, that is no problem though; the data would just be moved from one partition to another during the UPDATE query.
We do have a PK on id for the row though, and I am not sure if this is a problem; the manual seemed to suggest the partitioning had to be based on any primary keys. This would be a huge problem for us because the primary key ID has no bearing on whether the row isActive.
I am not a MySQL expert. My focus is Oracle, but I've been working with Partitioning for years and I've come to find that your suggested use is very appropriate but not inside the mainstream understanding of partitions.
Let's put aside index merging for now. Say your active rows are somewhat scattered and are at a 1:20 ratio with the number of inactive rows. Say your page size is 8 KB and you get about 20 rows per block. If you get a very even distribution of isActive records, you'll have almost 1 per block. Reading EVERY block/page in the table with a full table scan will be much, much faster than using an index to find those same rows.
So let's say they are concentrated instead of evenly scattered. Even if they are concentrated in 20% of the pages, or even 10% of the pages, a full table scan can outperform an index in those cases too.
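To put a rough number on the scattered case, here is a small sketch. It assumes 5% of rows are active (close to the 1:20 ratio above), 20 rows per page, and that each row is active independently of its neighbours; the function name is made up for illustration.

```python
def fraction_of_pages_hit(active_fraction: float, rows_per_page: int) -> float:
    """Expected share of pages holding at least one active row,
    assuming each row is active independently with the given probability."""
    return 1.0 - (1.0 - active_fraction) ** rows_per_page


# Assumed figures: 5% active rows, 20 rows per page.
share = fraction_of_pages_hit(0.05, 20)
print(f"{share:.0%} of pages contain at least one active row")
```

Under those assumptions roughly 64% of the pages hold at least one active row, so an index lookup on the flag alone still ends up visiting most of the table, and it does so with random reads rather than a sequential scan.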
So now include index merging. If, after you scan the index on isActive, you DO NOT visit the table but instead join those results to the results of ANOTHER index, and the final result set means reading, say, less than 5% of your blocks, then yes, an index on isActive plus index merging could be a solution.
The caveat here is that there are a lot of limitations on the implementation of index joins in MySQL. Make sure that this works in your situation. But you said you have another 20 fields that may be searched. So unless you index all of them, so that there is always a second index available to join the isActive index to, you won't be using index merging/joins.
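As a purely conceptual illustration of what intersecting two indexes means (this is not how MySQL implements it internally), the idea is to combine the row-id lists from both indexes first and only then touch the table. The toy indexes and the city column below are hypothetical.

```python
def index_merge_intersect(index_a: dict, value_a, index_b: dict, value_b) -> set:
    """Intersect the row-id sets from two single-column indexes
    before any table row is fetched."""
    rows_a = index_a.get(value_a, set())
    rows_b = index_b.get(value_b, set())
    return rows_a & rows_b


# Toy "indexes": column value -> set of row ids (made-up data).
is_active_index = {1: {2, 5, 9}, 0: {1, 3, 4, 6, 7, 8}}
city_index = {"Oslo": {1, 2, 9}, "Rome": {3, 4, 5, 6, 7, 8}}

matching_ids = index_merge_intersect(is_active_index, 1, city_index, "Oslo")
print(sorted(matching_ids))
```

Only the rows in the intersection need to be read from the table. If the searched column has no index of its own, there is nothing to intersect with, which is the point made above.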
Now, if you partition on that column, you'll have 5% of the blocks with IsActive = True in them and they will be densely packed. A full partition scan will quickly yield the list of active records, and allow every other predicate to be applied as a filter instead of an index seek.
But that flag changes, right?
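A back-of-the-envelope version of that comparison, as a sketch: the 50,000 active rows come from the question and the 20 rows per page from the example above, but the total table size of 1,000,000 rows is an assumption chosen to match the 5% figure.

```python
import math


def pages_to_scan(row_count: int, rows_per_page: int) -> int:
    """Pages read by a full scan of densely packed rows."""
    return math.ceil(row_count / rows_per_page)


TOTAL_ROWS = 1_000_000  # assumed table size (not stated in the thread)
ACTIVE_ROWS = 50_000    # active rows, from the question
ROWS_PER_PAGE = 20      # from the example above

full_scan = pages_to_scan(TOTAL_ROWS, ROWS_PER_PAGE)
partition_scan = pages_to_scan(ACTIVE_ROWS, ROWS_PER_PAGE)
print(full_scan, partition_scan, partition_scan / full_scan)
```

With these numbers the active partition is 2,500 pages against 50,000 for the whole table, so scanning the partition and filtering on the other predicates reads about a twentieth of the data.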
In Oracle we have a command that allows us to enable Row Migration. That means that when Is_Active changes from True to False, the row is moved to the partition it now falls in. This is pretty expensive, but only a bit more so than the index maintenance that would occur if you indexed that column instead of partitioning by it. In the partitioned case, Oracle first changes the row with an update, then does a delete and then an insert. If you indexed that column instead, you'd do an update of the row, then the index entry for True would be deleted, and then an index entry for False would be created.
If MySQL doesn't have row migration then you'll have to program your CRUD package to do that. A procedure such as UPDATE_ROW_ISACTIVE(pk IN number) (or something like that) would do the delete and insert for you.
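If the engine does not move the row for you, the delete-and-insert could look roughly like this. It is a minimal sketch in Python using an in-memory SQLite database, with two tables standing in for the two partitions. The table names, the payload column, and the make_demo_db helper are all made up for illustration; this is not MySQL or Oracle syntax.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway database; two tables stand in for two partitions."""
    conn = sqlite3.connect(":memory:")
    for table in ("items_active", "items_inactive"):
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("INSERT INTO items_active (id, payload) VALUES (1, 'example row')")
    conn.commit()
    return conn


def update_row_isactive(conn: sqlite3.Connection, pk: int, is_active: bool) -> bool:
    """Move one row to the table matching its new flag via delete + insert.
    Returns False if the row was not found in the source table."""
    if is_active:
        src, dst = "items_inactive", "items_active"
    else:
        src, dst = "items_active", "items_inactive"
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            f"SELECT id, payload FROM {src} WHERE id = ?", (pk,)
        ).fetchone()
        if row is None:
            return False
        conn.execute(f"DELETE FROM {src} WHERE id = ?", (pk,))
        conn.execute(f"INSERT INTO {dst} (id, payload) VALUES (?, ?)", row)
    return True
```

Calling update_row_isactive(conn, 1, False) on the demo database moves row 1 from the active table to the inactive one. Doing both statements in one transaction matters: a failure between the delete and the insert must not lose the row.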
Regarding Konerak's Answer
While I agree that parallel access is ONE use of partitioning, it's not the exclusive one. But if you follow the link he provides, the user comment at the very bottom of the page is:
Beware of having low selectivity indexes on your table. A complex AND/OR WHERE clause will surely make your query very very slow if Index_Merge optimization is being used with an intersect() algorithm.
That seems to speak to your situation, so you can take that comment FWIW.
If you are going to index that many columns, you may want to rethink your data structure. For example, make each column a row/record instead. Then have a "group ID" to link the individual records together, and a "name" field to indicate what piece of data it is. Then you only need 1 index for all your pieces of data.
This name/value pair setup is actually fairly common now and is what some NoSQL databases are based on, which is something else you may want to look into. Something like MongoDB is excellent for indexing "all" pieces of data.
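A minimal sketch of the name/value layout described above, using Python with an in-memory SQLite database as a stand-in. The table name, column names, sample data, and the find_groups helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (record, attribute) pair instead of one wide row per record.
conn.execute(
    "CREATE TABLE attributes ("
    "group_id INTEGER NOT NULL, name TEXT NOT NULL, value TEXT)"
)
# A single index serves searches on any attribute.
conn.execute("CREATE INDEX idx_name_value ON attributes (name, value)")
conn.executemany(
    "INSERT INTO attributes (group_id, name, value) VALUES (?, ?, ?)",
    [
        (1, "city", "Oslo"), (1, "status", "open"),
        (2, "city", "Rome"), (2, "status", "open"),
        (3, "city", "Oslo"), (3, "status", "closed"),
    ],
)


def find_groups(conn: sqlite3.Connection, name: str, value: str) -> list:
    """Return the group ids whose attribute `name` equals `value`."""
    rows = conn.execute(
        "SELECT group_id FROM attributes "
        "WHERE name = ? AND value = ? ORDER BY group_id",
        (name, value),
    )
    return [row[0] for row in rows]


print(find_groups(conn, "city", "Oslo"))
```

The trade-off is that a search on several attributes at once needs one lookup per attribute, combined with a self-join or an intersection of the group ids, so it is worth checking against your real query patterns first.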