Too many columns to index - use MySQL partitions?

We have an application with a table of 20+ columns that are all searchable. Building indexes for all these columns would make write queries very slow, and any really useful index would often have to span multiple columns, increasing the number of indexes needed.

However, for 95% of these searches, only a small subset of the rows needs to be searched, and it is quite a small number: say 50,000 rows.

So we have considered using MySQL partitioning: a column that is basically isActive would divide the table into two partitions. Most search queries would be run with isActive=1, so they would hit the small 50,000-row partition and be quick without other indexes.

The only issue is that the set of rows where isActive=1 is not fixed; i.e. it's not based on the date of the row or anything fixed like that. We will need to update isActive based on use of the data in that row. As I understand it that is no problem though; the data would just be moved from one partition to another during the UPDATE query.

We do have a PK on id for the row though, and I am not sure if this is a problem; the manual seemed to suggest that the partitioning column has to be part of any primary key. This would be a huge problem for us because the primary key ID has no relation to whether the row isActive.

asked Dec 13 '10 13:12

Christopher Padfield

2 Answers

I am not a MySQL expert. My focus is Oracle, but I've been working with partitioning for years, and I've come to find that your suggested use is very appropriate, though outside the mainstream understanding of partitions.

Index on low cardinality columns

Putting aside index merging for now, let's say that your active rows are somewhat scattered and are in a 1:20 ratio with the number of inactive rows. Say your page size is 8 KB and you get about 20 rows per block. If you get a very even distribution of isActive records, you'll have almost 1 per block. Reading EVERY block/page in the table with a full table scan will then be much, much faster than using an index to find those same rows, because the index would send you to most of those blocks anyway, one lookup at a time.

So let's say they are concentrated instead of evenly scattered. Even if they are concentrated in 20% of the pages or even 10% of the pages, a full table scan can outperform an index even in those cases.
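As a rough check on the scattered case, here is a minimal sketch of the arithmetic. It assumes the figures used above (1 active row per 20 inactive, about 20 rows per block) and adds one assumption of its own: that active rows land in blocks uniformly at random.

```python
# Assumptions taken from the paragraphs above: 1 active row per 20 inactive
# rows, about 20 rows per block. Uniform random scatter is an added assumption.
active_fraction = 1 / 21
rows_per_block = 20

# Average number of active rows in a block ("almost 1 per block").
expected_active_per_block = rows_per_block * active_fraction

# Probability that a block holds at least one active row, i.e. the share of
# blocks an index lookup on isActive alone would still have to visit.
blocks_touched = 1 - (1 - active_fraction) ** rows_per_block

print(f"active rows per block: {expected_active_per_block:.2f}")
print(f"blocks with at least one active row: {blocks_touched:.0%}")
```

Under these assumptions well over half of the blocks contain an active row, so the index saves little I/O and replaces sequential reads with scattered ones.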

So now include index merging. If, after you scan the index on isActive, you DO NOT visit the table but instead join those results to the results of ANOTHER index, and that final result set means reading, say, less than 5% of your blocks, then yes, an index on isActive plus index merging could be a solution.

The caveat here is that there are a lot of limitations on the implementation of index joins in MySQL. Make sure that this works in your situation. But you said you have another 20 fields that may be searched. If you don't index all of them, so that there is always a second index available to join the isActive index to, you won't be using index merging/joins.
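To make the index-join idea concrete, here is a toy model in Python, not MySQL's actual Index Merge implementation. The table contents, the second column `region`, and the helper `build_index` are all hypothetical; the point is only that the two index results are intersected before any table block is read.

```python
from collections import defaultdict

ROWS_PER_BLOCK = 20  # figure used in the discussion above

# Toy table: row id -> (is_active, region). `region` is a hypothetical
# second searchable column; roughly 1 row in 21 is active.
table = {rid: (rid % 21 == 0, rid % 50) for rid in range(100_000)}

def build_index(col):
    """Map each distinct value of column `col` to the set of row ids holding it."""
    index = defaultdict(set)
    for rid, row in table.items():
        index[row[col]].add(rid)
    return index

active_idx = build_index(0)
region_idx = build_index(1)

# Intersect the two index results without touching the table itself.
candidates = active_idx[True] & region_idx[0]

# Only now work out which table blocks would have to be read.
blocks_to_read = {rid // ROWS_PER_BLOCK for rid in candidates}
total_blocks = len(table) // ROWS_PER_BLOCK
fraction = len(blocks_to_read) / total_blocks
print(f"{len(candidates)} rows in {len(blocks_to_read)} of {total_blocks} blocks")
```

In this toy data the intersection touches under 5% of the blocks, which is the situation where the two-index approach pays off. Without the second index there is nothing to intersect with.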

Partitioning a low cardinality column

Now if you partition on that column, the isActive = True rows will sit in about 5% of the blocks, and they will be densely packed. A full partition scan will quickly yield the list of active records, and allow every other predicate to be applied as a filter instead of an index seek.
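A back-of-the-envelope version of that claim, as a sketch only: it combines the 50,000 active rows from the question with the 1:20 ratio and 20 rows per block assumed earlier in this answer.

```python
active_rows = 50_000      # figure from the question
inactive_per_active = 20  # 1:20 ratio assumed in this answer
rows_per_block = 20       # also assumed in this answer

total_rows = active_rows * (1 + inactive_per_active)
total_blocks = total_rows // rows_per_block

# With the active rows packed into their own partition, a scan of that
# partition reads only these blocks.
active_partition_blocks = active_rows // rows_per_block
share = active_partition_blocks / total_blocks

print(f"full table scan: {total_blocks} blocks")
print(f"active partition scan: {active_partition_blocks} blocks ({share:.1%})")
```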

But that flag changes, right?

In Oracle we have a command that allows us to enable row movement (ALTER TABLE ... ENABLE ROW MOVEMENT). That means that when Is_Active changes from True to False, the row moves to the partition it now falls in. This is pretty expensive, but only a bit more than the index maintenance that would occur if you indexed that column instead of partitioning by it. In the partitioned case, Oracle first changes the row with an update, then does a delete and then an insert. If you indexed that column instead, you'd do an update of the row, and then the index entry for True would be deleted and an index entry for False would be created.

If MySQL doesn't have row movement then you'll have to program your CRUD package to do that. A procedure such as UPDATE_ROW_ISACTIVE(pk IN number) (or something like that) would do the delete and insert for you.
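A minimal sketch of such a delete-and-insert helper, written in Python against SQLite purely so that it runs anywhere. Everything here is an assumption for illustration: the two tables `items_active` and `items_inactive` stand in for the two partitions, and `set_is_active` and the `payload` column are hypothetical names.

```python
import sqlite3

def set_is_active(conn, pk, active):
    """Move row `pk` into the table matching its new flag (insert, then delete)."""
    if active:
        src, dst = "items_inactive", "items_active"
    else:
        src, dst = "items_active", "items_inactive"
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            f"SELECT id, payload FROM {src} WHERE id = ?", (pk,)
        ).fetchone()
        if row is None:
            return False  # not in the source table; nothing to move
        conn.execute(f"INSERT INTO {dst} (id, payload) VALUES (?, ?)", row)
        conn.execute(f"DELETE FROM {src} WHERE id = ?", (pk,))
    return True

# Hypothetical stand-ins for the active and inactive partitions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items_active (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE items_inactive (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO items_inactive VALUES (1, 'example row')")
conn.commit()

moved = set_is_active(conn, 1, True)
```

Doing the insert and delete in one transaction matters: a failure halfway through should not leave the row in both places or in neither.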

Regarding Konerak's Answer

While I agree that parallel access is ONE use of partitioning, it's not the exclusive one. But if you follow the link he provides, the user comment at the very bottom of the page is:

Beware of having low selectivity indexes on your table. A complex AND/OR WHERE clause will surely make your query very very slow if Index_Merge optimization is being used with an intersect() algorithm.

That seems to speak to your situation, so you can take that comment FWIW.

answered Oct 05 '22 02:10

Stephanie Page

If you are going to index that many columns, you may want to rethink your data structure. For example, make each column a row/record instead. Then have a "group ID" to link the individual records together, and a "name" field to indicate what piece of data it is. Then you only need 1 index for all your pieces of data.

This name/value pair setup is actually fairly common now and is what some NoSQL databases are based on, which is something else you may want to look into. Something like MongoDB is excellent for indexing "all" pieces of data.
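The name/value layout described above could look roughly like this. It is a sketch using Python's built-in sqlite3 module rather than MySQL, and the table name `attrs`, the sample data, and the helper `find_groups` are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per former column: group_id ties the pieces of one record together.
conn.execute(
    "CREATE TABLE attrs (group_id INTEGER, name TEXT, value TEXT, "
    "PRIMARY KEY (group_id, name))"
)
# A single index serves searches on any attribute.
conn.execute("CREATE INDEX idx_attrs_name_value ON attrs (name, value, group_id)")
conn.executemany(
    "INSERT INTO attrs VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "size", "L"),
     (2, "color", "blue"), (2, "size", "L")],
)

def find_groups(conn, criteria):
    """Return group IDs whose attributes match every name/value pair in criteria."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    where = " OR ".join("(name = ? AND value = ?)" for _ in criteria)
    params = [x for pair in criteria.items() for x in pair]
    sql = (
        f"SELECT group_id FROM attrs WHERE {where} "
        "GROUP BY group_id HAVING COUNT(*) = ? ORDER BY group_id"
    )
    return [r[0] for r in conn.execute(sql, params + [len(criteria)])]

matches = find_groups(conn, {"color": "red", "size": "L"})
```

The trade-off is visible in `find_groups`: a search on several attributes becomes a grouped query over many narrow rows instead of a simple WHERE clause on one wide row.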

answered Oct 05 '22 01:10

Brent Baisley

Tags:

mysql

database-design

database-partitioning

partitioning