Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL Partitioning a VARCHAR(60)

I have a very large 500 million rows table with the following columns:

  • id - Bigint - Autoincrementing primary index.
  • date - Datetime - Approximately 1.5 million rows per date, data older 1 year is deleted.
  • uid - VARCHAR(60) - A user ID
  • sessionNumber - INT
  • start - INT - epoch of start time.
  • end - INT - epoch of end time.
  • More columns not relevant for this query.

The combination of uid and sessionNumber forms a uinque index. I also have an index on date.

Due to the sheer size, I'd like to partition the table.

Most of my accesses would be by date, so partitioning by date ranges seems intuitive, but as the date is not part of the unique index, this is not an option.

Option 1: RANGE PARTITION on Date and BEFORE INSERT TRIGGER

I don't really have a regular issue with the uid and sessionNumber uniqueness being violated. The source data is consistent, but sessions that span two days may be inserted on two consecutive days with midnight being the end time of the first and start time of the second.

I'm trying to understand if I could remove the unique key and instead use a trigger that would

  • Check if there is a session with the same identifiers the previous day and if so,
  • Updates the end date.
  • cancels the actual insert.

However, I am not sure if I can 1) trigger an update on the same table. or 2) prevent the actual insert.

Option 2: LINEAR HASH PARTITION on UID

My second option is to use a linear hash partition on the UID. However I cannot see any example that utilizes a VARCHAR and converts it to an INTEGER which is used for the HASH partitioning.

However I cannot finde a permitted way to convert from VARCHAR to INTEGER. For example

ALTER TABLE mytable
PARTITION BY HASH (CAST(md5(uid) AS  UNSIGNED integer)) 
PARTITIONS 20

returns that the partition function is not allowed.

like image 200
Hans Avatar asked Nov 21 '17 22:11

Hans


1 Answers

HASH partitioning must work with a 32-bit integer. But you can't convert an MD5 string to an integer simply with CAST().

Instead of MD5, CRC32() can take an arbitrary string and converts to a 32-bit integer. But this is also not a valid function for partitioning.

mysql> alter table v partition by hash(crc32(uid));
ERROR 1564 (HY000): This partition function is not allowed

You could partition by the string using KEY Partitioning instead of HASH partitioning. KEY Partitioning accepts strings. It passes whatever input string through MySQL's built-in PASSWORD() function, which is basically related to SHA1.

However, this leads to another problem with your partitioning strategy:

mysql> alter table v partition by key(uid);
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function

Your table's primary key id does not include the column uid that you want to partition by. This is a restriction of MySQL's partitioning:

every unique key on the table must use every column in the table's partitioning expression.

Here's the table I'm testing with (it would have been a good idea for you to include this in your question):

CREATE TABLE `v` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `date` datetime NOT NULL,
  `uid` varchar(60) NOT NULL,
  `sessionNumber` int(11) NOT NULL,
  `start` int(11) NOT NULL,
  `end` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uid` (`uid`,`sessionNumber`),
  KEY `date` (`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Before going any further, I have to wonder why you want to use partitioning anyway? "Sheer size" is not a reason to partition a table.

Partitioning, like any optimization, is done for the sake of specific queries you want to optimize for. Any optimization improves one query at the expense of other queries. Optimization has nothing to do with the table. The table is happy to sit there with 5 billion rows, and it doesn't care. Optimization is for the queries.

So you need to know which queries you want to optimize for. Then decide on a strategy. Partitioning might not be the best strategy for the set of queries you need to optimize!

like image 166
Bill Karwin Avatar answered Oct 03 '22 03:10

Bill Karwin