I have a very large 500 million rows table with the following columns:
id
- Bigint
- Autoincrementing primary index. date
- Datetime
- Approximately 1.5 million rows per date, data older 1 year is deleted. uid
- VARCHAR(60)
- A user IDsessionNumber
- INT
start
- INT
- epoch of start time. end
- INT
- epoch of end time. The combination of uid
and sessionNumber
forms a uinque index. I also have an index on date.
Due to the sheer size, I'd like to partition the table.
Most of my accesses would be by date, so partitioning by date ranges seems intuitive, but as the date is not part of the unique index, this is not an option.
Option 1: RANGE PARTITION
on Date and BEFORE INSERT TRIGGER
I don't really have a regular issue with the uid
and sessionNumber
uniqueness being violated. The source data is consistent, but sessions that span two days may be inserted on two consecutive days with midnight being the end time of the first and start time of the second.
I'm trying to understand if I could remove the unique key and instead use a trigger that would
However, I am not sure if I can 1) trigger an update on the same table. or 2) prevent the actual insert.
Option 2: LINEAR HASH PARTITION
on UID
My second option is to use a linear hash partition on the UID. However I cannot see any example that utilizes a VARCHAR and converts it to an INTEGER which is used for the HASH
partitioning.
However I cannot finde a permitted way to convert from VARCHAR to INTEGER. For example
ALTER TABLE mytable
PARTITION BY HASH (CAST(md5(uid) AS UNSIGNED integer))
PARTITIONS 20
returns that the partition function is not allowed.
HASH partitioning must work with a 32-bit integer. But you can't convert an MD5 string to an integer simply with CAST()
.
Instead of MD5, CRC32()
can take an arbitrary string and converts to a 32-bit integer. But this is also not a valid function for partitioning.
mysql> alter table v partition by hash(crc32(uid));
ERROR 1564 (HY000): This partition function is not allowed
You could partition by the string using KEY Partitioning instead of HASH partitioning. KEY Partitioning accepts strings. It passes whatever input string through MySQL's built-in PASSWORD() function, which is basically related to SHA1.
However, this leads to another problem with your partitioning strategy:
mysql> alter table v partition by key(uid);
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function
Your table's primary key id
does not include the column uid
that you want to partition by. This is a restriction of MySQL's partitioning:
every unique key on the table must use every column in the table's partitioning expression.
Here's the table I'm testing with (it would have been a good idea for you to include this in your question):
CREATE TABLE `v` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`uid` varchar(60) NOT NULL,
`sessionNumber` int(11) NOT NULL,
`start` int(11) NOT NULL,
`end` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uid` (`uid`,`sessionNumber`),
KEY `date` (`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Before going any further, I have to wonder why you want to use partitioning anyway? "Sheer size" is not a reason to partition a table.
Partitioning, like any optimization, is done for the sake of specific queries you want to optimize for. Any optimization improves one query at the expense of other queries. Optimization has nothing to do with the table. The table is happy to sit there with 5 billion rows, and it doesn't care. Optimization is for the queries.
So you need to know which queries you want to optimize for. Then decide on a strategy. Partitioning might not be the best strategy for the set of queries you need to optimize!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With