Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split a hex index into n pieces

Tags:

mysql

I have a table with primary keys in string like 12a4..., c3af.... I want to process them in parallel:

process_them(1,4) on machine 1
process_them(2,4) on machine 2
process_them(3,4) on machine 3
process_them(4,4) on machine 4

Doing the above must select all rows in the table, without machines coordinating with each other. The best idea I can come up with is to split them into 16 like:

select * from table where id like '1%'
...
select * from table where id like 'e%'
select * from table where id like 'f%'

Is there a better idea that allows me more splits like 1/2, 1/4, 1/8, 1/16, 1/32 etc of the total rows?

Note: I am doing this to do nightly processing on user data and sending them notification. I am not editing anything on the DB itself. And we need to process thousands of users at a time, its cannot be split in a fine-grained manner as it wont be efficient that way.

like image 254
Jesvin Jose Avatar asked Aug 01 '13 05:08

Jesvin Jose


1 Answers

Neat idea...

you can use an MD5 hash to distrubute the rows in a reasonable well distributed way quickly, consitently (There will never be a missed row) and without ddl changes.

*let n = number of desired partitions. Use the following sql to 
*let s = salt, expirementally chosen to provide the best distribution based on key allocation pattern.
SELECT *  FROM TABLE WHERE mod( cast( conv( md5( concat( s, Priamry_Key ) ), 16, 10), n ) = 0; 
SELECT *  FROM TABLE WHERE mod( cast( conv( md5( concat( s, Priamry_Key ) ), 16, 10), n ) = 1; 
...
...
SELECT *  FROM TABLE WHERE mod( cast( conv( md5( concat( s, Priamry_Key ) ), 16, 10), n ) = (n-1);

This is an approach I have seen implemented in production enviornments a few times with good results.

The SQL here isnt tested I make no gaurantee's on sytax.

like image 69
gbtimmon Avatar answered Nov 09 '22 23:11

gbtimmon