I have a table whose primary keys are strings like 12a4..., c3af.... I want to process the rows in parallel:
process_them(1,4) on machine 1
process_them(2,4) on machine 2
process_them(3,4) on machine 3
process_them(4,4) on machine 4
Between them, these calls must select every row in the table, without the machines coordinating with each other. The best idea I can come up with is to split the keys into 16 buckets by their first character:
select * from table where id like '1%'
...
select * from table where id like 'e%'
select * from table where id like 'f%'
Is there a better approach that allows a variable number of splits, e.g. 1/2, 1/4, 1/8, 1/16, 1/32, etc. of the total rows?
Note: I am doing this for nightly processing of user data and sending users notifications. I am not editing anything in the DB itself. We need to process thousands of users at a time, so it cannot be split in a fine-grained manner; that would not be efficient.
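To make that concrete, with the 16-way split each of the 4 machines would run something like this (assuming the ids are lowercase hex; table and column names are placeholders):

select * from table where substring(id, 1, 1) in ('0', '1', '2', '3'); -- machine 1
select * from table where substring(id, 1, 1) in ('4', '5', '6', '7'); -- machine 2
select * from table where substring(id, 1, 1) in ('8', '9', 'a', 'b'); -- machine 3
select * from table where substring(id, 1, 1) in ('c', 'd', 'e', 'f'); -- machine 4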
Neat idea...
you can use an MD5 hash to distribute the rows reasonably evenly, quickly, consistently (no row will ever be missed), and without any DDL changes.
* let n = the number of desired partitions
* let s = a salt, chosen experimentally to give the best distribution for your key allocation pattern

Then use the following SQL, one query per machine:
SELECT * FROM TABLE WHERE MOD(CAST(CONV(SUBSTRING(MD5(CONCAT(s, primary_key)), 1, 8), 16, 10) AS UNSIGNED), n) = 0;
SELECT * FROM TABLE WHERE MOD(CAST(CONV(SUBSTRING(MD5(CONCAT(s, primary_key)), 1, 8), 16, 10) AS UNSIGNED), n) = 1;
...
SELECT * FROM TABLE WHERE MOD(CAST(CONV(SUBSTRING(MD5(CONCAT(s, primary_key)), 1, 8), 16, 10) AS UNSIGNED), n) = (n-1);

(The SUBSTRING takes only the first 8 hex digits because CONV works with 64-bit precision; the full 32-digit MD5 would overflow and be clipped to the same maximum value for nearly every row, putting everything in one partition.)
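For example, with n = 4 and a hypothetical salt s = 'x7', machine 2 would run:

SELECT * FROM TABLE WHERE MOD(CAST(CONV(SUBSTRING(MD5(CONCAT('x7', primary_key)), 1, 8), 16, 10) AS UNSIGNED), 4) = 1;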
This is an approach I have seen implemented in production environments a few times, with good results.
The SQL here isn't tested; I make no guarantees on syntax.
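To choose the salt experimentally, you could check how evenly a candidate salt spreads the rows with something like this (same untested caveat; 'x7' and the modulus 4 are placeholders):

SELECT MOD(CAST(CONV(SUBSTRING(MD5(CONCAT('x7', primary_key)), 1, 8), 16, 10) AS UNSIGNED), 4) AS bucket,
       COUNT(*) AS row_count
FROM TABLE
GROUP BY bucket;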