I'm storing an ordered list of several million items in a MySQL database. Reasonably often, items need to be added or removed from the list; equally often, the position within the list of an item must be determined. I'd say the read/write ratio is about 50:50.
Starting with a linked-list model, I read [1] and the various models discussed there. For a strict linked-list, the adjacency list model would work just fine, but since the read/write ratio is more or less equal, I went for a divide and conquer approach using standard contiguous lists:
Divide the entire list into 'buckets' of approximate length (say ~10000), maintaining an index of bucket sizes and their relative position within the main list. Each item is assigned to a specific bucket and keeps track of its position within that bucket.
With this approach, an item's position is determined by summing the sizes of the buckets that preceed the item's bucket in the list, then adding the item's position within its own bucket. To insert/remove an item from the list, the 'shifting' of items that results is localized to the bucket into which an item is being added or removed; the size of that bucket must also be updated accordingly.
There is some denormalization in this approach (the bucket sizes), and it is not inherently thread-safe, even with transactions, because during a remove/insert the table of items must be queried to determine the bucket position of the item being modified, and then updated to perform the 'shift' on all the other items in that item's bucket. Unless these actions are atomic (via stored procedure maybe?) threads consistently deadlock.
Are there any more approriate approaches to keeping this kind of data in an RDBMS? The thread-safety issue is causing me a big headache and it feels like there ought to be a better way to solve this problem than forcing me into using stored procedures.
Many thanks, Matt.
[1] Database Structure for Tree Data Structure
The HTML ol tag is used for ordered list. We can use ordered list to represent items either in numerical order format or alphabetical order format, or any format where an order is emphasized. There can be different types of numbered list: Numeric Number (1, 2, 3)
If you need a linked list (not a hierarchy), you can just use the approach described in this article in my blog:
, with this simple query:
SELECT @r AS _parent,
@r := (
SELECT id
FROM t_list
WHERE parent = _parent
) AS id
FROM (
SELECT @r := 0
) vars,
t_list
Make sure your id
and parent
have UNIQUE
indexes defined for this to be efficient.
Replace @r := 0
with @r := @id_of_record_to_start_with
to start browsing from any given id
.
To find out the item's position, just reverse the query:
SELECT COUNT(*)
FROM (
SELECT @r AS _id,
@r := (
SELECT parent
FROM t_list
WHERE id = _id
) AS id
FROM (
SELECT @r := @item_id
) vars,
t_list
) q
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With