I need to be able to store a large list of ordered items in the DB. So far that's straight-forward:
ID Position OtherFields 1 45 ... 2 4736 ... 3 514 ... ...
In queries, I always need to get just a few items (filtered based on OtherFields) but in the correct order. Easy as well, putting an index on Position and using "order by Position".
Now the problem: Items change their Position frequently, and not just by 1 or 2. If ID 2 changes the Position from 4736 to 2000, I need to update its Position and the Position of all elements between old Position 2000 and 4735, adding 1 in each row. And it's not only one ID that changes per transaction but a few, and there can be many transactions within a short time.
I think the most elegant way to handle the update problem would be using a linked list instead of a Position column where I can just remove ID 2 from its old position by linking its predecessor to its successor and then insert it elsewhere by linking it between its new predecessor and successor. That would be a constant and small number of updates per Position change and it would also be my preferred way of handling the changes (in Java in my case). However this raises the N+1 problem for querying in the correct order - even for a few elements, I have to go through the whole list in the worst case for finding out their correct order.
So my question is: What would you recommend to get a good balance between necessary updates and query performance?
So far I see two promising directions:
Is there a DBMS (ideally OpenSource) that can handle linked lists not only with syntactic sugar but also with a good performance, e.g. by using internal Indices for the linked elements?
Maybe it would also be an option to just have a BLOB where the whole Linked List would be stored in! How big could such a Linked List get / how much memory would it use in the DB and when fetched for let's say 1.000.000 entries? I'm using Java + Hibernate in case it matters. I imagine that processing even the whole list in memory after fetching the BLOB should be pretty fast!?
But of course other ideas are welcome as well!
If you relax the constraint that the Position
column must contain integers from 1 to N and instead allow it to contain any numbers then you can do both searches and updates efficiently.
You can insert an item between two other items with position A and B by calculating the average (A + B) DIV 2. For example if A is 10000 and B is 12000 then your new position is 11000. Occasionally you will run out of gaps due to clustering, at which point you can run through the whole table redistributing the positions more evenly.
What about using a decimal for position? If you do, you could use the following method to put it between other positions:
Original records are:
ID Position Otherfields -------------------------- 1 1.0 2 2.0 . . . 5000 5000.0
Then say you move ID 1 to just before 5000
ID Position Otherfields -------------------------- 1 4999.9 2 2.0 . . . 5000 5000.0
Now lets say you want to put ID 2 between 1 and 5000:
ID Position Otherfields -------------------------- 1 4999.9 2 4999.91 . . . 5000 5000.0
This way you are only changing one record...
UPDATE:
After re-reading @Mark Byers' suggestion it appears that our solutions are very similar, though using a decimal seems much simpler to me...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With