Scenario:
There is a database of movies a user owns, movies are displayed on a page called "my-movies", the movies can be displayed in the order that the user desires. For example "Fight Club" in position #1, "Drive" in position #3 and so on and so forth.
The obvious solution is to store a position with each item, for example:
movieid, userid, position
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
Then when outputting the data is ordered by the position. This method works fine for output however it has a problem when updating: the position of an item all the other positions need to be updated because positions are relative. If movie #3 is now in position number 2 then movie #3 now needs to be updated to position #2. If the database contains 10,000 movies and a movie is moved from position #1 to position #9999 that's almost 10,000 rows to be updated!
My only solution is to store positioning separately, instead of having an individual field for each items position it's just one big data dump of positions that are taken in run time and associated with each item (json, xml, whatever) but that feels... inefficient because the database can't be left to do the sorting.
My summarised question: What's the most efficient way of storing items positions in a list that is friendly to fetching and updating?
Database tables The tables of a relational database have some important characteristics: There is no significance to the order of the columns or rows. Each row contains one and only one value for each column. Each value for a given column has the same type.
Either way, the answer is - it doesn't matter. Columns, you should have some sort of logic so that someone approaching your table blind can understand how things flow ideally, but they can reconstruct it however they want in the select statement. Rows, it truly doesn't matter at all.
The ORDER BY clause contains a column or combination of columns that are guaranteed to be unique. The simplest way to understand that a sort is not stable is to go back to the definition of a table. Tables are inherently unordered in SQL. So, there is no ordering to fall back on for "stability".
If you use a combination of the position and a timestamp that the user put a movie in a given position rather than trying to maintain the actual position, then you can achieve a fairly simple means of both SELECTing and UPDATEing the data. For example; a base set of data:
create table usermovies (userid int, movieid int, position int, positionsetdatetime datetime) insert into usermovies (userid, movieid, position, positionsetdatetime) values (123, 99, 1, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (123, 98, 2, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (123, 97, 3, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (123, 96, 4, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (123, 95, 5, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (123, 94, 6, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (987, 99, 1, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (987, 98, 2, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (987, 97, 3, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (987, 96, 4, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (987, 95, 5, getutcdate()) insert into usermovies (userid, movieid, position, positionsetdatetime) values (987, 94, 6, getutcdate())
If you query the user's movies using a query like this:
;with usermovieswithrank as ( select userid , movieid , dense_rank() over (partition by userid order by position asc, positionsetdatetime desc) as movierank from usermovies ) select * from usermovieswithrank where userid=123 order by userid, movierank asc
Then you'll get the expected result:
USERID MOVIEID MOVIERANK 123 99 1 123 98 2 123 97 3 123 96 4 123 95 5 123 94 6
To move one of the rankings of the movies we need to update the position and the positionsetdatetime columns. For example, if userid 123 moves movie 95 from rank 5 to rank 2 then we do this:
update usermovies set position=2, positionsetdatetime=getutcdate() where userid=123 and movieid=95
Which results in this (using the SELECT query above following the update):
USERID MOVIEID MOVIERANK 123 99 1 123 95 2 123 98 3 123 97 4 123 96 5 123 94 6
Then if userid 123 moves movie 96 to rank 1:
update usermovies set position=1, positionsetdatetime=getutcdate() where userid=123 and movieid=96
We get:
USERID MOVIEID MOVIERANK 123 96 1 123 99 2 123 95 3 123 98 4 123 97 5 123 94 6
Of course you'll end up with duplicate position column values within the usermovies table, but with this method you'll never show that column, you simply use it along with positionsetdatetime to determine a sorted rank for each user and the rank you determine is the real position.
If at some point you want the position column to properly reflect the movie rankings without reference to the positionsetdatetime you can use the movierank from the select query above to update the usermovies position column value, as it wouldn't actually affect the determined movie rankings.
I've been struggling with what best to do with this situation and have come to the realisation that BY FAR the best solution is a list/array of the movies in the order you want them eg;
userId, moviesOrder
1 : [4,3,9,1...]
obviously you will serialise your array.
'that feels... inefficient'?
consider the user had a list of 100 movies. Searching by position will be one database query, a string to array conversion and then moviesOrder[index]. Possibly slower than a straight DB lookup but still very very fast.
OTOH, consider if you change the order;
with a position stored in the db you need up to 100 row changes, compared to an array splice. The linked list idea is interesting but doesn't work as presented, would break everything if a single element failed, and looks a hell of a lot slower too. Other ideas like leaving gaps, use float are workable although a mess, and prone to failure at some point unless you GC.
It seems like there should be a better way to do it in SQL, but there really isn't.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With