Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is a good, CRUD-sympathetic algorithm for ordering list items?

Tags:

algorithm

I would like a simple way to represent the order of a list of objects. When an object changes position in that list I would like to update just one record. I don't know if this can be done but I'm interested to ask the SO hive...

Wish-list constraints

  • the algorithm (or data structure) should allow for items to be repositioned in the list by updating the properties of a single item
  • the algorithm (or data structure) should require no housekeeping to maintain the integrity of the list
  • the algorithm (or data structure) should allow for the insertion of new items or the removal of existing items

Why I care about only updating one item at a time...

[UPDATED to clarify question]

The use-case for this algorithm is a web application with a CRUDy, resourceful server setup and a clean (Angular) client.

It's good practice to keep to the pure CRUD actions where possible and makes for cleaner code all round. If I can do this operation in a single resource#update request then I don't need any additional serverside code to handle the re-ordering and it can all be done using CRUD with no alterations.

If more than one item in the list needs to be updated for each move then I need a new action on my controller to handle it. It's not a showstopper but it starts spilling over into Angular and everything becomes less clean than it ideally should be.


Example

Let's say we have a magazine and the magazine has a number of pages :

Original magazine
- double page advert for Ford    (page=1)
- article about Jeremy Clarkson  (page=2)
- double page advert for Audi    (page=3)
- article by James May           (page=4)
- article by Richard Hammond     (page=5)
- advert for Volkswagen          (page=6)

Option 1: Store integer page numbers

... in which we update up to N records per move

If I want to pull Richard Hammond's page up from page 5 to page 2 I can do so by altering its page number. However I also have to alter all the pages which it then displaces:

Updated magazine
- double page advert for Ford    (page=1)
- article by Richard Hammond     (page=2)(old_value=5)*
- article about Jeremy Clarkson  (page=3)(old_value=2)*
- double page advert for Audi    (page=4)(old_value=3)*
- article by James May           (page=5)(old_value=4)*
- advert for Volkswagen          (page=6)

* properties updated

However I don't want to update lots of records

- it doesn't fit my architecture

Let's say this is being done using javascript drag-n-drop re-ordering via Angular.js. I would ideally like to just update a value on the page which has been moved and leave the other pages alone. I want to send an http request to the CRUD resource for Richard Hammond's page saying that it's now been moved to the second page.

- and it doesn't scale

It's not a problem for me yet but at some point I may have 10,000 pages. I'd rather not update 9,999 of them when I move a new page to the front page.

Option 2: a linked list

... in which we update 3 records per move

If instead of storing the page's position, I instead store the page that comes before it then I reduce the number of actions from a maximum of N to 3.

Original magazine
- double page advert for Ford    (id = ford,         page_before = nil)
- article about Jeremy Clarkson  (id = clarkson,     page_before = ford)
- article by James May           (id = captain_slow, page_before = clarkson)
- double page advert for Audi    (id = audi,         page_before = captain_slow)
- article by Richard Hammond     (id = hamster,      page_before = audi)
- advert for Volkswagen          (id = vw,           page_before = hamster)

again we move the cheeky hamster up...

Updated magazine
- double page advert for Ford    (id = ford,         page_before = nil)
- article by Richard Hammond     (id = hamster,      page_before = ford)*
- article about Jeremy Clarkson  (id = clarkson,     page_before = hamster)*
- article by James May           (id = captain_slow, page_before = clarkson)
- double page advert for Audi    (id = audi,         page_before = captain_slow)
- advert for volkswagen          (id = vw,           page_before = audi)*

* properties updated

This requires updating three rows in the database: the page we moved, the page just below its old position and the page just below its new position.

It's better but it still involves updating three records and doesn't give me the resourceful CRUD behaviour I'm looking for.

Option 3: Non-integer positioning

...in which we update only 1 record per move (but need to housekeep)

Remember though, I still want to update only one record for each repositioning. In my quest to do this I take a different approach. Instead of storing the page position as an integer I store it as a float. This allows me to move an item by slipping it between two others:

Original magazine
- double page advert for Ford    (page=1.0)
- article about Jeremy Clarkson  (page=2.0)
- double page advert for Audi    (page=3.0)
- article by James May           (page=4.0)
- article by Richard Hammond     (page=5.0)
- advert for Volkswagen          (page=6.0)

and then we move Hamster again:

Updated magazine
- double page advert for Ford    (page=1.0)
- article by Richard Hammond     (page=1.5)*
- article about Jeremy Clarkson  (page=2.0)
- double page advert for Audi    (page=3.0)
- article by James May           (page=4.0)
- advert for Volkswagen          (page=6.0)

* properties updated

Each time we move an item, we chose a value somewhere between the item above and below it (say by taking the average of the two items we're slipping between).

Eventually though you need to reset...

Whatever algorithm you use for inserting the pages into each other will eventually run out of decimal places since you have to keep using smaller numbers. As you move items more and more times you gradually move down the floating point chain and eventually need a new position which is smaller than anything available.

Every now and then you therefore have to do a reset to re-index the list and bring it all back within range. This is ok but I'm interested to see whether there is a way to encode the ordering which doesn't require this housekeeping.

Is there an algorithm which requires only 1 update and no housekeeping?

Does an algorithm (or perhaps more accurately, a data encoding) exist for this problem which requires only one update and no housekeeping? If so can you explain it in plain english how it works (i.g. no reference to directed graphs or vertices...)? Muchos gracias.

UPDATE (post points-awarding)

I've awarded the bounty on this to the question I feel had the most interesting answer. Nobody was able to offer a solution (since from the looks of things there isn't one) so I've not marked any particular question as correct.

Adjusting the no-housekeeping criterion

After having spent even more time thinking about this problem, it occurs to me that the housekeeping criterion should actually be adjusted. The real danger with housekeeping is not that it's a hassle to do but that it should ideally be robust to a client who has an outstanding copy of a pre-housekept set.

Let's say that Joe loads up a page containing a list (using Angular) and then goes off to make a cup of tea. Just after he downloads it the housekeeping happens and re-indexes all items (1000, 2000, 3000 etc).. After he comes back from his cup of tea, he moves an item from 1010 1011. There is a risk at this point that the re-indexing will place his item into a position it wasn't intended to go.

As a note for the future - any housekeeping algorithm should ideally be robust to items submitted across different housekept versions of the list too. Alternatively you should version the housekeeping and create an error if someone tries to update across versions.

Issues with the linked list

While the linked list requires only a few updates it's got some drawbacks too:

  • it's not trivial to deal with deletions from the list (and you may have to adjust your #destroy method accordingly
  • it's not easy to order the list for retrieval

The method I would choose

I think that having seen all the discussion, I think I would choose the non-integer (or string) positioning:

  • it's robust to inserts and deletions
  • it works of a single update

It does however need housekeeping and as mentioned above, if you're going to be complete you will also need to version each housekeeping and raise an error if someone tries to update based on a previous list version.

like image 532
Peter Nixey Avatar asked Jan 04 '14 21:01

Peter Nixey


People also ask

What are the 4 CRUD components?

CRUD Meaning: CRUD is an acronym that comes from the world of computer programming and refers to the four functions that are considered necessary to implement a persistent storage application: create, read, update and delete.

What is basic CRUD?

In computer programming, create, read, update, and delete (CRUD) are the four basic operations of persistent storage. CRUD is also sometimes used to describe user interface conventions that facilitate viewing, searching, and changing information using computer-based forms and reports.

What is the importance of CRUD?

Why CRUD is Important? CRUD is frequently used in database and database design cases. Without CRUD operations, software developers can't get anything done. REST, a superset of CRUD for HTTP resources, is used in website building, for example.


1 Answers

You should add one more sensible constraint to your wish-list:

  • max O(log N) space for each item (N being total number of items)

For example, the linked-list solution holds to this - you need at least N possible values for pointer, so the pointer takes up log N space. If you don't have this limit, trivial solution (growing strings) already mentioned by Lasse Karlsen and tmyklebu are solution to your problem, but the memory grows one character up (in the worst case) for each operation). You need some limit and this is a sensible one.

Then, hear the answer:

No, there is no such algorithm.

Well, this is a strong statement, and not easy to hear, so I guess proof is required :) I tried to figure out general proof, posted a question on Computer Science Theory, but the general proof is really hard to do. Say we make it easier and we will explicitly assume there are two classes of solutions:

  • absolute addressing - address of each item is specified by some absolute reference (integer, float, string)
  • relative addressing - address of each item is specified relatively to other items (e.g. the linked list, tree, etc.)

To disprove the existence of absolute addressing algorithm is easy. Just take 3 items, A, B, C, and keep moving the last one between the first two. You will soon run out of the possible combinations for the address of the moved element and will need more bits. You will break the constraint of the limited space.

Disproving the existence of relative addressing is also easy. For non-trivial arrangement, certainly some two different positions exist to which some other items are referring to. Then if you move some item between these two positions, at least two items have to be changed - the one which referred to the old position and the one which will refer to the new position. This violates the constraint of only one item changed.

Q.E.D.

Don't be fascinated by complexity - it doesn't work

Now that we (and you) can admit your desired solution does not exist, why would you complicate your life with complex solution that do not work? They can't work, as we proved above. I think we got lost here. Guys here spent immense effort just to end up with overly complicated solutions that are even worse than the simplest solution proposed:

  • Gene's rational numbers - they grow 4-6 bits in his example, instead of just 1 bit which is required by the most trivial algorithm (described below). 9/14 has 4 + 4 = 8 bits, 19/21 has 5 + 5 = 10 bits, and the resultant number 65/84 has 7 + 7 = 14 bits!! And if we just look at those numbers, we see that 10/14 or 2/3 are much better solutions. It can be easily proven that the growing string solution is unbeatable, see below.

  • mhelvens' solution - in the worst case he will add a new correcting item after each operation. This will for sure occupy much more than one bit more.

These guys are very clever but obviously cannot bring something sensible. Someone has to tell them - STOP, there's no solution, and what you do simply can't be better than the most trivial solution you are afraid to offer :-)

Go back to square one, go simple

Now, go back to the list of your restrictions. One of them must be broken, you know that. Go through the list and ask, which one of these is least painful?

1) Violate memory constraint

This is hard to violate infinitely, because you have limited space... so be prepared to also violate the housekeeping constraint from time to time.

The solution to this is the solution already proposed by tmyklebu and mentioned by Lasse Karlsen - growing strings. Just consider binary strings of 0 and 1. You have items A, B and C and moving C between A and B. If there is no space between A and B, i.e. they look

A  xxx0 
B  xxx1

Then just add one more bit for C:

A  xxx0
C  xxx01
B  xxx1

In worst case, you need 1 bit after every operation. You can also work on bytes, not bits. Then in the worst case, you will have to add one byte for every 8 operations. It's all the same. And, it can be easily seen that this solution cannot be beaten. You must add at least one bit, and you cannot add less. In other words, no matter how the solution is complex, it can't be better than this.

Pros:

  • you have one update per item
  • can compare any two elements, but slow

Cons:

  • comparing or sorting will get very very slow as the strings grow
  • there will be a housekeeping

2) Violate one item modified constraint

This leads to the original linked-list solution. Also, there are plenty of balanced tree data structures, which are even better if you need to look up or compare items (which you didn't mention).

These can go with 3 items modified, balanced trees sometimes need more (when balance operations are needed), but as it is amortized O(1), in a long row of operations the number of modifications per operation is constant. In your case, I would use tree solution only if you need to look up or compare items. Otherwise, the linked-list solution rocks. Throwing it out just because they need 3 operations instead of 1? C'mon :)

Pros:

  • optimal memory use
  • fast generation of ordered list (one linear pass), no need to sort
  • fast operations
  • no housekeeping

Cons:

  • cannot easily compare two items. Can easily generate the order of all the items, but given two items randomly, comparing them will take O(N) for list and O(log N) for balanced trees.
  • 3 modified items instead of 1 (... letting up to you how much of a "con" this is)

3) Violate "no housekeeping" constraint

These are the solution with integers and floats, best described by Lasse Karlsen here. Also, the solutions from point 1) will fall here :). The key question was already mentioned by Lasse:

How often will housekeeping have to take place?

If you will use k-bit integers, then from the optimal state, when items are spread evenly in the integer space, the housekeeping will have to take place every k - log N operations, in the worst-case. You might then use more ore less sophisticated algorithms to restrict the number of items you "housekeep".

Pros:

  • optimal memory use
  • fast operation
  • can compare any two elements
  • one item modified per operation

Cons:

  • housekeeping

Conclusion - hope never dies

I think the best way, and the answers here prove that, is to decide which one of those constraints is least pain and just take one of those simple solutions formerly frowned upon.

But, hope never dies. When writing this, I realized that there would be your desired solution, if we just were able to ask the server!! Depends on the type of the server of course, but the classical SQL server already has the trees/linked-list implemented - for indices. The server is already doing the operations like "move this item before this one in the tree"!! But the server is doing based on the data, not based on our request. If we were able somehow to ask server to do this without the need to create perverse, endlessly growing data, that would be your desired solution! As I said, the server already does it - the solution is sooo close, but so far. If you can write your own server, you can do it :-)

like image 135
Tomas Avatar answered Nov 15 '22 22:11

Tomas