Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Maintaining order of elements in MySQL database tables OR inserting new rows in specific positions for MySQL

I have a database table that maintains some information and is required to preserve order. Essentially if I have elements 1 through 5 listed and I want to add a new element, then it could be inserted anywhere in the existing row, either at the last, after 5, the beginning before 1 or somewhere in the middle such as after 3. Is there a way to do this using MySQL INSERT statements and specifying after which row we should insert the index?

I presume not. So my strategy to go about doing this is to create another column 'order_number' that basically records the order of the elements. For instance, if the record table has primary key (record_id) and the order_number listed side by side, it would look like this:

 record_id     order_number

    1              1    
    2              2    
    3              3
    4              4
    5              5

TO add a new element to this row after row 3, the resulting end table will look like this:

 record_id     order_number

    1             1
    2             2
    3             3
  **6**         **4**         <------ added row 
    4           **5**         <-- changed order_number
    5           **6**         <-- changed order_number

In such a situation, I can clearly achieve the order that I want by simply selecting the data that i want and providing an Order By order_number asc clause.

However, as you can see, to do a simple Insert, it requires me to update every other row's order_number that appears after it. The table is expected to have an extensive amount of rows to it (magnitude of 100,000) at minimum and simply updating every other row (hence locking the table) at every single insert operation is not at all feasible.

What is a better recommended strategy in this case ?

like image 696
Parijat Kalia Avatar asked Dec 02 '12 23:12

Parijat Kalia


2 Answers

If the order_number is not to be shown but only used for ordering, I suggest you use a decimal datatype instead of integer. This way, when you have to insert a row "between" two existing rows, you can set as order_number, the average of the two existing order numbers.

In your example:

 record_id     order_number

    1             1.0
    2             2.0
    3             3.0
  **6**           3.5          <---- added row 
    4             4.0           <-- no change
    5             5.0           <-- no change

There is a problem though that if you keep inserting numbers in the same area, some order numbers may result to be too close for the precision of the datatype you have chosen, close enough as to not be distinguished form one another.

To avoid this, your insert procedure will have to examine whether the two existing order number are too close. In that case, it could reassign some order numbers of other nearby rows, "stretching" the order numbers above and below to "make space" for a new value.

You could also have a "cleanup" procedure that runs periodically and does this "stretching" in the whole or large parts of the table.

like image 163
ypercubeᵀᴹ Avatar answered Nov 15 '22 06:11

ypercubeᵀᴹ


I found this answer for a similar question: https://stackoverflow.com/a/6333717/1010050

In summary, it increments all of the record IDs below the one you will be adding, to maintain consistency. That still requires you to update all of the record IDs, so it isn't the most efficient. It does have the benefit, compared to your method, of maintaining the physical order in the database, rather than just a virtual order like you have.

Another way I can think of would be to record the child and parent record IDs for each record, rather than an order number, similar to a Doubly Linked List. Inserting an element in the middle would then only require updating two other records regardless of table size. This has the same disadvantage as your solution where the physical ordering would be wrong, so reading from the table in an ordered fashion would be more costly.

For example:

record_id        parent_id      child_id
   0                 NULL          1
   1                 0             2
   2                 1             NULL

When we insert a record after record_id = 1, the table becomes:

record_id        parent_id      child_id
   0                 NULL          1
   1                 0             3
   2                 3             NULL
   3                 1             2

Note how only the parent_id and child_id for IDs 1 and 2 had to change.

I think between these two solutions, the biggest thing to consider is what is your most common operation: reading out the values in order, or writing a new value in the middle somewhere. If it's reading, then updating the record IDs would be your best option in order to maintain the physical order of the database. If writing, then you can optimize for that by using the method I suggested similar to a doubly linked list, or your own order method.

Summary after question update: Seeing that updating most of the records is not feasible, then the other answer I found is definitely not valid. The solution of treating it similar to a doubly linked list is still plausible, however.

like image 24
Colin Williams Avatar answered Nov 15 '22 07:11

Colin Williams