Computing a natural sort value to store in DB for string sorting

Here are some examples of strings (mainly addresses):

12
20
43-B
43-C
123
2500

Now I put those in what I consider the "correct" order. If I were to have these values in a column in a DB table and return those in a MySQL search, I would get:

12
123
20
2500
43-B
43-C

Obviously that's incorrect -- 20 is not greater than 123.

It's pretty easy to figure this out if I can guarantee that the value consists purely of integers, but when you throw in 43-B and 43-C (or even 12A or whatever), we start having problems. However, I can't simply strip out the numbers! I'm not entirely sure what it represents at this point, but I do have values such as 40W1.

Personally, I'd sort that under 40 rather than 4000, but it's a very rare edge case, so I'm not too worried about that particular example. I do need to keep the letters in mind, though, because 40B would come before 40C -- but I would also expect 40-B to come before 40C. Tricky, right? I know.

I am willing to assume only alpha-numeric characters, though (i.e. strip the - from the string).

What I want to do is convert that string into a series of numbers that are definitely sortable.

For instance, 43-B might turn into something like 10000031205 (padded) and get stored in the database along with the rest of the row. When I search for my addresses, I can then sort by the sort column, and I get everything in order!

Things I cannot do:

  • Compare them directly at run time
  • Do this search in MySQL (the value needs to be calculated on a row by row basis)
  • Use sort/asort/ksort or any sorting function in PHP

I need a value that can be stored in my database or search index upon which I can sort later!

Unfortunately, all of my attempts thus far have failed to produce the results I'm looking for. Any ideas?

asked Apr 15 '26 04:04 by Jemaclus


1 Answer

I don't claim it to be the most efficient format, but it would work. I assume no negative numbers.

I padded to 5 digits, but the pad width needs to be at least as large as the longest run of digits that can appear in any value; a longer number would not be padded and would sort out of order.

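// Left-pad every run of digits to a fixed width so that plain string comparison sorts naturally.
$input = '43-B1';
$nat = preg_replace_callback('#\d+#', function ($m) {
    // $m[0] is the matched digit run, e.g. "43" becomes "00043".
    return str_pad($m[0], 5, '0', STR_PAD_LEFT);
}, $input);
echo $nat; // 00043-B00001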

Demo: http://codepad.viper-7.com/kefb4L
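The question also wants 40-B and 40B treated the same, which the snippet above does not do by itself. Here is a minimal sketch of one way to combine the padding with that requirement. The function name natural_sort_key and its $width parameter are hypothetical, not part of the original answer, and it assumes that only letters and digits matter, that case should be ignored, and that there are no negative numbers.

```php
<?php
// Hypothetical helper (not from the original answer): builds a string key
// that can be stored in a column and sorted with ordinary string comparison.
function natural_sort_key(string $value, int $width = 10): string
{
    // Assumption: only letters and digits matter, so "43-B" and "43b"
    // both normalize to "43B".
    $clean = strtoupper(preg_replace('/[^A-Za-z0-9]/', '', $value));

    // Left-pad every run of digits with zeros to a fixed width.
    // Runs longer than $width are left unpadded, so choose $width generously.
    return preg_replace_callback('/\d+/', function (array $m) use ($width): string {
        return str_pad($m[0], $width, '0', STR_PAD_LEFT);
    }, $clean);
}

// Print the key for each sample address from the question.
foreach (['12', '20', '43-B', '43-C', '123', '2500'] as $address) {
    echo $address, ' => ', natural_sort_key($address, 5), PHP_EOL;
}
```

With a width of 5 the sample addresses map to 00012, 00020, 00043B, 00043C, 00123 and 02500, which compare in the desired order as plain strings. The key could be computed once when the row is written and stored in its own column, so that sorting on that column later needs no run-time comparison logic. One side effect of this sketch is that values differing only in punctuation, case, or leading zeros (such as 43-B and 43b) get identical keys.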

answered Apr 16 '26 22:04 by goat


