
Sorting with UTF-8 characters in MySQL or PHP: best solutions?

Using MySQL, I'm selecting a list of songs in Spanish that I would like to sort. Here's the list of names returned by the query:

  • ¡Decirevilla!
  • Alhambra
  • 123 pasitos
  • África
  • Arroz
  • Decir

The sorted list should look like this:

  • 123 pasitos
  • África
  • Alhambra
  • Arroz
  • ¡Decirevilla!
  • Decir

After all the research I've done, I've concluded that there is no reasonable way to achieve this using MySQL. I've tried collations, charsets, etc., but I can't get characters such as ¡ and ? sorted according to my desired result. Even the Á is not sorted the way I want.

Question 1: Is this a reasonable conclusion?

I believe the only way to achieve this is to load the results into an array in PHP and then sort the array with a custom comparison function passed to usort (I need to sort by value and I don't care about maintaining the key association). Something similar to this:

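// Placeholder comparison callback for usort(): returns 0, -1 or 1.
// $a and $b are the two elements being compared.
function normalize($a, $b) {
  if ($a == $b) {
     return 0;
  }

  return ($a < $b) ? -1 : 1;
}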


$tracks = array();

while ($row = $result->fetch_assoc()) {
    $tracks[] = $row;
}

usort($tracks, 'normalize');

Question 2: Is this the best way to achieve a custom sorting?

Here's where I'm hitting a wall:

Question 3: I have no idea how to write the normalize function so that it sorts the names according to my needs. How do I ignore certain characters (¡, ?, ', !, ¿), and how do I replace other characters with their unaccented equivalents (Á -> A, É -> E, etc.)? I believe that by ignoring certain characters and replacing others, I can achieve the sorting I'm looking for.

Question 4: Does all this make sense? Am I on the right path?

Thanks in advance for all your advice. Marco

Marco asked Nov 14 '22 02:11



1 Answer

You could add your own collation to MySQL. Then you could ignore whatever characters you don't care about, strip accents as needed, and generally sort things in any consistent way you desire.

Doing the mangled collation on the client side (i.e. in PHP rather than in the database) won't be as quick as doing it in the database. Client-side sorting will also fail miserably as soon as you have to add LIMIT and OFFSET clauses to your query, because the database picks the page of rows before PHP ever gets to sort them. I'm not sure whether custom collations do The Right Thing for MAX() and similar functions, but doing the mangled collation in PHP certainly won't, unless you want to pull over the whole table, sort it, and then grab just one entry.

So, I would consider doing the collation outside the database as a last resort.

Another option, if you don't want to build your own collation, is to add an artificial column to your table that does sort properly. You could use a normalize() function in PHP-land (something like Jacob's would be a reasonable starting point) and keep the result in the database as a column called, say, sortable_title; then ORDER BY sortable_title would do the trick. You'd want a normalize() PHP function that produces a list like this (no punctuation, all lower case, accents stripped, ...):

  • 123 pasitos
  • africa
  • alhambra
  • arroz
  • decirevilla
  • decir

With values like these, a simple ASCII-betical sort will do The Right Thing. Of course, you would have to initialize sortable_title when doing INSERTs and regenerate it during UPDATEs, but that should be fairly straightforward if your code is properly encapsulated.
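As a rough illustration of what such a function might look like, here is a minimal sketch. The names normalize_title() and sort_titles() are hypothetical, the character map covers only common Spanish letters, and folding Ñ to N is a simplification (Spanish dictionaries order ñ as its own letter after n). It assumes the source file and the titles are UTF-8 encoded.

```php
<?php
// Hypothetical helper: builds a plain-ASCII sort key from a UTF-8 title.
// The map is an assumption and only covers common Spanish letters.
function normalize_title(string $title): string
{
    static $map = [
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U', 'Ü' => 'U', 'Ñ' => 'N',
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ü' => 'u', 'ñ' => 'n',
    ];

    // Replace accented letters with their unaccented equivalents.
    $key = strtr($title, $map);
    // Drop everything that is not an ASCII letter, digit or space
    // (this removes punctuation such as ¡ ! ¿ ? ' and any unmapped non-ASCII bytes).
    $key = preg_replace('/[^A-Za-z0-9 ]+/', '', $key);
    // Only ASCII is left at this point, so strtolower() is safe.
    $key = strtolower($key);
    // Collapse runs of spaces and trim the ends.
    return trim(preg_replace('/ +/', ' ', $key));
}

// Hypothetical helper: sorts titles by their normalized key, byte-wise.
// This is the same ordering ORDER BY sortable_title would give with a binary or ASCII comparison.
function sort_titles(array $titles): array
{
    usort($titles, function ($a, $b) {
        return strcmp(normalize_title($a), normalize_title($b));
    });
    return $titles;
}
```

The value returned by normalize_title() is what would be written to sortable_title on every INSERT and UPDATE. Note that a plain byte-wise comparison places "decir" before "decirevilla", since a string sorts before any longer string it is a prefix of; getting the reverse order for that pair would need an extra rule in the key.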

Question 4: I think I'm going to disagree with Jacob and say that you're not going in the right direction by moving the collation out of the database. I'm not saying that you're completely off track, but you're better off letting MySQL handle the sorting, even if you end up giving it some help with something like the sortable_title hack outlined above.

mu is too short answered Dec 17 '22 06:12
