Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ORDER BY alphanumeric characters only in SQLite

Tags:

sql

sqlite

I am sorting songs in SQLite (on Android). I want to order them:

  1. Case-insensitive
  2. With leading-digits at the end, by integer value.
  3. Without punctuation (e.g. parentheses, periods, hyphens, apostrophes)

I have 1 & 2 working (see below). However, I can't figure out how to replace every character (other than letters, numbers, and spaces) other than to call replace() for each character.

Is there a way to do this other than ~32 calls to replace()?
(ASCII values 33-47,58-64,91-96,123-126)


Here is a test table. The value 'n' should ideally come out in order. (No, you cannot order by n ;)

create table songs (n integer, name text);
insert into songs (n,name) values (6,'I''ll Be That Girl');
insert into songs (n,name) values (24,'1969');
insert into songs (n,name) values (9,'La Moldau');
insert into songs (n,name) values (20,'Pule');
insert into songs (n,name) values (7,'I''m a Rainbow Too');
insert into songs (n,name) values (21,'5 Years');
insert into songs (n,name) values (18,'Pressure');
insert into songs (n,name) values (13,'Lagan');
insert into songs (n,name) values (1,'any old wind that blows');
insert into songs (n,name) values (17,'Poles Apart');
insert into songs (n,name) values (8,'Imagine');
insert into songs (n,name) values (14,'Last Stop before Heaven');
insert into songs (n,name) values (3,'I Before E Except After C');
insert into songs (n,name) values (4,'i do, i do, i do');
insert into songs (n,name) values (22,'99 Luftballons');
insert into songs (n,name) values (12,'L''accord parfait');
insert into songs (n,name) values (15,'Pluto');
insert into songs (n,name) values (19,'The Promise');
insert into songs (n,name) values (2,'(Don''t Fear) The Reaper');
insert into songs (n,name) values (10,'L.A. Nights');
insert into songs (n,name) values (23,'911 is a Joke');
insert into songs (n,name) values (5,'Ichthyosaurs Are Awesome');
insert into songs (n,name) values (11,'Labradors are Lovely');
insert into songs (n,name) values (16,'P.O.D.-Boom');

Here's the solution to just 1 & 2 above:

SELECT n
FROM songs
ORDER BY
  CASE WHEN name GLOB '[0-9]*' THEN 1
       ELSE 0
  END,
  CASE WHEN name GLOB '[0-9]*' THEN CAST(name AS INT)
       ELSE name
  END
COLLATE NOCASE

For this test set it produces results in this order: 2,1,3,4,6,7,5,8,12,10,9,11,13,14,16,15,17,18,20,19,21,22,23,24

I can fix this particular test set with manual replaces for each undesired character:

SELECT n
FROM songs
ORDER BY
  CASE WHEN name GLOB '[0-9]*' THEN 1
       ELSE 0
  END,
  CASE WHEN name GLOB '[0-9]*' THEN CAST(name AS INT)
       ELSE
         replace(
           replace(
             replace(
               replace(name,'.',''),
               '(',''
             ),
             '''',''
           ),
           '  ',' '
         )
  END
COLLATE NOCASE
like image 916
Phrogz Avatar asked Sep 17 '15 17:09

Phrogz


3 Answers

I would add an additional column in the table, called "SortingName" or something. Calculate this value when inserting, ideally not in SQL but in a higher level language where you have all these nice string operations.

I didn't really understand this thing with the number. I guess the simplest thing you can do is extract the number before insert and put it into another column, like "SortingNumber".

Then simply sort like this:

Order By
  SortingName,
  SortingNumber

(Or the other way around.)

Another advantage is performance. You usually read data much more often then you write it. You can even create indexes on these two sorting columns, which is usually not possible if you calculate it in the query.

like image 61
Stefan Steinegger Avatar answered Nov 16 '22 07:11

Stefan Steinegger


The first solution (when DB and application can be modified):

Add to your table single column e.g. solumntForSorting. Then on your application before inserting, concatenate your second condition ("With leading-digits at the end, by integer value.") as 0 or 1 to song name which first 'was cleaned' from undesired symbols. So on solumntForSorting you will get something like this: 0Im a Rainbow Too and 1911 is a Joke.

The second solution (when only application can be modified):

If you have to sort data excluding some symbols and you are not allowed to change your DB, you will get a slower selection because of filtering undesired values. Most of the overhead will be at CPU time and memory.

Using replace function is tedious from my point of view, that is why I suggest using CTE with list of values you want to drop, like this ('.', '.', ';', '(', ')', '''', '-'). CTE will be bulky like multiple replace but it is easier to modify and maintain.

Try this solution:

 WITH RECURSIVE 
 ordering_name_substr(len, name, subsstr, hex_subsstr, number) 
 AS (SELECT  length(name), name, substr(name, 1, 1), hex(substr(name, 1, 1)), 1  
       FROM songs
      UNION ALL 
     SELECT len, name, substr(name, number + 1, 1),
            hex(substr(name, number + 1, 1)), number + 1
       FROM ordering_name_substr WHERE number < len),
 last_order_cretaria(value, old_name)
  AS (select GROUP_CONCAT(subsstr, ''), name 
           from ordering_name_substr 
        where hex_subsstr not in
       ('28', '29', '2C', '2E', '27') group by name )

SELECT S.n, S.name
FROM songs AS S LEFT JOIN last_order_cretaria AS OC
ON S.name = OC.old_name
ORDER BY
  CASE WHEN name GLOB '[0-9]*' THEN 1
       ELSE 0
  END,
  CASE WHEN name GLOB '[0-9]*' THEN CAST(name AS INT)
       ELSE
         OC.value
  END
COLLATE NOCASE

I have tested on sqlfiddle.

In the list ('28', '29', '2C', '2E', '27') you have values of ASCII codes (in hex) which you want to escape from be considered in ordering.

You also can try to use values itself like: ('.', '.', ';', '(', ')', '''', '-').

WITH RECURSIVE 
 ordering_name_substr(len, name, subsstr, number) 
 AS (SELECT length(name), name, substr(name, 1, 1), 1  
       FROM songs
      UNION ALL 
     SELECT len, name, substr(name, number + 1, 1),
            number + 1
       FROM ordering_name_substr WHERE number < len),
 last_order_cretaria(value, old_name)
  AS (select GROUP_CONCAT(subsstr, ''), name 
           from ordering_name_substr 
        where subsstr not in
       ('.', '.', ';', '(', ')', '''', '-') group by name )

SELECT S.n, S.name
FROM songs AS S LEFT JOIN last_order_cretaria AS OC
ON S.name = OC.old_name
ORDER BY
  CASE WHEN name GLOB '[0-9]*' THEN 1
       ELSE 0
  END,
  CASE WHEN name GLOB '[0-9]*' THEN CAST(name AS INT)
       ELSE
         OC.value
  END
COLLATE NOCASE

To make this sorting work fast and simple you have to be able to change your DB and application.

like image 4
Mikhailov Valentin Avatar answered Nov 16 '22 08:11

Mikhailov Valentin


In my opinion, the highest performance approach is to create a trigger to fill a new field named sort_key. You will need a primary key.

CREATE TABLE songs (n INTEGER, name TEXT, 
                    sort_key TEXT, 
                    ID INTEGER PRIMARY KEY AUTOINCREMENT);

CREATE TRIGGER songs_key_trigger
    AFTER INSERT ON songs FOR EACH ROW
    BEGIN n
        Declare @sort_key as varchar(255)
        -- calculate and call here your slugify function
        -- to fill sort_key from 'new.n' and 'new.name'
        UPDATE songs 
          SET sort_key = @sort_key
          WHERE ID = new.ID;
    END

Realize that this approach is index friendly, you can create an index over new column to avoid table full scan operations.

like image 3
dani herrera Avatar answered Nov 16 '22 09:11

dani herrera