
How to ignore special characters when using ORDER BY in a MySQL query

I have the following MySQL query that provides data to a Python web page. On the web page, I have a list of song titles, and I want it to be alphabetized ignoring punctuation and spaces. My MySQL database is UTF-8 encoded, and some of the punctuation that needs to be ignored consists of special characters such as curly apostrophes.

SELECT * FROM Tracks\
JOIN Artists USING (ArtistID)\
JOIN Albums USING (AlbumID)\
JOIN Songs USING (SongID)\
ORDER BY UPPER(\
REPLACE(\
REPLACE(\
REPLACE(\
REPLACE(\
REPLACE(\
REPLACE(\
REPLACE(\
REPLACE(\
REPLACE(\
REPLACE(\
REPLACE(\
REPLACE(\
REPLACE(SoName, ' ', ''), /* space */\
                        ',', ''), /* comma */\
                        '.', ''), /* period */\
                        ':', ''), /* colon */\
                        ';', ''), /* semicolon */\
                        '!', ''), /* exclamation point */\
                        '?', ''), /* question mark */\
                   '\u201c', ''), /* left curly double quote */\
                   '\u201d', ''), /* right curly double quote */\
                   '\u2019', ''), /* right curly single quote (apostrophe) */\
                   '\u2013', ''), /* n-dash */\
                   '\u2014', ''), /* m-dash */\
                   '\u2026', '') /* ellipsis */), (SongID), UPPER(AlTitle)

The REPLACE in my query seems to work perfectly for the non-special characters, like the space, comma, period, etc., but it seems to skip over the special characters.

My guess is that the characters need to be written in a different format. I tried the following with no success:

REPLACE(SoName, '\u2026', '')
REPLACE(SoName, u'\2026', '')
REPLACE(SoName, 0xE280A6, '')
...

Samuel Bradshaw asked Nov 04 '22 00:11


1 Answer

MySQL string literals do not provide an escape sequence for multi-byte characters. This has been a feature request for over 7 years and is still awaiting triage: I wouldn't hold my breath that it will be resolved any time soon.

You must either put the actual character in your string literal, or else know its constituent bytes in your desired encoding (in which case you could then use something like CHAR()).
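
As a rough illustration of the second option, here is a minimal Python sketch that spells out each character by its UTF-8 bytes inside a MySQL CHAR(... USING ...) expression and wraps the column in nested REPLACE calls. The helper names (mysql_char_expr, build_sort_key) are hypothetical, and the utf8mb4 charset is an assumption; adjust it to whatever your connection and columns actually use. For background: with default settings MySQL drops the backslash from an escape it does not recognize, so '\u2026' reaches REPLACE as the plain text u2026, which would explain why nothing matched.

```python
# Characters to strip before sorting. The \u escapes here are resolved by
# Python itself, so MySQL never sees a backslash sequence.
STRIP_CHARS = [" ", ",", ".", ":", ";", "!", "?",
               "\u201c", "\u201d", "\u2019", "\u2013", "\u2014", "\u2026"]


def mysql_char_expr(ch, charset="utf8mb4"):
    """Return a MySQL CHAR() expression that spells out ch by its UTF-8 bytes.

    The charset default is an assumption, not something stated in the question.
    """
    byte_list = ", ".join(str(b) for b in ch.encode("utf-8"))
    return f"CHAR({byte_list} USING {charset})"


def build_sort_key(column, chars=STRIP_CHARS):
    """Wrap column in one REPLACE per character, then UPPER, for ORDER BY.

    column is interpolated as-is, so it must be a trusted identifier,
    never user input.
    """
    expr = column
    for ch in chars:
        expr = f"REPLACE({expr}, {mysql_char_expr(ch)}, '')"
    return f"UPPER({expr})"


if __name__ == "__main__":
    # Ellipsis only, to keep the output short:
    # UPPER(REPLACE(SoName, CHAR(226, 128, 166 USING utf8mb4), ''))
    print(build_sort_key("SoName", ["\u2026"]))
```

The resulting string can be dropped into the ORDER BY clause in place of the hand-written REPLACE chain. The first option (putting the actual characters in the literal) also works from Python, provided the query is a Unicode string and the connection character set matches the database.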

eggyal answered Nov 09 '22 11:11