Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to ignore accent in SQLite query (Android)

Tags:

I am new in Android and I'm working on a query in SQLite. My problem is that when I use accent in strings e.g.

  • ÁÁÁ
  • ááá
  • ÀÀÀ
  • ààà
  • aaa
  • AAA

If I do:

SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%a%' ORDER BY MOVIE_NAME;

It's return:

  • AAA
  • aaa (It's ignoring the others)

But if I do:

SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%à%' ORDER BY MOVIE_NAME;

It's return:

  • ààà (ignoring the title "ÀÀÀ")

I want to select strings in a SQLite DB without caring for the accents and the case. Please help.

like image 952
andrehsouza Avatar asked Apr 29 '13 15:04

andrehsouza


2 Answers

Generally, string comparisons in SQL are controlled by column or expression COLLATE rules. In Android, only three collation sequences are pre-defined: BINARY (default), LOCALIZED and UNICODE. None of them is ideal for your use case, and the C API for installing new collation functions is unfortunately not exposed in the Java API.

To work around this:

  1. Add another column to your table, for example MOVIE_NAME_ASCII
  2. Store values into this column with the accent marks removed. You can remove accents by normalizing your strings to Unicode Normal Form D (NFD) and removing non-ASCII code points since NFD represents accented characters roughly as plain ASCII + combining accent markers:

    String asciiName = Normalizer.normalize(unicodeName, Normalizer.Form.NFD)     .replaceAll("[^\\p{ASCII}]", ""); 
  3. Do your text searches on this ASCII-normalized column but display data from the original unicode column.

like image 119
laalto Avatar answered Sep 18 '22 15:09

laalto


In Android sqlite, LIKE and GLOB ignore both COLLATE LOCALIZED and COLLATE UNICODE (they only work for ORDER BY). However, there is a solution without having to add extra columns to your table. As @asat explains in this answer, you can use GLOB with a pattern that will replace each letter with all the available alternatives of that letter. In Java:

public static String addTildeOptions(String searchText) {
    return searchText.toLowerCase()
                     .replaceAll("[aáàäâã]", "\\[aáàäâã\\]")
                     .replaceAll("[eéèëê]", "\\[eéèëê\\]")
                     .replaceAll("[iíìî]", "\\[iíìî\\]")
                     .replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
                     .replaceAll("[uúùüû]", "\\[uúùüû\\]")
                     .replace("*", "[*]")
                     .replace("?", "[?]");
}

And then (not literally like this, of course):

SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"

This way, for example in Spanish, a user searching for either mas or más will get the search converted into m[aáàäâã]s, returning both results.

It is important to notice that GLOB ignores COLLATE NOCASE, that's why I converted everything to lower case both in the function and in the query. Notice also that the lower() function in sqlite doesn't work on non-ASCII characters - but again those are probably the ones that you are already replacing!

The function also replaces both GLOB wildcards, * and ?, with "escaped" versions.

like image 42
descatalogado Avatar answered Sep 22 '22 15:09

descatalogado