
Oracle search text with non-English characters

Our Oracle DB is UTF8. We are storing addresses that need to be searchable. Some of the street names contain non-English characters (e.g. Peña Báináõ), and they need to be searchable either as "Peña Báináõ" or with English-equivalent characters, as in "Pena Bainao". What we did was convert the text in the query, something like:

SELECT CONVERT('Peña Báináõ','US7ASCII') as converted FROM dual;

But the issue here is that not all of the characters have an English equivalent (not even some pretty obvious ones like ñ or õ) so we end up with the text converted to:

Pe?a Baina?

So if a user tries to find that address by typing "Pena Bainao", the search fails, because "Pena Bainao" is different from "Pe?a Baina?".

We have figured out some dirty workarounds for this, but I wanted to check first whether someone has found a more elegant solution.

Here is a list of some characters that are not converted to US7ASCII:

Character   Unicode code point   Possible equivalent
æ           U+00E6               ae
å           U+00E5               a
ã           U+00E3               a
ñ           U+00F1               n
õ           U+00F5               o

Chepech asked Jul 13 '11 16:07



1 Answer

1) Using nlssort with BINARY_AI (both case- and accent-insensitive):

SQL> select nlssort('Peña Báináõ', 'NLS_SORT = BINARY_AI') C from dual;

C
------------------------
70656E61206261696E616F00

SQL> select nlssort('Pena Bainao', 'NLS_SORT = BINARY_AI') C from dual;

C
------------------------
70656E61206261696E616F00

SQL> select nlssort('pena bainao', 'NLS_SORT = BINARY_AI') C from dual;

C
------------------------
70656E61206261696E616F00

SQL> select 'true' T from dual where nlssort('pena bainao', 'NLS_SORT = BINARY_AI') = nlssort('Peña Báináõ', 'NLS_SORT = BINARY_AI') ;

T
----
true
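
The same comparison can be applied to a column instead of literals. The following is only a sketch: the addresses data set and its id and street columns are hypothetical stand-ins (the question does not show its schema), built inline with a WITH clause so that nothing has to be created first.

```sql
-- Hypothetical sample data standing in for the real address table.
WITH addresses AS (
  SELECT 1 AS id, 'Peña Báináõ' AS street FROM dual
  UNION ALL
  SELECT 2, 'Main Street' FROM dual
)
SELECT id, street
FROM   addresses
-- Compare the BINARY_AI sort keys rather than the raw strings,
-- so case and accents are ignored on both sides.
WHERE  NLSSORT(street, 'NLS_SORT = BINARY_AI')
     = NLSSORT('pena bainao', 'NLS_SORT = BINARY_AI');
```

Since the sort keys for 'Peña Báináõ' and 'pena bainao' are identical in the transcript above, this should select the first row only. In a real query the search term would normally be a bind variable rather than a literal.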

2) You could also set the NLS_SORT session parameter to binary_ai; then you would not have to specify NLS_SORT in every call:

SQL> select 'true' T from dual where nlssort('pena bainao') = nlssort('Peña Báináõ') ;

no rows selected

SQL> alter session set nls_sort = binary_ai;

Session altered.

SQL> select 'true' T from dual where nlssort('pena bainao') = nlssort('Peña Báináõ') ;

T
----
true

3) To drop the nlssort function altogether and change the semantics of every comparison, also set the nls_comp session parameter:

SQL> select 'true' T from dual where 'pena bainao' = 'Peña Báináõ';

no rows selected

SQL> alter session set nls_comp = linguistic;

Session altered.

SQL> select 'true' T from dual where 'pena bainao' = 'Peña Báináõ';

T
----
true
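
Because options 2 and 3 change settings for the whole session, it can help to look at the current values first and put them back after the search. A minimal sketch, assuming the session started with BINARY for both parameters (an assumption, so check the output of the first query and substitute your own values):

```sql
-- Show the current session values before changing anything.
SELECT parameter, value
FROM   nls_session_parameters
WHERE  parameter IN ('NLS_SORT', 'NLS_COMP');

-- Switch to case- and accent-insensitive comparisons.
ALTER SESSION SET nls_sort = binary_ai;
ALTER SESSION SET nls_comp = linguistic;

-- The plain = operator now compares linguistically.
SELECT 'true' AS t FROM dual WHERE 'pena bainao' = 'Peña Báináõ';

-- Restore the previous values.
-- Assumption: both were BINARY before; use the values reported above otherwise.
ALTER SESSION SET nls_comp = binary;
ALTER SESSION SET nls_sort = binary;
```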

Option 1 changes behavior only locally, in the query where you want different results. Options 2 and 3 change the behavior of other queries in the session and may not be what you want. See Table 5-2 of the Oracle® Database Globalization Support Guide. Also see the section "Using Linguistic Indexes" in the same guide to learn how these comparisons can use indexes.
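
As a rough illustration of a linguistic index for option 1, a function-based index can be created on the same nlssort expression that the query uses. This is a sketch under assumptions: the addresses table, its columns, and the index name are hypothetical and not taken from the question.

```sql
-- Hypothetical table; replace with the real address table.
CREATE TABLE addresses (
  id     NUMBER PRIMARY KEY,
  street VARCHAR2(200)
);

-- Function-based (linguistic) index on the BINARY_AI sort key.
CREATE INDEX addresses_street_ai_idx
  ON addresses (NLSSORT(street, 'NLS_SORT = BINARY_AI'));

-- The predicate uses exactly the indexed expression,
-- which makes the index a candidate for this lookup.
SELECT id, street
FROM   addresses
WHERE  NLSSORT(street, 'NLS_SORT = BINARY_AI')
     = NLSSORT('pena bainao', 'NLS_SORT = BINARY_AI');
```

Whether the optimizer actually picks the index depends on statistics and data volume, so it is worth checking the execution plan on your own data.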

Shannon Severance answered Nov 05 '22 02:11
