Our Oracle DB is UTF8. We are storing addresses that need to be searchable. Some of the street names contain non-English characters (e.g. Peña Báináõ), and these need to be searchable either as "Peña Báináõ" or with English-equivalent characters, like "Pena Bainao". What we did was convert the text in the query, something like:
SELECT CONVERT('Peña Báináõ','US7ASCII') as converted FROM dual;
But the issue here is that not all of the characters get converted to an English equivalent (not even some pretty obvious ones like ñ or õ), so we end up with the text converted to:
Pe?a Baina?
So if the user tries to find that address by typing "Pena Bainao", he can't find it, because "Pena Bainao" is different from "Pe?a Baina?".
We have figured out some dirty workarounds for this, but I wanted to check first whether someone has found a more elegant solution.
Here is a list of some characters that are not converted to US7ASCII:
Character - Unicode code point - Possible equivalent
æ - u00E6 - ae
å - u00E5 - a
ã - u00E3 - a
ñ - u00F1 - n
õ - u00F5 - o
1) Using nlssort with BINARY_AI (both case- and accent-insensitive):
SQL> select nlssort('Peña Báináõ', 'NLS_SORT = BINARY_AI') C from dual;
C
------------------------
70656E61206261696E616F00
SQL> select nlssort('Pena Bainao', 'NLS_SORT = BINARY_AI') C from dual;
C
------------------------
70656E61206261696E616F00
SQL> select nlssort('pena bainao', 'NLS_SORT = BINARY_AI') C from dual;
C
------------------------
70656E61206261696E616F00
SQL> select 'true' T from dual where nlssort('pena bainao', 'NLS_SORT = BINARY_AI') = nlssort('Peña Báináõ', 'NLS_SORT = BINARY_AI') ;
T
----
true
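As a rough sketch of how option 1 might look against a real table rather than dual: the table name addresses, the column street_name and the bind variable :search_text below are all hypothetical, since the question does not show the actual schema.

```sql
-- Hypothetical table, column and bind variable names, for illustration only.
-- Both sides go through the same NLSSORT call, so the comparison ignores
-- case and accents for this one query without touching session settings.
SELECT a.*
FROM   addresses a
WHERE  NLSSORT(a.street_name, 'NLS_SORT = BINARY_AI')
     = NLSSORT(:search_text,  'NLS_SORT = BINARY_AI');
```

With :search_text bound to 'Pena Bainao', this should match a row stored as 'Peña Báináõ', in line with the dual examples above.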
2) You could also alter the NLS_SORT session variable to binary_ai and then you would not have to specify NLS_SORT every time:
SQL> select 'true' T from dual where nlssort('pena bainao') = nlssort('Peña Báináõ') ;
no rows selected
SQL> alter session set nls_sort = binary_ai;
Session altered.
SQL> select 'true' T from dual where nlssort('pena bainao') = nlssort('Peña Báináõ') ;
T
----
true
3) To drop the use of the nlssort function and change the semantics of everything, also set the nls_comp session variable:
SQL> select 'true' T from dual where 'pena bainao' = 'Peña Báináõ';
no rows selected
SQL> alter session set nls_comp = linguistic;
Session altered.
SQL> select 'true' T from dual where 'pena bainao' = 'Peña Báináõ';
T
----
true
Option 1 changes only local behavior: the one query where you want different results. Options 2 and 3 will change the behavior of other queries in the session and may not be what you want. See Table 5-2 of the Oracle® Database Globalization Support Guide. Also look at the section "Using Linguistic Indexes" to see how to make these comparisons use indexes.
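For the indexing part, a linguistic index is a function-based index on the NLSSORT expression. A minimal sketch, assuming a hypothetical table addresses with a street_name column (neither name comes from the question):

```sql
-- Hypothetical table, column and index names, for illustration only.
-- The indexed expression should match the NLSSORT expression used in the
-- WHERE clause, including the NLS_SORT setting.
CREATE INDEX addresses_street_ai_idx
  ON addresses (NLSSORT(street_name, 'NLS_SORT = BINARY_AI'));
```

Whether the optimizer actually picks the index depends on the query and on the session settings, so it is worth checking the execution plan rather than assuming it is used.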