Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to determine the language(English, Chinese...) of a given string in Oracle?

How to determine the language (English, Chinese...) of a given sting (table column value) in Oracle(multi language environment)?

like image 772
user3614051 Avatar asked May 07 '14 21:05

user3614051


2 Answers

It should be possible to use a library like Language Dectection for Java and tie it with your PL/SQL.

It will probably be more efficient to use SQL to do naive Bayesian filtering and use language profiles derived e.g. from Wikipedia (they are neatly packed here).

These are just pointers, not a full solution as requested for the bounty, but should help bounty-seekers.

like image 57
Alex Pakka Avatar answered Nov 02 '22 19:11

Alex Pakka


Oracle Globalization Development Kit can detect languages.

The GDK is included with Oracle but it is not installed in the database by default. To load the .jar files into the database find the jlib directory in the Oracle home and run this operating system command:

loadjava -u USER_NAME@SID orai18n.jar orai18n-collation.jar orai18n-lcsd.jar orai18n-mapping.jar orai18n-net.jar orai18n-servlet.jar orai18n-tools.jar orai18n-translation.jar orai18n-utility.jar

Some extra Java privileges are needed, even if your user has DBA. Run this command and then re-connect:

exec dbms_java.grant_permission( 'YOUR_USER_NAME', 'SYS:java.lang.RuntimePermission', 'getClassLoader', '' );

Create a Java class to do the detection. Below is a very simple example that returns the best guess for a string:

create or replace and compile java source named "Language_Detector"
as
import oracle.i18n.lcsd.*;
public class Language_Detector
{
    public static String detect(String some_string)
    {
        LCSDetector detector = new LCSDetector();
        detector.detect(some_string);
        LCSDResultSet detector_results = detector.getResult();
        return detector_results.getORALanguage();
    }
}
/

Wrap the Java class in a PL/SQL function:

create or replace function detect_language(some_string varchar2)
return varchar2
as language java
name 'Language_Detector.detect(java.lang.String) return java.lang.String';
/

Create a sample table:

create table unknown_language(id number, text varchar2(4000));

insert into unknown_language
select 1, 'The quick brown fox jumps over the lazy dog' from dual union all
select 2, 'El zorro marrón rápido salta sobre el perro perezoso' from dual union all
select 3, '敏捷的棕色狐狸跳过懒狗' from dual union all
select 4, 'Der schnelle braune Fuchs springt über den faulen Hund' from dual union all
select 5, 'Быстрая коричневая лиса прыгает через ленивую собаку' from dual;

Now the function is available in simple SELECT statements. In this trivial example the language detection works perfectly.

select id, detect_language(text) language
from unknown_language
order by id;

ID  LANGUAGE
--  --------
1   ENGLISH
2   SPANISH
3   SIMPLIFIED CHINESE
4   GERMAN
5   RUSSIAN
like image 20
Jon Heller Avatar answered Nov 02 '22 19:11

Jon Heller