How to detect language of text?

1 Answer

You can figure out whether the characters are from the Arabic, Chinese, or Japanese sections of the Unicode map.

If you look at the list on Wikipedia, you'll see that each of those scripts is spread over several sections of the map. But you're not doing translation, so you don't need to worry about every last glyph.

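For example, your Chinese text begins (in hex) 0x8FD9 0x662F 0x4E00, and those are all in the "CJK Unified Ideographs" section, which holds Chinese characters. Japanese kanji live in that same section, so treat text as Japanese if it also contains kana. Here are a few ranges to get you started: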

Arabic (0600–06FF)

Japanese

Hiragana (3040–309F)
Katakana (30A0–30FF)
Kanbun (3190–319F)

Chinese

CJK Unified Ideographs (4E00–9FFF)
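
Since the question is tagged php, here is a minimal sketch of how the ranges above could be checked. The function name `guessScript()`, its return labels, and the tie-breaking rules are assumptions made for illustration, not part of any library, and the sketch only covers the starter ranges listed here. It needs the mbstring extension for `mb_ord()` (PHP 7.2 or later).

```php
<?php
// Sketch only: guessScript() and its labels are hypothetical names,
// and the ranges are just the starter blocks listed above.
function guessScript(string $text): string
{
    // Inclusive code point ranges per label.
    $ranges = [
        'Arabic'   => [[0x0600, 0x06FF]],
        'Japanese' => [[0x3040, 0x309F], [0x30A0, 0x30FF], [0x3190, 0x319F]],
        'Chinese'  => [[0x4E00, 0x9FFF]],
    ];
    $counts = ['Arabic' => 0, 'Japanese' => 0, 'Chinese' => 0];

    // Split the UTF-8 string into characters; invalid UTF-8 yields none.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    foreach ($chars as $char) {
        $codePoint = mb_ord($char, 'UTF-8');
        foreach ($ranges as $label => $blocks) {
            foreach ($blocks as [$low, $high]) {
                if ($codePoint >= $low && $codePoint <= $high) {
                    $counts[$label]++;
                    break 2; // character classified, move on to the next one
                }
            }
        }
    }

    // CJK ideographs are shared between Chinese and Japanese, so any
    // kana (or Kanbun mark) is taken as a sign that the text is Japanese.
    if ($counts['Japanese'] > 0) {
        return 'Japanese';
    }
    if ($counts['Chinese'] === 0 && $counts['Arabic'] === 0) {
        return 'Unknown';
    }
    return $counts['Chinese'] >= $counts['Arabic'] ? 'Chinese' : 'Arabic';
}

echo guessScript('这是一'), PHP_EOL;
echo guessScript('これは日本語'), PHP_EOL;
echo guessScript('مرحبا'), PHP_EOL;
```

Counting characters rather than stopping at the first match keeps a stray foreign character from deciding the result. Text made up only of kanji would still be reported as Chinese, which is a limitation of looking at ranges alone.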

(I got the hex for your Chinese by using a Chinese to Unicode Converter.)
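
If you would rather not rely on an online converter, the same hex values can be printed from PHP itself. This is a small sketch; `toHexCodePoints()` is a hypothetical helper name, and the sample string is simply the three characters that correspond to the hex values quoted above. It assumes PHP 7.4 or later with mbstring.

```php
<?php
// Hypothetical helper: lists each character's code point as uppercase hex.
function toHexCodePoints(string $text): array
{
    // Split the UTF-8 string into characters; invalid UTF-8 yields none.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    return array_map(
        fn(string $char): string => sprintf('0x%04X', mb_ord($char, 'UTF-8')),
        $chars
    );
}

echo implode(' ', toHexCodePoints('这是一')), PHP_EOL; // 0x8FD9 0x662F 0x4E00
```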

answered Sep 20 '22 17:09

egrunin

Tags:

php

language-detection

Yeti
