How to detect language

Are there any good open source engines for detecting what language a text is in, ideally with a probability metric? It should run locally and not query Google or Bing. I'd like to detect the language of each page in about 15 million pages of OCR'ed text.

Not all documents will contain languages which use the Latin alphabet.

457

asked Jul 03 '10 22:07

niklassaers

1 Answer

You can surely build your own, given some statistics about letter frequencies, digraph frequencies, etc., for your target languages.

Then release it as open source. And voilà, you have an open source engine for detecting the language of text!
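To make the frequency idea concrete, here is a minimal sketch in Python, assuming character digraph (bigram) counts per language, add-one smoothing, and equal priors across languages. The function names (`ngrams`, `train`, `detect`) and the tiny `SAMPLES` texts are hypothetical and only for illustration; real profiles would need far larger training corpora per language. Because it works on Unicode characters, the same approach should carry over to non-Latin scripts.

```python
import math
from collections import Counter

def ngrams(text, n=2):
    """Return character n-grams of the lowercased text, padded with spaces."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train(samples, n=2):
    """Build one n-gram frequency table per language from sample text."""
    return {lang: Counter(ngrams(text, n)) for lang, text in samples.items()}

def detect(text, profiles, n=2):
    """Return {language: probability}, assuming equal priors."""
    vocab = set().union(*profiles.values())
    scores = {}
    for lang, counts in profiles.items():
        # Add-one smoothing, with one extra slot for n-grams never seen in training.
        denom = sum(counts.values()) + len(vocab) + 1
        scores[lang] = sum(
            math.log((counts[g] + 1) / denom) for g in ngrams(text, n)
        )
    # Turn log-likelihoods into probabilities that sum to 1.
    top = max(scores.values())
    weights = {lang: math.exp(s - top) for lang, s in scores.items()}
    z = sum(weights.values())
    return {lang: w / z for lang, w in weights.items()}

# Toy training text (an assumption for this sketch, far too small for real use).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then it runs "
          "through the forest with the other animals",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann "
          "läuft er durch den wald mit den anderen tieren",
    "ru": "быстрая коричневая лиса прыгает через ленивую собаку и потом "
          "бежит через лес с другими животными",
}
PROFILES = train(SAMPLES)

if __name__ == "__main__":
    result = detect("the dog runs through the forest", PROFILES)
    print(max(result, key=result.get), result)
```

The returned dictionary gives the probability metric the question asks for, though with short or noisy OCR text the probabilities from a model like this tend to be overconfident, so treat them as a ranking signal rather than a calibrated estimate.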

answered Sep 20 '22 17:09

Dolph


Tags: detection, language-detection
