
Detect the language of a string in Node.js

I use a function from GitHub in my project.

The function sends a welcome email when a new user signs up and a goodbye email when a user account is deleted. It is deployed as a Firebase Cloud Function.

I'm trying to extend the code so that it determines, from the user's name, which language the message should be sent in.

Example:

If the user's name is typed in Hebrew, the function sends the message in Hebrew.

If the user's name is typed in Russian, the function sends the message in Russian.

If the user's name is typed in English, the function sends the message in English.

Note:

This is not tied to a browser, because users register from an Android application. After a user authenticates with Firebase, they get a message from the Firebase Cloud Function.

In Node.js the code below does not work!

if (/^[a-zA-Z]+$/.test(text)) { // English
  ...
} else { // not English
  ...
}

I would be glad of any help!

Maybe there is another way to localize the message?

Thanks!!!

Yury Matatov asked Apr 29 '19 04:04



2 Answers

You can use the languagedetect Node.js library to detect the language of a string.

However, since your requirement is to send the message in the user's language, it is better to let users select their preferred language or, where a browser is involved, to read the browser's language setting in JavaScript with navigator.language.
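For the three languages in the question, checking the writing system of the name may already be enough, since Hebrew, Russian, and English each use a different script. Below is a minimal sketch using Unicode property escapes, assuming Node.js 10 or later. The names guessLanguageFromName and WELCOME are hypothetical and not part of any library, and the message texts are placeholders.

```javascript
// Hypothetical helper: guess a message language from the script of a name.
// Assumes only Hebrew, Russian, and English messages exist, as in the question.
function guessLanguageFromName(name) {
  // The `u` flag enables \p{Script=...} Unicode property escapes.
  if (/\p{Script=Hebrew}/u.test(name)) return 'he';
  // Cyrillic covers Russian, but also Ukrainian, Bulgarian, and others.
  if (/\p{Script=Cyrillic}/u.test(name)) return 'ru';
  // Fall back to English for Latin letters, digits, or an empty name.
  return 'en';
}

// Hypothetical message table keyed by language code (placeholder texts).
const WELCOME = {
  he: 'ברוכים הבאים',
  ru: 'Добро пожаловать',
  en: 'Welcome',
};

// Pick the welcome text for a given display name.
function welcomeFor(name) {
  return WELCOME[guessLanguageFromName(name)];
}
```

For example, guessLanguageFromName('Юрий') returns 'ru' and guessLanguageFromName('Yury Matatov') returns 'en'. Unlike the /^[a-zA-Z]+$/ test, this does not fail on spaces or punctuation in the name. It only identifies the script, so a stored language preference remains the more reliable signal when one is available.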

Dani Akash answered Nov 19 '22 20:11



Facebook's FastText is the best solution for this problem that doesn't require a large, slow machine learning model.

@smodin/fast-text-language-detection lets you use it in a Node.js application: https://www.npmjs.com/package/@smodin/fast-text-language-detection (disclaimer: I'm the creator, out of necessity)

Context:

I run a large multilingual site, and I found that franc and LanguageDetect (currently the most popular Node.js libraries) weren't accurate enough, even after a month of working with them.

Based on further research and this blog post ( https://towardsdatascience.com/benchmarking-language-detection-for-nlp-8250ea8b67c ), I determined that Facebook's FastText is the best solution out there because:

  1. It is more accurate than typical approaches that use short Unicode blocks to predict languages, which often fail on short text and on text with many proper nouns

  2. It doesn't have the odd caveats that are common in Unicode-based predictions
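To illustrate why character-set checks alone struggle with short text and proper nouns, here is a small sketch that counts letters per script. It is not the algorithm of franc, LanguageDetect, or FastText. The helper scriptCounts is hypothetical, and it assumes Node.js 10 or later for Unicode property escapes.

```javascript
// Hypothetical helper: count letters per script in a string.
// Shows how little a script-only check can say about the actual language.
function scriptCounts(text) {
  const counts = { Hebrew: 0, Cyrillic: 0, Latin: 0 };
  // for...of iterates by code point, so each character is tested whole.
  for (const ch of text) {
    if (/\p{Script=Hebrew}/u.test(ch)) counts.Hebrew += 1;
    else if (/\p{Script=Cyrillic}/u.test(ch)) counts.Cyrillic += 1;
    else if (/\p{Script=Latin}/u.test(ch)) counts.Latin += 1;
    // Spaces, digits, and punctuation are ignored.
  }
  return counts;
}
```

A Ukrainian name such as 'Олександр' yields nine Cyrillic letters and a French name such as 'François' yields eight Latin letters, so neither can be told apart from Russian or English by script alone.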

The downside is that it's 150 MB, so it's not a reasonable solution on the front end. It works best on longer text, but performs significantly better on short text than franc and LanguageDetect.

EDIT: Accuracy testing. I've just added results from testing 550k sentences in 99 languages, each 30-250 characters long. The accuracy is around 99% for most major languages, even when the length is reduced to 10-40 characters. See more here. I also added franc and languagedetect accuracies for reference here.

Kevin Danikowski answered Nov 19 '22 22:11
