
JavaScript regex to get the first character of each word in a sentence (Persian and English)

Suppose I have the following string:

var englishSentence = 'Hellow World';
var persianSentence = 'گروه جوانان خلاق';

For English I use the following regex, but how can I write a regex that supports Persian, or a mix of the two?

  var matches = englishSentence.match(/\b(\w)/g); // first character of each word
  var acronym = matches.join('');                 // declared with var to avoid an implicit global
jones asked Dec 14 '22 17:12



1 Answer

Root cause

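JavaScript has no built-in way to match a Unicode word boundary: \b is defined in terms of \w, which covers only ASCII letters, digits and the underscore, and this is still the case in ECMAScript 2018.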
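The limitation is easy to reproduce. The following is a small sketch (the variable names are illustrative only) that runs the regex from the question against Persian and mixed input:

```javascript
// \w is [A-Za-z0-9_], and \b is a boundary between \w and non-\w characters.
// Persian letters are not \w, so no position inside a Persian word qualifies.
var persianSentence = 'گروه جوانان خلاق';
var mixedSentence = 'Hello ' + 'گروه'; // English word followed by a Persian word

var persianMatches = persianSentence.match(/\b(\w)/g);   // no \w character at all
var persianMatchesU = persianSentence.match(/\b(\w)/gu); // the u flag alone does not widen \w
var mixedMatches = mixedSentence.match(/\b(\w)/g);       // only the English word is seen

console.log(persianMatches);  // null
console.log(persianMatchesU); // null
console.log(mixedMatches);    // [ 'H' ]
```

String.prototype.match returns null rather than an empty array when a global regex finds nothing, so calling .join('') on the result would throw for a purely Persian sentence.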

Solutions

In ECMAScript 2018 compatible environments, which support both lookbehind and Unicode property escapes (e.g. the latest versions of Chrome as of April 2018), you may use:

var englishSentence = 'Hellow World';
var persianSentence = 'گروه جوانان خلاق';
var reg = /(?<!\p{L}\p{M}*)\p{L}\p{M}*/gu;
console.log(englishSentence.match(reg));
console.log(persianSentence.match(reg));

Details

  • (?<!\p{L}\p{M}*) - a negative lookbehind that fails the match if the current position is immediately preceded by a Unicode letter followed by 0+ diacritics
  • \p{L}\p{M}* - a Unicode letter followed by 0+ diacritics
  • gu - g - global, search for all matches; u - make the pattern Unicode aware (required for \p{...}).
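To get from the matches to the acronym the question asks for, the result can be joined as in the original English-only code. A minimal sketch, assuming an engine with lookbehind and \p{...} support; toAcronym is a hypothetical helper name, not part of the original answer:

```javascript
// Hypothetical helper: builds an acronym from the first letter of every word,
// using the Unicode-aware pattern described above.
function toAcronym(sentence) {
  var firstLetters = sentence.match(/(?<!\p{L}\p{M}*)\p{L}\p{M}*/gu);
  // match() returns null when nothing matches, so guard before joining.
  return firstLetters === null ? '' : firstLetters.join('');
}

var englishAcronym = toAcronym('Hellow World');
var persianAcronym = toAcronym('گروه جوانان خلاق');
var mixedAcronym = toAcronym('Hello ' + 'گروه');
var emptyAcronym = toAcronym('123 456');

console.log(englishAcronym); // HW
console.log(persianAcronym);
console.log(mixedAcronym);
console.log(emptyAcronym === ''); // true
```

Because the pattern only looks at letters and marks, digits and punctuation never start a match, which is why the last call yields an empty string.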

If you need the same functionality in older/other browsers, use XRegExp:

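function getFirstLetters(s, regex) {
  var results = [];
  // XRegExp.forEach invokes the callback once per match;
  // match[1] holds capture group 1 (the first letter plus its diacritics).
  XRegExp.forEach(s, regex, function (match) {
    results.push(match[1]);
  });
  return results;
}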
var rx = XRegExp("(?:^|[^\\pL\\pM])(\\pL\\pM*)", "gu");
console.log(getFirstLetters("Hello world", rx));
console.log(getFirstLetters('گروه جوانان خلاق', rx));
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.js"></script>

Details

  • (?:^|[^\\pL\\pM]) - a non-capturing group that matches either the start of the string (^) or any char other than a Unicode letter or diacritic (the backslashes are doubled because the pattern is written as a string literal)
  • (\\pL\\pM*) - Group 1: any Unicode letter followed by 0+ diacritics.

Here the whole match also includes the non-letter char that precedes the word, so we need to extract only the Group 1 value, hence .push(match[1]) upon each match.
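The same capture-group idea can also be written with native syntax for engines that support Unicode property escapes but not lookbehind. This is a sketch under that assumption and not part of the original answer; getFirstLettersNative is a hypothetical name:

```javascript
// Hypothetical helper: native equivalent of the XRegExp pattern above.
// Requires \p{...} support (u flag) but no lookbehind.
function getFirstLettersNative(s) {
  var rx = /(?:^|[^\p{L}\p{M}])(\p{L}\p{M}*)/gu;
  var results = [];
  var m;
  // exec() with the g flag advances rx.lastIndex after every match.
  while ((m = rx.exec(s)) !== null) {
    results.push(m[1]); // Group 1: the first letter of the word
  }
  return results;
}

var englishLetters = getFirstLettersNative('Hello world');
var persianLetters = getFirstLettersNative('گروه جوانان خلاق');

console.log(englishLetters); // [ 'H', 'w' ]
console.log(persianLetters);
```

Every match consumes at least one letter, so the loop cannot stall on a zero-length match.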

Wiktor Stribiżew answered May 20 '23 15:05
