JavaScript regex to get the first character of each word in a sentence (Persian and English sentences)

Question

Suppose I have the following string:

var englishSentence = 'Hellow World';
var persianSentence = 'گروه جوانان خلاق';

For English I use the following regex, but how can I write a regex that supports Persian, or a mix of both?

  var matches = englishSentence.match(/\b(\w)/g);
  var acronym = matches.join('');
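A quick way to see why the `\b(\w)` approach breaks down is to run it on Persian and mixed input. The wrapper name `acronymAscii` is hypothetical; it simply packages the two lines above and adds a `null` guard, since `String.prototype.match` returns `null` when nothing matches and the unguarded `matches.join('')` would then throw a `TypeError`.

```javascript
// Hypothetical wrapper around the question's approach: \b and \w are ASCII-only
function acronymAscii(sentence) {
  var matches = sentence.match(/\b(\w)/g);
  // match() returns null when there is no match at all
  return matches === null ? '' : matches.join('');
}

console.log(acronymAscii('Hellow World'));      // "HW"
console.log(acronymAscii('گروه جوانان خلاق'));  // "" (no ASCII word characters)
console.log(acronymAscii('Hello گروه world'));  // "Hw" (the Persian word is skipped)
```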

Wiktor Stribiżew · Accepted Answer

Root cause

There is no built-in way to match a Unicode word boundary: \b is not Unicode-aware, even in ECMAScript 2018, because it is defined in terms of \w, which only matches [A-Za-z0-9_].
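The following sketch checks that claim directly: adding the `u` flag does not widen `\w`, so `\b` never fires around Persian letters, whereas the Unicode property escape `\p{L}` (ES2018, `u` flag required) does recognize them.

```javascript
var persianLetter = 'گ';

console.log(/^\w$/u.test('H'));               // true
console.log(/^\w$/u.test(persianLetter));     // false: \w stays ASCII-only with u
console.log(/^\p{L}$/u.test(persianLetter));  // true: Unicode "Letter" property
console.log('گروه جوانان'.match(/\b\w/gu));  // null: no word boundaries found
```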

Solutions

For ECMAScript 2018-compatible browsers (e.g. the latest versions of Chrome as of April 2018), you may use a lookbehind together with Unicode property escapes:

var englishSentence = 'Hellow World';
var persianSentence = 'گروه جوانان خلاق';
var reg = /(?<!\p{L}\p{M}*)\p{L}\p{M}*/gu;
console.log(englishSentence.match(reg));
console.log(persianSentence.match(reg));
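To turn those matches into the acronym the question asks for, the array only needs to be joined, with a guard for strings that contain no letters. The helper `getAcronym` is hypothetical; this is a minimal sketch assuming an ES2018+ engine (lookbehind and `\p{...}` support).

```javascript
// A letter (plus optional diacritics) not preceded by another letter
var firstLetterRx = /(?<!\p{L}\p{M}*)\p{L}\p{M}*/gu;

// Hypothetical helper: joins the first letters into an acronym
function getAcronym(sentence) {
  // match() returns null when the string contains no letters
  var matches = sentence.match(firstLetterRx);
  return matches ? matches.join('') : '';
}

console.log(getAcronym('Hellow World'));      // "HW"
console.log(getAcronym('گروه جوانان خلاق'));  // "گجخ"
console.log(getAcronym('Hello گروه world'));  // "Hگw"
console.log(getAcronym('123 !?'));            // ""
```

Because `String.prototype.match` with the `g` flag resets `lastIndex` to 0 before searching, reusing the single global regex across calls is safe.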

Details

(?<!\p{L}\p{M}*) - a negative lookbehind that fails the match if the current position is immediately preceded by a Unicode letter followed by 0+ diacritics
\p{L}\p{M}* - a Unicode letter followed by 0+ diacritics
gu - flags: g (global) returns all matches, u makes the pattern Unicode-aware (required for \p{...}).
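The `\p{M}*` inside the lookbehind matters for vowelized Persian/Arabic text. The sketch below compares a hypothetical "naive" variant without it against the pattern above, on the word سَلام (built from explicit code points so the combining FATHA, U+064E, is unambiguous).

```javascript
// س + FATHA (U+064E, a combining mark) + ل + ا + م
var word = '\u0633\u064E\u0644\u0627\u0645';

var naive = /(?<!\p{L})\p{L}/gu;             // hypothetical: ignores diacritics
var full = /(?<!\p{L}\p{M}*)\p{L}\p{M}*/gu;  // pattern from the answer

// ["س", "ل"]: ل follows a mark, not a letter, so it is wrongly treated as a word start
console.log(word.match(naive));
// ["سَ"]: one word, and its first letter keeps its diacritic
console.log(word.match(full));
```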

If you need the same functionality in older/other browsers, use XRegExp:

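// Requires XRegExp (loaded via the <script> tag below)
function getFirstLetters(s, regex) {
  var results = [];
  // XRegExp.forEach runs the callback once per match; Group 1 holds the letter
  XRegExp.forEach(s, regex, function (match) {
    results.push(match[1]);
  });
  return results;
}
// Backslashes are doubled so the string literal passes \pL and \pM to XRegExp;
// only "g" is used, as older browsers reject the native "u" flag
var rx = XRegExp("(?:^|[^\\pL\\pM])(\\pL\\pM*)", "g");
console.log(getFirstLetters("Hello world", rx));
console.log(getFirstLetters('گروه جوانان خلاق', rx));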

<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.js"></script>

Details

(?:^|[^\pL\pM]) - a non-capturing group that matches the start of the string (^) or any char other than a Unicode letter or diacritic
(\pL\pM*) - Group 1: any Unicode letter followed by 0+ diacritics.

Since the non-capturing group also consumes the separator char before the word, we need to extract only the Group 1 value, hence .push(match[1]) on each match.
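If the target engine understands native `\p{...}` escapes (with the `u` flag) but not lookbehind, the same consume-the-separator idea can be written without XRegExp. The function name `getFirstLettersNative` is hypothetical; this is a sketch assuming ES2018 Unicode property escapes are available.

```javascript
// Native version of the XRegExp pattern (needs the u flag for \p{...})
function getFirstLettersNative(s) {
  var rx = /(?:^|[^\p{L}\p{M}])(\p{L}\p{M}*)/gu;
  var results = [];
  var m;
  // exec() advances rx.lastIndex on each call because of the g flag;
  // every match consumes at least one letter, so the loop always terminates
  while ((m = rx.exec(s)) !== null) {
    results.push(m[1]); // Group 1: the letter plus any diacritics
  }
  return results;
}

console.log(getFirstLettersNative('Hello world'));       // ["H", "w"]
console.log(getFirstLettersNative('گروه جوانان خلاق'));  // ["گ", "ج", "خ"]
```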


Tags: javascript, regex

jones
