Why does this regex not work with Eastern Arabic numerals?

@thg435 wrote this answer to a JavaScript question:

> a = "foo 1234567890 bbb 123456"
"foo 1234567890 bbb 123456"
> a.replace(/\d(?=\d\d(\d{3})*\b)/g, "[$&]")
"foo 1[2]34[5]67[8]90 bbb [1]23[4]56"

It works well with Hindu-Arabic numerals, i.e. 1, 2, 3, 4, and so on. But when I try to apply the regex to Eastern Arabic numerals, it fails. Here is the regex I use (I've just replaced \d with [\u0660-\u0669]):

/[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*\b)/g

It actually works if my string is ١٢٣٤foo, but fails when it's ١٢٣٤ foo or even foo١٢٣٤:

> a = "١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤"
"١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤"
> a.replace(/[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*\b)/g, "[$&]")
"١[٢]٣٤foo  ١٢٣٤ foo  foo١٢٣٤"

What actually matters to me are separated numbers (e.g. ١٢٣٤). Why can't it match separated numbers?


Another requirement is that the regex should only match numbers with 5 or more digits (e.g. ١٢٣٤٥ and not ١٢٣٤). I initially thought it would be as simple as adding {5,} at the end of the expression, but that doesn't work.

Iryn asked Nov 12 '22 06:11


1 Answer

Oddly, I'm experiencing the opposite behavior from you (the first one doesn't work and the other two do), but how about if you replaced the \b with (?![\u0660-\u0669])? Then it seems to work no matter what's before or after it:
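
One likely explanation for the behavior reported in the question: in JavaScript, \b is defined in terms of \w, which covers only [A-Za-z0-9_]. Eastern Arabic digits therefore count as non-word characters, and \b only fires where such a digit touches an ASCII word character. The sketch below reconstructs the suggested substitution from the description above. It is not the snippet originally posted with this answer, and the names D, withWordBoundary and withNegativeLookahead are illustrative.

```javascript
// Sketch only: reconstructed from the description, not the answer's original code.
// D is a hypothetical helper holding the Eastern Arabic digit class (U+0660 to U+0669).
const D = "[\\u0660-\\u0669]";

// Form from the question: the lookahead ends with \b (ASCII-only in JavaScript).
const withWordBoundary = new RegExp(`${D}(?=${D}{2}(?:${D}{3})*\\b)`, "g");

// Suggested form: the lookahead ends with "not followed by another digit".
const withNegativeLookahead = new RegExp(`${D}(?=${D}{2}(?:${D}{3})*(?!${D}))`, "g");

const input = "١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤";

const before = input.replace(withWordBoundary, "[$&]");
const after = input.replace(withNegativeLookahead, "[$&]");

console.log(before); // only the run directly followed by "foo" gets a bracket
console.log(after);  // each four-digit run gets its second digit bracketed
```

The negative lookahead checks only for the absence of another digit, so it does not depend on what kind of character (space, letter, end of string) follows the number.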


Edit: This seems to work for the new requirement - to only add the brackets if the run of digits is 5 digits long or more:
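
One way to meet the length requirement, sketched here as an alternative rather than as the regex this edit originally showed: first match whole runs of five or more Eastern Arabic digits, then apply the grouping pattern inside each run. The function name bracketLongRuns and the helpers D, longRun and groupLead are hypothetical.

```javascript
// Sketch only: a two-step alternative, not the single regex from the original answer.
// D is a hypothetical helper holding the Eastern Arabic digit class (U+0660 to U+0669).
const D = "[\\u0660-\\u0669]";

// Step 1: a maximal run of at least five digits (the quantifier is greedy).
const longRun = new RegExp(`${D}{5,}`, "g");

// Step 2: inside one run, a digit followed by 2, 5, 8, ... digits up to the run's end.
const groupLead = new RegExp(`${D}(?=${D}{2}(?:${D}{3})*$)`, "g");

function bracketLongRuns(text) {
  // Shorter runs never reach the callback, so they are returned unchanged.
  return text.replace(longRun, (run) => run.replace(groupLead, "[$&]"));
}

console.log(bracketLongRuns("١٢٣٤ foo ١٢٣٤٥ foo ١٢٣٤٥٦"));
```

Splitting the work in two keeps each pattern short: the outer regex handles the length check, and the inner one can anchor on $ because it only ever sees a single run.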


Incidentally, some regex engines will treat those digits as a match for \d. Here is that second regex with \d instead of those character ranges, which should be a little easier to read:
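
For comparison, here is a sketch of the first substitution (not the elided second regex) written with \d, plus a variant using the Unicode property escape \p{Nd}. In JavaScript, \d matches only ASCII 0-9, so the \d form leaves Eastern Arabic digits untouched there. \p{Nd} with the u flag (ES2018 and later) is the closest built-in counterpart to engines where \d is Unicode-aware. The variable names are illustrative.

```javascript
// Sketch only: illustrates the \d idea, not the regex originally posted here.

// \d form: easier to read, but ASCII-only in JavaScript.
const asciiDigits = /\d(?=\d{2}(?:\d{3})*(?!\d))/g;

// \p{Nd} form: any Unicode decimal digit; requires the u flag (ES2018+).
const unicodeDigits = /\p{Nd}(?=\p{Nd}{2}(?:\p{Nd}{3})*(?!\p{Nd}))/gu;

const western = "foo 1234567890 bbb 123456";
const eastern = "١٢٣٤ foo";

const westernAscii = western.replace(asciiDigits, "[$&]");
const easternAscii = eastern.replace(asciiDigits, "[$&]");
const easternUnicode = eastern.replace(unicodeDigits, "[$&]");

console.log(westernAscii);   // "foo 1[2]34[5]67[8]90 bbb [1]23[4]56"
console.log(easternAscii);   // unchanged: \d does not match these digits in JavaScript
console.log(easternUnicode); // the second digit of the run is bracketed
```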

JLRishe answered Nov 14 '22 21:11
