
Why does this regex not work with Eastern Arabic numerals?

@thg435 wrote this answer to a JavaScript question:

> a = "foo 1234567890 bbb 123456"
"foo 1234567890 bbb 123456"
> a.replace(/\d(?=\d\d(\d{3})*\b)/g, "[$&]")
"foo 1[2]34[5]67[8]90 bbb [1]23[4]56"

It works well with Hindu-Arabic numerals (1, 2, 3, 4, ...). But when I try to apply the regex to Eastern Arabic numerals, it fails. Here is the regex I use (I've just replaced \d with [\u0660-\u0669]):

/[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*\b)/g

It actually works if my string is ١٢٣٤foo, but fails when it's ١٢٣٤ foo or even foo١٢٣٤:

> a = "١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤"
"١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤"
> a.replace(/[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*\b)/g, "[$&]")
"١[٢]٣٤foo  ١٢٣٤ foo  foo١٢٣٤"

What actually matters to me are separated numbers (e.g. ١٢٣٤). Why can't it match separated numbers?

Update:

Another requirement is that the regex should only match numbers with 5 or more digits (e.g. ١٢٣٤٥ and not ١٢٣٤). I initially thought that that's as simple as adding {5,} at the end of the expression, but that doesn't work.

Iryn asked Nov 12 '22 06:11


1 Answer

Oddly, I'm experiencing the opposite behavior from you (the first one doesn't work and the other two do), but how about if you replaced the \b with (?![\u0660-\u0669])? Then it seems to work no matter what's before or after it:

[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*(?![\u0660-\u0669]))
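
A likely explanation for the difference, at least in JavaScript: \b is defined in terms of \w, and \w covers only ASCII [A-Za-z0-9_]. An Eastern Arabic digit is therefore a non-word character, so \b matches between ٤ and f but not between ٤ and a space or the end of the string. An engine with a Unicode-aware \w would behave the other way round, which may be what I was seeing. The snippet below is a minimal sketch comparing the two patterns on the string from the question; the variable names are only illustrative.

```javascript
// Eastern Arabic digits are U+0660..U+0669.
// Pattern from the question, ending in \b.
const withWordBoundary =
  /[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*\b)/g;
// Pattern from this answer: \b replaced by "no further digit follows".
const withLookahead =
  /[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*(?![\u0660-\u0669]))/g;

const input = "١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤";

// \w is ASCII-only in JavaScript, so an Eastern Arabic digit is a non-word character.
const digitIsWordChar = /\w/.test("\u0661");

const viaBoundary = input.replace(withWordBoundary, "[$&]");
const viaLookahead = input.replace(withLookahead, "[$&]");

console.log(digitIsWordChar); // false
console.log(viaBoundary);     // only the run directly followed by "foo" is bracketed
console.log(viaLookahead);    // all three runs are bracketed
```

With the negative lookahead, the end of a run is decided purely by "is the next character another digit?", so the surrounding script no longer matters.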

Edit: This seems to work for the new requirement, which is to add the brackets only if the run of digits is 5 or more digits long:

[\u0660-\u0669](?=[\u0660-\u0669]{2}([\u0660-\u0669]{3})+(?![\u0660-\u0669]))|(?<=[\u0660-\u0669]{2})[\u0660-\u0669](?=[\u0660-\u0669]{2}(?![\u0660-\u0669]))
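
As a quick check of that pattern, here is a small sketch that applies it to runs of 4, 5, 6 and 7 digits. It assumes a JavaScript engine with lookbehind support (ES2018 or later); the names fiveOrMore, allDigits and results are made up for the example.

```javascript
// Requires lookbehind support (ES2018 or later).
const fiveOrMore =
  /[\u0660-\u0669](?=[\u0660-\u0669]{2}([\u0660-\u0669]{3})+(?![\u0660-\u0669]))|(?<=[\u0660-\u0669]{2})[\u0660-\u0669](?=[\u0660-\u0669]{2}(?![\u0660-\u0669]))/g;

// Build runs of 4, 5, 6 and 7 digits from U+0661..U+0667.
const allDigits = "\u0661\u0662\u0663\u0664\u0665\u0666\u0667";
const results = [4, 5, 6, 7].map((n) =>
  allDigits.slice(0, n).replace(fiveOrMore, "[$&]")
);

for (const r of results) console.log(r);
```

In Western digits the four results correspond to 1234, 12[3]45, [1]23[4]56 and 1[2]34[5]67: the 4-digit run is left alone, and longer runs get a bracket at the start of each group of three counted from the right. The second alternative exists only to catch the last group boundary, where fewer than five digits follow.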

Incidentally, some regex engines (.NET's, for example, but not JavaScript's) will treat those digits as a match for \d. Here is that second regex with \d instead of those character ranges, which should be a little easier to read:

\d(?=\d{2}(\d{3})+(?!\d))|(?<=\d{2})\d(?=\d{2}(?!\d))
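
In JavaScript itself, \d only ever matches ASCII 0-9. If a script-independent version is wanted there, one option (a swapped-in technique, not part of the original pattern) is the Unicode property escape \p{Nd} together with the u flag. The sketch below assumes an ES2018+ engine; groupAnyDigits and the other names are hypothetical.

```javascript
// \d is [0-9] in JavaScript; \p{Nd} (with the u flag) is any Unicode decimal digit.
const asciiOnly = /\d/.test("\u0661");
const anyScript = /\p{Nd}/u.test("\u0661");

// Same shape as the \d pattern above, with \p{Nd} substituted for \d.
const groupAnyDigits =
  /\p{Nd}(?=\p{Nd}{2}(\p{Nd}{3})+(?!\p{Nd}))|(?<=\p{Nd}{2})\p{Nd}(?=\p{Nd}{2}(?!\p{Nd}))/gu;

// One Western run of 10 digits and one Eastern Arabic run of 6 digits.
const mixed = "foo 1234567890 bbb \u0661\u0662\u0663\u0664\u0665\u0666";
const grouped = mixed.replace(groupAnyDigits, "[$&]");

console.log(asciiOnly, anyScript); // false true
console.log(grouped);
```

Note that \p{Nd} also matches digits from other scripts, so a run mixing several scripts would be treated as one number; the explicit [\u0660-\u0669] range is the safer choice when only Eastern Arabic digits should count.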
JLRishe answered Nov 14 '22 21:11