The target structure looks like the following:
検索結果:100,000件
I am using the following regex pattern:
((?<!検索結果:)(?<!次の)(((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京+|[0-90-9]))(,|,|、)?).+((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京|[0-90-9]).+)件)(?!表示)
As you can see, I want this pattern to skip any number, written in either Arabic numerals or Japanese kanji (Chinese character) numerals, that is preceded by "検索結果:" or "次の". However, the exclusion somehow works for numbers of up to 4 digits but not for 6 digits.
In other words,
次の1000件
works (meaning it doesn't match anything), but
次の5,0000件
gives a partial match ("0000件")
I want to know why this works only up to 4 digits, and ultimately I want to find a way to NOT match anything in these cases using this regex. I know this regex is a bit messy. Thanks in advance for your feedback!
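For reference, here is a minimal sketch that reproduces the behaviour described above. It assumes Python's `re` module, which is not necessarily the engine the pattern was written for, and the helper name `first_match` is made up for illustration.

```python
import re

# The pattern from the question, copied unchanged.
ORIGINAL = re.compile(
    r"((?<!検索結果:)(?<!次の)(((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京+|[0-90-9]))(,|,|、)?)"
    r".+((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京|[0-90-9]).+)件)(?!表示)"
)


def first_match(text):
    """Return the first matched substring, or None if nothing matches."""
    m = ORIGINAL.search(text)
    return m.group(0) if m else None


print(first_match("次の1000件"))    # None
print(first_match("次の5,0000件"))  # 0000件
```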
In a negative lookbehind, the regex engine checks, at the current position, whether the given subpattern matches the text immediately before that position. If it does, the match attempt at that position fails; otherwise the attempt continues.
Introduction to the JavaScript regex lookbehind: in regular expressions, a lookbehind matches an element only if another specific element comes before it. A positive lookbehind has the syntax (?<=Y)X. In this syntax, the pattern matches X only if Y comes immediately before it.
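As a small illustration of the two lookbehind forms, here is a sketch in Python, whose `re` module accepts the same (?<=...) and (?<!...) syntax for fixed-width subpatterns. The variable names are chosen for this example only.

```python
import re

# Positive lookbehind: match digits only when "次の" comes immediately before them.
positive = re.compile(r"(?<=次の)[0-9]+")
# Negative lookbehind: match digits starting at a position NOT immediately preceded by "次の".
negative = re.compile(r"(?<!次の)[0-9]+")

text = "次の1000件"
positive_hit = positive.search(text).group(0)
negative_hit = negative.search(text).group(0)

print(positive_hit)  # 1000
print(negative_hit)  # 000
```

Note that the negative form does not reject the string as a whole. It only rules out the one start position directly after "次の", so the engine simply starts the match one character later.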
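The lookbehinds only reject a match that starts immediately after "検索結果:" or "次の". The engine then retries from later positions, so a match can still start in the middle of the number. Since the pattern needs at least four characters before 件, such a later start only succeeds once the number is longer than four characters. You need to avoid matching the numbers after a digit or digit + the separator, so you need to add (?<![0-90-9])(?<![0-90-9][,,、]) right after (?<!次の):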
(?<!検索結果:)(?<!次の)(?<![0-90-9])(?<![0-90-9][,,、])(?:[〇一二三四五六七八九十百千万億兆0-90-9]|京+)[,,、]?.+[〇一二三四五六七八九十百千万億兆京0-90-9].+件
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo.
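A quick way to check the adjusted pattern against the strings from the question is the sketch below. It assumes Python's `re` module rather than any particular engine; the name `matches` and the sample string starting with "合計" are hypothetical and only there to show that ordinary numbers are still matched.

```python
import re

# The adjusted pattern from above, copied unchanged.
FIXED = re.compile(
    r"(?<!検索結果:)(?<!次の)(?<![0-90-9])(?<![0-90-9][,,、])"
    r"(?:[〇一二三四五六七八九十百千万億兆0-90-9]|京+)[,,、]?"
    r".+[〇一二三四五六七八九十百千万億兆京0-90-9].+件"
)


def matches(text):
    """Return the first matched substring, or None if nothing matches."""
    m = FIXED.search(text)
    return m.group(0) if m else None


# The strings from the question no longer produce any match.
print(matches("検索結果:100,000件"))  # None
print(matches("次の1000件"))          # None
print(matches("次の5,0000件"))        # None

# Hypothetical sample without an excluded prefix: the number still matches.
print(matches("合計5,0000件"))        # 5,0000件
```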