Trying NOT to match a Japanese word using RegEx negative lookbehind

Tags: regex, regex-negation, cjk

The target structure looks like the following:

検索結果：１００，０００件

I am using the following regex pattern:

((?<!検索結果：)(?<!次の)(((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京+|[0-9０-９]))(,|，|、)?).+((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京|[0-9０-９]).+)件)(?!表示)

As you can see, I want the pattern to match a number written in either Arabic numerals or Japanese kanji (Chinese character) numerals and followed by 件, but to match nothing when that number is preceded by "検索結果：" or "次の". However, the exclusion only works for numbers of up to 4 digits; it does not work for longer ones, such as 6 digits.

In other words,

次の１０００件

works (meaning it doesn't match anything), but

次の５，００００件

gives a partial match ("００００件").

I want to know why the exclusion works only up to 4 digits, and ultimately I want to find a way to NOT match anything in these cases using this regex. I know this regex is a bit messy. Thanks in advance for your feedback!
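The behavior can be reproduced with a short script. This is only a sketch: it assumes Python's `re` module (the regex flavor actually in use is not stated here), and the helper name `first_match` is made up for illustration. The pattern itself is copied unchanged from above.

```python
import re

# The pattern from the question, copied verbatim (split over two string
# literals only to keep the line length manageable).
ORIGINAL = re.compile(
    r"((?<!検索結果：)(?<!次の)(((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京+|[0-9０-９]))(,|，|、)?)"
    r".+((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京|[0-9０-９]).+)件)(?!表示)"
)


def first_match(text):
    """Return the first matched substring, or None if nothing matches."""
    m = ORIGINAL.search(text)
    return m.group(0) if m else None


print(first_match("次の１０００件"))      # None
print(first_match("次の５，００００件"))  # ００００件
```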


asked Jan 15 '19 07:01

Michael

1 Answer

You need to prevent a match from starting right after a digit, or right after a digit followed by a separator. To do that, add (?<![０-９0-9])(?<![０-９0-9][，,、]) right after (?<!次の):

(?<!検索結果：)(?<!次の)(?<![０-９0-9])(?<![０-９0-9][，,、])(?:[〇一二三四五六七八九十百千万億兆0-9０-９]|京+)[,，、]?.+[〇一二三四五六七八九十百千万億兆京0-9０-９].+件
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the regex demo.
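As for why it worked up to 4 digits: a lookbehind is only tested at the position where a match attempt starts. When the attempt at the first digit is rejected, the engine simply retries at the following characters, which are preceded by a digit or a separator rather than by 次の. The original pattern needs at least four characters before 件 (a numeral, one character for the first .+, another numeral, one character for the second .+), so with a 4-digit number the only start position with enough room is the first digit, which the lookbehind blocks. With longer numbers a later start position still has enough room, hence the partial match.

Here is a minimal sketch that checks the adjusted pattern, assuming Python's `re` module. The helper name `fixed_match` and the sample string 全部で１００，０００件 are made up for illustration.

```python
import re

# The adjusted pattern from this answer, split over three string literals.
FIXED = re.compile(
    r"(?<!検索結果：)(?<!次の)(?<![０-９0-9])(?<![０-９0-9][，,、])"
    r"(?:[〇一二三四五六七八九十百千万億兆0-9０-９]|京+)[,，、]?"
    r".+[〇一二三四五六七八九十百千万億兆京0-9０-９].+件"
)


def fixed_match(text):
    """Return the first matched substring, or None if nothing matches."""
    m = FIXED.search(text)
    return m.group(0) if m else None


# Excluded prefixes: no match, not even a partial one.
print(fixed_match("次の１０００件"))             # None
print(fixed_match("次の５，００００件"))         # None
print(fixed_match("検索結果：１００，０００件"))  # None

# A number with a different prefix (hypothetical sample) still matches.
print(fixed_match("全部で１００，０００件"))      # １００，０００件
```

Note that the two added lookbehinds list only ASCII and full-width digits. In this sketch a kanji-only number such as 次の一万二千三百件 still produces a partial match (万二千三百件), so if such input is possible, the kanji numerals would probably need to be added to the character classes in both lookbehinds as well.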

answered Oct 24 '22 00:10

Wiktor Stribiżew
