Python regex for sequence containing at least two digits/letters

Using the Python module re, I would like to detect sequences in a text that contain at least two letters (A-Z) and at least two digits (0-9). For example, from the text

"N03FZ467 other text N03671"

only the substring "N03FZ467" should be matched ("N03671" contains just one letter).
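As a sketch of the requirement itself, independent of any regex, the helper below checks each token directly. The name `qualifies` is hypothetical, and it assumes a "sequence" is a whitespace-separated token, which is simpler than the word-boundary handling a regex would use. It can serve as a reference to compare regex results against.

```python
import string

ALLOWED = set(string.ascii_uppercase + string.digits)

def qualifies(token: str) -> bool:
    """True if token uses only A-Z/0-9 and has at least two letters and two digits."""
    if not token or not set(token) <= ALLOWED:
        return False
    letters = sum(ch in string.ascii_uppercase for ch in token)
    digits = sum(ch in string.digits for ch in token)
    return letters >= 2 and digits >= 2

# Hypothetical tokenization: split on whitespace only.
text = "N03FZ467 other text N03671"
print([tok for tok in text.split() if qualifies(tok)])  # ['N03FZ467']
```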

The best I have got so far is

(?=[A-Z]*\d)[A-Z0-9]{4,}

which detects sequences of at least 4 characters that contain only letters A-Z and digits 0-9, and at least one digit and one letter. How can I make sure I get at least two of each?
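A quick check of the pattern above with `re.findall` on the example text shows why it is not enough: it also returns N03671, which has only one letter. Strictly speaking, its lookahead only guarantees at least one digit, so an all-digit run such as 1234 (a made-up input) matches as well.

```python
import re

# The pattern from the question: a lookahead requiring one digit,
# then 4 or more characters from A-Z/0-9.
asker = re.compile(r"(?=[A-Z]*\d)[A-Z0-9]{4,}")

text = "N03FZ467 other text N03671"
print(asker.findall(text))    # ['N03FZ467', 'N03671']
print(asker.findall("1234"))  # ['1234'], no letter is required
```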

asked Jun 23 '26 23:06 by fnjo


1 Answer

  1. If you want to match full words, start matching at word boundaries \b.
  2. Check the first condition (at least two uppercase letters) with a lookahead: (?=(?:\d*[A-Z]){2})
  3. If this succeeds, match the second requirement (at least two digits): (?:[A-Z]*\d){2}
  4. Finally, match any remaining [A-Z\d]* up to the next \b.
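The four steps can be written out one per line using `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. This is just a commented layout of the same regex, sketched for readability, not a different technique.

```python
import re

# Same regex as the answer, one step per line; re.VERBOSE ignores the
# whitespace and '#' comments inside the pattern string.
pattern = re.compile(r"""
    \b                    # 1. start at a word boundary
    (?=(?:\d*[A-Z]){2})   # 2. lookahead: at least two uppercase letters ahead
    (?:[A-Z]*\d){2}       # 3. consume up to and including the second digit
    [A-Z\d]*              # 4. consume the rest of the letter/digit run
    \b                    #    and stop at the next word boundary
""", re.VERBOSE)

print(pattern.findall("N03FZ467 other text N03671"))  # ['N03FZ467']
```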

Putting it together:

\b(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*\b
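A minimal usage sketch of the combined pattern with `re.findall`. Apart from the first line, the sample strings are made-up examples, not taken from the question: the second line mixes letters and digits in different orders, and every token in the third line misses one of the two requirements.

```python
import re

PATTERN = re.compile(r"\b(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*\b")

samples = [
    "N03FZ467 other text N03671",  # from the question
    "AB12 A1B2 12AB 1A2B",         # 2 letters + 2 digits in different orders
    "ABC1 A123 ABCD 1234",         # each token lacks one requirement
]
for s in samples:
    print(PATTERN.findall(s))
# ['N03FZ467']
# ['AB12', 'A1B2', '12AB', '1A2B']
# []
```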

See this demo at regex101 or a Python demo at tio.run.

Note that a lookahead is a zero-length assertion: it does not consume characters. If you don't specify a starting point such as \b, the lookahead will be attempted at every position, which is less efficient.
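Two small checks of this paragraph, as a sketch. First, the lookahead on its own matches an empty string at position 0, i.e. it consumes nothing. Second, the \b anchors from step 1 also change which text counts as a match: without them, the pattern can match the uppercase prefix of a longer word such as ABC12x (a made-up input).

```python
import re

# A lookahead alone succeeds without consuming any characters.
lookahead = re.compile(r"(?=(?:\d*[A-Z]){2})")
m = lookahead.match("N03FZ467")
print(m.span(), repr(m.group()))  # (0, 0) ''

with_b = r"\b(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*\b"
without_b = r"(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*"

word = "ABC12x"  # trailing lowercase letter, so not a whole-word match
print(re.findall(with_b, word))     # []
print(re.findall(without_b, word))  # ['ABC12']
```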
Also note that the minimum length of four is already guaranteed by these requirements (two letters plus two digits).
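The length claim can be sanity-checked by brute force, as a sketch: enumerate every string of length 1 to 4 over a small alphabet (A, B, 0, 1, an assumed stand-in for A-Z and 0-9) and record which lengths fully match the pattern. Only length 4 appears.

```python
import re
from itertools import product

PATTERN = re.compile(r"\b(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*\b")
ALPHABET = "AB01"  # assumption: a small stand-in for A-Z and 0-9

lengths = set()
for n in range(1, 5):
    for chars in product(ALPHABET, repeat=n):
        if PATTERN.fullmatch("".join(chars)):
            lengths.add(n)

print(sorted(lengths))  # [4]
```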

answered Jun 26 '26 19:06 by bobble bubble


