How can I search for, say, a sequence of 10 isprint
characters in a given string in Python?
With GNU grep, I would simply do grep [[:print:]]{10}
In Python, regex character classes are sets of characters or ranges of characters enclosed by square brackets []. For example, [a-z] it means match any lowercase letter from a to z. Let’s see some of the most common character classes used inside regular expression patterns. Match letter a or b or c followed by either p or q.
The POSIX standard defines 12 character classes. The table below lists all 12, plus the [:ascii:] and [:word:] classes that some regex flavors also support. The table also shows equivalent character classes that you can use in ASCII and Unicode regular expressions if the POSIX classes are unavailable.
A "character class", or a "character set", is a set of characters put in square brackets. The regex engine matches only one out of several characters in the character class or character set. We place the characters we want to match between square brackets. If you want to match any vowel, we use the character set [aeiou].
So in POSIX, the regular expression [d] matches a or a d. To match a ], put it as the first character after the opening [ or the negating ^. To match a -, put it right before the closing ]. To match a ^, put it before the final literal - or the closing ].
Since POSIX is not supported by Python re
module, you have to emulate it with the help of character class.
You can use the one from the regular-expressions.info and add a limiting quantifier {10}
:
[\x20-\x7E]{10}
See demo
Alternatively, you can use Matthew Barnett regex module that claims to support POSIX character classes (POSIX character classes are supported.).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With