Given a regex character class/set, how can I get a list of all matchable characters (in Python 3)? E.g.:
[\dA-C]
should give
['0','1','2','3','4','5','6','7','8','9','A','B','C']
To match a character that has a special meaning in regex, prefix it with a backslash ( \ ) to form an escape sequence. E.g., \. matches "." ; \+ matches "+" ; and \( matches "(" . To match a literal "\" (backslash), use \\ .
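A minimal sketch of those escapes in Python, assuming a made-up sample string. The variable names are only for illustration, and re.escape is shown as one way to build the escaped form automatically:

```python
import re

# Hypothetical sample text containing ".", "+", "(" and a backslash.
sample = r"1+1=2 (approx.) C:\temp"

dots = re.findall(r"\.", sample)         # literal "."
pluses = re.findall(r"\+", sample)       # literal "+"
parens = re.findall(r"\(", sample)       # literal "("
backslashes = re.findall(r"\\", sample)  # literal backslash

# re.escape adds the backslashes for you when the text is not known in advance.
escaped = re.escape("a.b+c")

print(dots, pluses, parens, backslashes)
print(escaped)
```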
re.findall(pattern, string) returns a list of matching strings. re.finditer(pattern, string) returns an iterator over match objects.
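The difference between the two calls can be sketched as follows. The sample text and the pattern are assumptions chosen only for the demonstration:

```python
import re

text = "a1b22c333"

# re.findall returns the matched substrings as a plain list.
digits = re.findall(r"\d+", text)

# re.finditer yields match objects, which also carry the match positions.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(digits)  # ['1', '22', '333']
print(spans)   # [('1', 1, 2), ('22', 3, 5), ('333', 6, 9)]
```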
The '?' means match zero or one space. This will match "Kaleidoscope" as well as its common misspellings; the [] means match any one of the alternatives within the square brackets.
To match any character except a list of excluded characters, put the excluded characters between [^ and ] . The caret ^ must immediately follow the [ or else it stands for just itself. The character '.
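A short sketch of the caret rule, using a made-up test string. When ^ comes first the class is negated, and anywhere else it is an ordinary character:

```python
import re

text = "a^b-c"

# Caret right after "[": match every character that is NOT a, b or c.
negated = re.findall(r"[^abc]", text)

# Caret elsewhere in the class: it is just a literal "^".
literal_caret = re.findall(r"[a^b]", text)

print(negated)        # ['^', '-']
print(literal_caret)  # ['a', '^', 'b']
```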
I think what you are looking for is string.printable, a string constant containing all the ASCII characters that Python considers printable. For example:
>>> import string
>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
Now, to list the characters that your regex matches, run re.findall over that string:
>>> import re
>>> x = string.printable
>>> pattern = r'[\dA-C]'
>>> print(re.findall(pattern, x))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C']
string.printable is a combination of digits, letters, punctuation, and whitespace. Also check String Constants for the complete list of constants available in the string module.
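The same idea can be wrapped in a small helper. This is only a sketch: the function name matchable_chars and its alphabet parameter are made up here, and it assumes the pattern is a single character class, which it tests against each candidate character with fullmatch:

```python
import re
import string


def matchable_chars(char_class, alphabet=string.printable):
    """Return the characters of `alphabet` that `char_class` matches."""
    compiled = re.compile(char_class)
    # fullmatch ensures the pattern accepts the single character as a whole.
    return [ch for ch in alphabet if compiled.fullmatch(ch)]


result = matchable_chars(r"[\dA-C]")
print(result)
```

Passing a different alphabet, for example string.ascii_letters, restricts the candidates that are tried.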
In case you need the list of all Unicode characters, you may do:
import sys
# range() excludes its end point, so add 1 to include chr(sys.maxunicode) itself
unicode_list = [chr(i) for i in range(sys.maxunicode + 1)]
Note: This will be a huge list, and the console might hang for a while printing it, because the value of sys.maxunicode is:
>>> sys.maxunicode
1114111
In case you are dealing with specific Unicode blocks, refer to Unicode Character Ranges to limit the list to the ranges you are interested in.
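A hedged sketch of limiting the scan to a code point range. The helper name matchable_in_range and its parameters are hypothetical, and the Arabic block bounds (0x0600 to 0x06FF) are used only as an example range. Note that for str patterns \d also matches non-ASCII decimal digits unless re.ASCII is passed:

```python
import re
import sys


def matchable_in_range(char_class, start=0, stop=sys.maxunicode + 1, flags=0):
    """Return the characters with code points in [start, stop) that the class matches."""
    compiled = re.compile(char_class, flags)
    return [chr(cp) for cp in range(start, stop) if compiled.fullmatch(chr(cp))]


# Only the ASCII range: the same 13 characters as in the question.
ascii_only = matchable_in_range(r"[\dA-C]", 0, 128)

# Decimal digits that live in the Arabic block.
arabic_digits = matchable_in_range(r"\d", 0x0600, 0x0700)

# With re.ASCII, \d is restricted to 0-9, so nothing in that block matches.
arabic_ascii_mode = matchable_in_range(r"\d", 0x0600, 0x0700, flags=re.ASCII)

print(ascii_only)
print(len(arabic_digits), len(arabic_ascii_mode))
```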
import re
x = '123456789ABCDE'
pattern = r'[\dA-C]'
print(re.findall(pattern, x))
# prints ['1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C']
Is this what you are looking for?
If you don't have x and just want to match ASCII characters, you can use:
import re
import string
x = string.ascii_uppercase + string.digits
pattern = r'[\dA-C]'
print(re.findall(pattern,x))
If you want to take the pattern as input, you can simply do:
pattern = input()  # then use it with either example above
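Since a pattern typed by a user may not be a valid regex, it can help to compile it first and catch re.error. This is a sketch only: the function name safe_findall is made up, and literal strings stand in for the value that input() would return:

```python
import re
import string


def safe_findall(user_pattern, alphabet=string.printable):
    """Return the matches of user_pattern in alphabet, or [] if the pattern is invalid."""
    try:
        compiled = re.compile(user_pattern)
    except re.error as exc:
        print(f"Invalid pattern {user_pattern!r}: {exc}")
        return []
    return compiled.findall(alphabet)


# In an interactive script these strings would come from input().
valid_result = safe_findall(r"[\dA-C]")
invalid_result = safe_findall(r"[A-")

print(valid_result)
print(invalid_result)
```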