Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get a list of matchable characters from a regex class

Given a regex character class/set, how can i get a list of all matchable characters (in python 3). E.g.:

[\dA-C]

should give

['0','1','2','3','4','5','6','7','8','9','A','B','C']
like image 881
o17t H1H' S'k Avatar asked Oct 17 '16 19:10

o17t H1H' S'k


People also ask

How do I find special characters in regex?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

How can I find all matches to a regular expression in Python?

findall(pattern, string) returns a list of matching strings. re. finditer(pattern, string) returns an iterator over MatchObject objects.

What is '?' In regex?

The '?' means match zero or one space. This will match "Kaleidoscope", as well as all the misspellings that are common, the [] meaning match any of the alternatives within the square brackets.

How do you match a character except in regex?

To match any character except a list of excluded characters, put the excluded charaters between [^ and ] . The caret ^ must immediately follow the [ or else it stands for just itself. The character '.


2 Answers

I think what you are looking for is string.printable which returns all the printable characters in Python. For example:

>>> import string
>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

Now to check content satisfied by your regex, you may do:

>>> import re
>>> x = string.printable
>>> pattern = r'[\dA-C]'
>>> print(re.findall(pattern, x))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C']

string.printable is a combination of digits, letters, punctuation, and whitespace. Also check String Constants for complete list of constants available with string module.


In case you need the list of all unicode characters, you may do:

import sys
unicode_list = [chr(i) for i in range(sys.maxunicode)]

Note: It will be a huge list, and console might get stuck for a while to give the result as value of sys.maxunicode is:

>>> sys.maxunicode
1114111

In case you are dealing with some specific unicode formats, refer Unicode Character Ranges for limiting the ranges you are interested in.

like image 151
Moinuddin Quadri Avatar answered Nov 02 '22 23:11

Moinuddin Quadri


import re

x = '123456789ABCDE'
pattern = r'[\dA-C]'
print(re.findall(pattern,x))    
#prints ['1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C']

Is this what you are looking for?

If you don't have x and just want to match ascii characters you can use :

import re
import string

x = string.ascii_uppercase + string.digits
pattern = r'[\dA-C]'
print(re.findall(pattern,x))    

If you want to take inputs for the pattern you can simply just do:

 pattern = input() #with either one from above
like image 20
MooingRawr Avatar answered Nov 02 '22 23:11

MooingRawr