Clean Python Regular Expressions

Tags:

python

regex

list

Is there a cleaner way to write long regex patterns in Python? I saw this approach somewhere, but regex in Python doesn't allow lists.

patterns = [
    re.compile(r'<!--([^->]|(-+[^->])|(-?>))*-{2,}>'),
    re.compile(r'\n+|\s{2}')
]

586

asked Jun 06 '09 02:06

KeyboardInterrupt

3 Answers

You can use verbose mode to write more readable regular expressions. In this mode:

Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash.
When a line contains a '#' that is neither in a character class nor preceded by an unescaped backslash, all characters from the leftmost such '#' through the end of the line are ignored.

The following two statements are equivalent:

import re

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)

b = re.compile(r"\d+\.\d*")

(Taken from the documentation of verbose mode)
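
As a minimal sketch of those two rules (the pattern names and the sample string are made up for illustration), the snippet below shows that an unescaped space is dropped from a verbose pattern, while a space inside a character class is still matched literally:

```python
import re

# Hypothetical pattern for illustration: "v", a space, then major.minor
version_re = re.compile(r"""
    v           # literal "v"
    [ ]         # a literal space: kept because it is in a character class
    (\d+)       # major version
    \.          # literal dot
    (\d+)       # minor version
    """, re.VERBOSE)

# The same pattern written compactly, without verbose mode
compact_re = re.compile(r"v (\d+)\.(\d+)")

sample = "release v 3.11 is out"
verbose_match = version_re.search(sample)
compact_match = compact_re.search(sample)

print(verbose_match.groups())   # ('3', '11')
print(compact_match.groups())   # ('3', '11')

# An unescaped space in verbose mode is ignored, so this matches "v3", not "v 3"
ignored_space_re = re.compile(r"v \d", re.VERBOSE)
print(bool(ignored_space_re.search("v3")))    # True
print(bool(ignored_space_re.search("v 3")))   # False
```

If a pattern needs a literal space or '#', putting it in a character class (`[ ]`, `[#]`) tends to be easier to spot than a backslash escape.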

170

answered Oct 02 '22 19:10

Ayman Hourieh

Though @Ayman's suggestion about re.VERBOSE is a better idea, if all you want is what you're showing, just do:

import re

patterns = re.compile(
        r'<!--([^->]|(-+[^->])|(-?>))*-{2,}>'
        r'\n+|\s{2}'
)

and Python's automatic concatenation of adjacent string literals (much like C's, btw) will do the rest;-).
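
One caveat worth checking, sketched here with made-up names (`joined`, `sequence_re`, `either_re`, `text`): adjacent literals are joined with nothing in between, so the two pieces become one sequence rather than two alternatives. If the goal is to match either piece, as the original list of two patterns would, an explicit `|` is needed. The non-capturing groups are an assumption added here to keep the alternation boundaries clear.

```python
import re

# Adjacent string literals are concatenated with no separator
joined = (
    r'<!--([^->]|(-+[^->])|(-?>))*-{2,}>'
    r'\n+|\s{2}'
)
print(joined == r'<!--([^->]|(-+[^->])|(-?>))*-{2,}>\n+|\s{2}')  # True

# As written, this means: (comment followed by newlines) OR (two whitespace chars)
sequence_re = re.compile(joined)

# To match either piece on its own, add an explicit '|' between them
either_re = re.compile(
    r'(?:<!--([^->]|(-+[^->])|(-?>))*-{2,}>)'
    r'|'
    r'(?:\n+|\s{2})'
)

text = "a<!-- note -->b\n\nc"
print(sequence_re.sub('', text))  # a<!-- note -->bc
print(either_re.sub('', text))    # abc
```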

answered Oct 02 '22 19:10

Alex Martelli

You can use comments in regexes, which makes them much more readable. Taking an example from http://gnosis.cx/publish/programming/regular_expressions.html :

/               # identify URLs within a text file
          [^="] # do not match URLs in IMG tags like:
                # <img src="http://mysite.com/mypic.png">
http|ftp|gopher # make sure we find a resource type
          :\/\/ # ...needs to be followed by colon-slash-slash
      [^ \n\r]+ # stuff other than space, newline, carriage return is in URL
    (?=[\s\.,]) # assert: followed by whitespace/period/comma 
/
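
The example above uses Perl-style `/.../` delimiters. In Python, one way to get the same commented layout is a triple-quoted raw string compiled with `re.VERBOSE`. The following is only a rough adaptation: the name `url_re`, the sample text, the capture group, and the non-capturing group around the scheme alternatives are assumptions added here, not part of the quoted pattern.

```python
import re

# Adaptation for Python; grouping and the capture group are added assumptions
url_re = re.compile(r"""
    [^="]                   # do not match URLs in attributes like src="..."
    (                       # capture the URL itself
      (?:http|ftp|gopher)   # resource type
      ://                   # colon-slash-slash
      [^ \n\r]+             # anything other than space, newline, carriage return
    )
    (?=[\s.,])              # assert: followed by whitespace/period/comma
    """, re.VERBOSE)

# Made-up sample text: one plain URL and one URL inside an IMG tag
text = 'Docs at http://example.com/docs and <img src="http://mysite.com/mypic.png"> here'
print(url_re.findall(text))  # ['http://example.com/docs']
```

The space inside `[^ \n\r]` survives verbose mode because it sits in a character class.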

answered Oct 02 '22 18:10

Nathaniel Flath
