Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python regular expression for a comment in a long string

Tags:

python

regex

I am trying to work out a good regular expression for a python comment(s) that is located within a long string. So far I have

regex:

#(.?|\n)*

string:

'### this is a comment\na = \'a string\'.toupper()\nprint a\n\na_var_name = " ${an.injection} "\nanother_var = " ${bn.injection} "\ndtabse_conn = " ${cn.injection} "\n\ndef do_something()\n    # this call outputs an xml stream of the current parameter dictionary.\n    paramtertools.print_header(params)\n\nfor i in xrange(256):    # wow another comment\n    print i**2\n\n'

I feel like there is a much better way to get all of the individual comments from the string, but I am not an expert in regular expressions. Does anyone have a better solution?

like image 536
baallezx Avatar asked May 29 '26 00:05

baallezx


2 Answers

Get the comments from matched group at index 1.

(#+[^\\\n]*)

DEMO

Sample code:

import re
p = re.compile(ur'(#+[^\\\n]*)')
test_str = u"..."

re.findall(p, test_str)

Matches:

1.  ### this is a comment
2.  # this call outputs an xml stream of the current parameter dictionary.
3.  # wow another comment
like image 57
Braj Avatar answered May 31 '26 13:05

Braj


Since this is a python code in the string, I'd use tokenize module to parse it and extract comments:

import tokenize
import StringIO

text = '### this is a comment\na = \'a string\'.toupper()\nprint a\n\na_var_name = " ${an.injection} "\nanother_var = " ${bn.injection} "\ndtabse_conn = " ${cn.injection} "\n\ndef do_something():\n    # this call outputs an xml stream of the current parameter dictionary.\n    paramtertools.print_header(params)\n\nfor i in xrange(256):    # wow another comment\n    print i**2\n\n'

tokens = tokenize.generate_tokens(StringIO.StringIO(text).readline)
for toktype, ttext, (slineno, scol), (elineno, ecol), ltext in tokens:
    if toktype == tokenize.COMMENT:
        print ttext

Prints:

### this is a comment
# this call outputs an xml stream of the current parameter dictionary.
# wow another comment

Note that the code in the string has a syntax error: missing : after the do_something() function definition.

Also, note that ast module would not help here, since it doesn't preserve comments.

like image 20
alecxe Avatar answered May 31 '26 13:05

alecxe



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!