I'm trying to create a script to remove all docstrings inside a folder. To do so, I'd like to make a regex as efficient as possible.
I've started with this one:
import re
doc_reg = r'(class|def)(.+)\s+("""[\w\s\(\)\-\,\;\:]+""")'
file_content = '''
"""
Mycopyright (c)
"""
from abc import d
class MyClass(MotherClass):
    """
    Some;
    Multi-
    Line Docstring:
    """
    def __init__(self, my_param):
        """Docstring"""
        self.my_param = my_param
def test_fctn():
    """
    Some Docstring
    """
    return True
def test_fctn():
    some_string = """
    Some Docstring
    """
    return some_string
'''
print(re.sub(doc_reg, r'\1\2', file_content))
It works quite well but I'm pretty sure it's possible to make this regex more efficient.
Thanks
There are a few things you can do to make it more efficient and some things you can also do to make it shorter/cleaner.
Original (607 steps)
(class|def)(.+)\s+("""[\w\s\(\)\-\,\;\:]+""")
You don't need to put a backslash before each character in a set. This may also bring a very small performance improvement, since sre_parse.py won't call _class_escape on line 554 (I'm using Python 3.8.0 as reference).
(class|def)(.+)\s+("""[\w\s(),;:-]+""")
Use quantifiers for repeated characters (595 steps).
(class|def)(.+)\s+("{3}[\w\s(),;:-]+"{3})
                    ^^^              ^^^
Remove unneeded capture groups (588 steps)
(class|def)(.+)\s+"{3}[\w\s(),;:-]+"{3}
                 ^^                  ^^
Anchor when possible (345 steps)
\b(class|def)(.+)\s+"{3}[\w\s(),;:-]+"{3}
^^
Combine groups if possible (337 steps) - replacement now becomes \1
\b(class.+|def.+)\s+"{3}[\w\s(),;:-]+"{3}
  ^^^^^^^^^^^^^^^
Changing class|def to def|class can also impact performance if you suspect more def than class instances (336 steps)
\b(def.+|class.+)\s+"{3}[\w\s(),;:-]+"{3}
  ^^^^^^^^^^^^^^^
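The step counts above say nothing about whether the rewritten pattern still removes the same text, so a quick equivalence check is worth running. The sketch below is only a sanity test on a shortened, re-indented copy of the question's sample (the `sample` string and variable names are illustrative assumptions); it compares the output of the original pattern with the final one.

```python
import re

# Pattern from the question and the final pattern from the steps above.
original = re.compile(r'(class|def)(.+)\s+("""[\w\s\(\)\-\,\;\:]+""")')
optimized = re.compile(r'\b(def.+|class.+)\s+"{3}[\w\s(),;:-]+"{3}')

# Shortened sample, assumed to be indented like ordinary Python code.
sample = '''
class MyClass(MotherClass):
    """
    Some;
    Multi-
    Line Docstring:
    """
    def __init__(self, my_param):
        """Docstring"""
        self.my_param = my_param

def test_fctn():
    some_string = """
    Some Docstring
    """
    return some_string
'''

# The original needs both groups back; the combined group needs only \1.
before = original.sub(r'\1\2', sample)
after = optimized.sub(r'\1', sample)

print(before == after)
print(after)
```

Equal output on one sample does not prove the two patterns are equivalent in general: the `\b` anchor, for instance, stops the new pattern from matching `def` or `class` in the middle of a longer word, which the original would accept.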
Generally, it is a very bad idea to parse source code (in any language, not only Python) with regular expressions. The result is bug-prone, hard to maintain, and will not behave as expected on every block of source code.
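To make that concrete, here is a small sketch of two ordinary docstrings that the optimized regex from the other answer leaves in place: one contains a period, which is not in the character class, and one uses single quotes. The function bodies are made-up examples, not taken from the question.

```python
import re

# Final pattern from the regex answer above.
pattern = re.compile(r'\b(def.+|class.+)\s+"{3}[\w\s(),;:-]+"{3}')

# A docstring with a period: "." is not in the character class.
with_period = 'def f():\n    """Return the value."""\n    return 1\n'

# A docstring written with ''' instead of """.
single_quotes = "def g():\n    '''Docstring'''\n    return 2\n"

# In both cases nothing matches, so the text comes back unchanged.
unchanged_period = pattern.sub(r'\1', with_period) == with_period
unchanged_single = pattern.sub(r'\1', single_quotes) == single_quotes

print(unchanged_period, unchanged_single)
```

Widening the character class fixes the first case but starts to risk matching across unrelated strings, which is the kind of whack-a-mole a real parser avoids.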
Python has a great built-in library named ast, which exposes Python's own parser. It parses source code into a tree structure that you can walk or modify (which is what we want).
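As a minimal illustration of what the tree gives you, the sketch below parses a small made-up snippet and asks `ast.get_docstring` for the docstring of each class and function. The `Greeter` class is purely hypothetical; the point is that the parser already distinguishes a docstring from an ordinary triple-quoted string assignment.

```python
import ast

# Hypothetical input: two real docstrings and one plain string assignment.
source = '''
class Greeter:
    """Say hello."""

    def greet(self):
        """Return a greeting."""
        return "hello"

    def helper(self):
        text = """not a docstring"""
        return text
'''

tree = ast.parse(source)

# Map each class/function name to its docstring (None when it has none).
found = {}
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
        found[node.name] = ast.get_docstring(node)

print(found)
```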
OK, here is a working example with slightly modified source code for analysis (I added a function inside a function to make the example harder =).
clean.py
import ast
import astor  # read more at https://astor.readthedocs.io/en/latest/

with open('source.py') as f:
    parsed = ast.parse(f.read())

for node in ast.walk(parsed):
    # let's work only on function & class definitions
    if not isinstance(node, (ast.FunctionDef, ast.ClassDef, ast.AsyncFunctionDef)):
        continue
    if not node.body:
        continue
    first = node.body[0]
    if not isinstance(first, ast.Expr):
        continue
    # Python 3.8+: string literals are ast.Constant nodes
    # (ast.Str is deprecated since 3.8 and removed in 3.12)
    if not isinstance(first.value, ast.Constant) or not isinstance(first.value.value, str):
        continue
    # Uncomment the lines below if you want to print what and where we are removing
    # print(node)
    # print(first.value.value)
    # Drop the docstring; keep the body non-empty if it was the only statement
    node.body = node.body[1:] or [ast.Pass()]

print('***** Processed source code output ******\n=========================================')
print(astor.to_source(parsed))
source.py
"""
Mycopyright (c)
"""
from abc import d
class MyClass(MotherClass):
    """
    Some;
    Multi-
    Line Docstring:
    """
    def __init__(self, my_param):
        """Docstring"""
        self.my_param = my_param
def test_fctn():
    """
    Some Docstring
    """
    def _wrapped(omg):
        "some extra docstring"
        pass
    return True
def test_fctn():
    some_string = """
    Some Docstring
    """
    return some_string
And here is the console output. I've simply printed the result to keep the example easy to follow.
console.log
python clean.py
***** Processed source code output ******
=========================================
"""
Mycopyright (c)
"""
from abc import d
class MyClass(MotherClass):
    def __init__(self, my_param):
        self.my_param = my_param
def test_fctn():
    def _wrapped(omg):
        pass
    return True
def test_fctn():
    some_string = """
    Some Docstring
    """
    return some_string
I've used the standard built-in library ast (Abstract Syntax Trees) to parse the source code and astor (AST observe/rewrite) to turn the tree back into executable Python source code.
Everything is in one GitHub Gist: https://gist.github.com/phpdude/1ae6f19de213d66286c8183e9e3b9ec1
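The script above depends on astor. As a possible alternative, and assuming Python 3.9 or later, the standard library's `ast.unparse` can regenerate the source without a third-party package. The sketch below is not the gist's code: `strip_docstrings` and `DOC_OWNERS` are names made up for illustration. Like the script above, it leaves the module-level docstring alone.

```python
import ast

# Node types that can carry a docstring we want to remove.
DOC_OWNERS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def strip_docstrings(source: str) -> str:
    """Return `source` with class and function docstrings removed."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, DOC_OWNERS):
            continue
        # get_docstring returns None when the first statement is not a string.
        if ast.get_docstring(node, clean=False) is None:
            continue
        # Drop the docstring; insert `pass` if nothing else is left in the body.
        node.body = node.body[1:] or [ast.Pass()]
    # ast.unparse requires Python 3.9+.
    return ast.unparse(tree)


sample = (
    'class A:\n'
    '    """Doc."""\n'
    '    def f(self):\n'
    '        "only a docstring"\n'
    '\n'
    'def g():\n'
    '    s = """not a docstring"""\n'
    '    return s\n'
)
print(strip_docstrings(sample))
```

Note that `ast.unparse`, like astor, regenerates the code from the tree, so comments and the original formatting are lost. To process a whole folder, one could call this function on each file returned by `pathlib.Path(folder).rglob('*.py')` and write the result back, ideally after taking a backup.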