I want to use a regular expression (RE) to get the names of the JS files in the input content
whose names contain jquery as a substring.
This is my code:
Step 1: Extract the JS file names from the content.
>>> data = """ <script type="text/javascript" src="js/jquery-1.9.1.min.js"/>
... <script type="text/javascript" src="js/jquery-migrate-1.2.1.min.js"/>
... <script type="text/javascript" src="js/jquery-ui.min.js"/>
... <script type="text/javascript" src="js/abc_bsub.js"/>
... <script type="text/javascript" src="js/abc_core.js"/>
... <script type="text/javascript" src="js/abc_explore.js"/>
... <script type="text/javascript" src="js/abc_qaa.js"/>"""
>>> import re
>>> re.findall('src="js/([^"]+)"', data)
['jquery-1.9.1.min.js', 'jquery-migrate-1.2.1.min.js', 'jquery-ui.min.js', 'abc_bsub.js', 'abc_core.js', 'abc_explore.js', 'abc_qaa.js']
Step 2: Keep only the JS file names that contain jquery as a substring.
>>> [ii for ii in re.findall('src="js/([^"]+)"', data) if "jquery" in ii]
['jquery-1.9.1.min.js', 'jquery-migrate-1.2.1.min.js', 'jquery-ui.min.js']
Can I fold Step 2 into Step 1, that is, get this result with the RE pattern alone?
Sure you can. One way would be to use
re.findall('src="js/([^"]*jquery[^"]*)"', data)
This will match everything after "js/ until the nearest " if it contains jquery anywhere. If you know more about the position of jquery (for example, if it's always at the start), you can adjust the regex accordingly.
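As a rough sketch of what "adjusting the regex" could look like: if jquery is assumed to always come right after js/, the leading [^"]* can be dropped. The sample data below is shortened, and abc_jquery_helper.js is a made-up file name added only to show the difference between the two patterns.

```python
import re

# Shortened sample input; abc_jquery_helper.js is a hypothetical file name.
data = '''<script type="text/javascript" src="js/jquery-1.9.1.min.js"/>
<script type="text/javascript" src="js/jquery-ui.min.js"/>
<script type="text/javascript" src="js/abc_jquery_helper.js"/>
<script type="text/javascript" src="js/abc_core.js"/>'''

# jquery may appear anywhere in the file name.
anywhere = re.findall(r'src="js/([^"]*jquery[^"]*)"', data)

# jquery must be the first thing after js/ (assumption: it is always a prefix).
at_start = re.findall(r'src="js/(jquery[^"]*)"', data)

print(anywhere)  # ['jquery-1.9.1.min.js', 'jquery-ui.min.js', 'abc_jquery_helper.js']
print(at_start)  # ['jquery-1.9.1.min.js', 'jquery-ui.min.js']
```

The stricter pattern only helps if the prefix assumption really holds for your input; otherwise the "anywhere" version is the safer choice.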
If you want to make sure that jquery is not directly surrounded by other alphanumeric characters, use word boundary anchors:
re.findall(r'src="js/([^"]*\bjquery\b[^"]*)"', data)
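To see what the word boundaries change, here is a small illustration. The file names myjquery.js and jquery_ui.js are invented for this example. Note that \b treats the underscore as a word character too, so jquery_ui.js is rejected, while a hyphen or a dot after jquery counts as a boundary.

```python
import re

# Hypothetical file names chosen to exercise the word boundaries.
data = '''<script src="js/jquery-ui.min.js"/>
<script src="js/myjquery.js"/>
<script src="js/jquery_ui.js"/>
<script src="js/abc_core.js"/>'''

# \b requires a word/non-word transition on both sides of jquery.
bounded = re.findall(r'src="js/([^"]*\bjquery\b[^"]*)"', data)

# Without \b, every name containing jquery is kept.
unbounded = re.findall(r'src="js/([^"]*jquery[^"]*)"', data)

print(bounded)    # ['jquery-ui.min.js']
print(unbounded)  # ['jquery-ui.min.js', 'myjquery.js', 'jquery_ui.js']
```

The raw string prefix (r'...') matters here: without it, Python would interpret \b as a backspace character instead of passing it to the regex engine.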