Extracting specific src attributes from script tags

Question

I want to use a regular expression (RE) to extract, from the input content, the JS file names that contain jquery as a substring.

This is my code:

Step 1: Extract the JS file names from the content.

>>> data = """    <script type="text/javascript" src="js/jquery-1.9.1.min.js"/>
...     <script type="text/javascript" src="js/jquery-migrate-1.2.1.min.js"/>
...     <script type="text/javascript" src="js/jquery-ui.min.js"/>
...     <script type="text/javascript" src="js/abc_bsub.js"/>
...     <script type="text/javascript" src="js/abc_core.js"/>
...     <script type="text/javascript" src="js/abc_explore.js"/>
...     <script type="text/javascript" src="js/abc_qaa.js"/>"""
>>> import re
>>> re.findall('src="js/([^"]+)"', data)
['jquery-1.9.1.min.js', 'jquery-migrate-1.2.1.min.js', 'jquery-ui.min.js', 'abc_bsub.js', 'abc_core.js', 'abc_explore.js', 'abc_qaa.js']

Step 2: Keep only the JS file names that contain jquery as a substring.

>>> [ii for ii in re.findall('src="js/([^"]+)"', data) if "jquery" in ii]
['jquery-1.9.1.min.js', 'jquery-migrate-1.2.1.min.js', 'jquery-ui.min.js']

Can I fold Step 2 into Step 1, that is, get this result directly from the RE pattern?

Tim Pietzcker · Accepted Answer

Sure you can. One way would be to use

re.findall('src="js/([^"]*jquery[^"]*)"', data)

This captures everything between "js/ and the next ", provided that text contains jquery somewhere. If you know more about the position of jquery (for example, if it's always at the start) you can adjust the regex accordingly.
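As a rough sketch of the "adjust the regex" remark, the snippet below compares the pattern above with a variant that requires jquery at the very start of the file name. The sample data is shortened from the question, and the file abc_jquery_plugin.js is a made-up name added only to show the difference.

```python
import re

# Shortened sample input; abc_jquery_plugin.js is a hypothetical file name
# added for illustration and does not appear in the question.
data = '''<script type="text/javascript" src="js/jquery-1.9.1.min.js"/>
<script type="text/javascript" src="js/jquery-ui.min.js"/>
<script type="text/javascript" src="js/abc_jquery_plugin.js"/>
<script type="text/javascript" src="js/abc_core.js"/>'''

# jquery anywhere in the file name (the pattern from the answer).
anywhere = re.findall(r'src="js/([^"]*jquery[^"]*)"', data)

# jquery only at the start of the file name: drop the leading [^"]*.
at_start = re.findall(r'src="js/(jquery[^"]*)"', data)

print(anywhere)  # ['jquery-1.9.1.min.js', 'jquery-ui.min.js', 'abc_jquery_plugin.js']
print(at_start)  # ['jquery-1.9.1.min.js', 'jquery-ui.min.js']
```

Because [^"]* cannot cross a double quote, neither pattern can run past the end of one src attribute into the next.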

If you want to make sure that jquery is not directly surrounded by other alphanumeric characters, use word boundary anchors:

re.findall(r'src="js/([^"]*\bjquery\b[^"]*)"', data)
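To illustrate what the word boundary anchors accept and reject, here is a small sketch using made-up file names (none of them except jquery-ui.min.js come from the question). Note that \b treats the underscore as a word character, so jquery_ui.js is rejected as well.

```python
import re

# Hypothetical file names chosen to show where \b does and does not match.
data = '''<script src="js/jquery-ui.min.js"/>
<script src="js/myjquery.js"/>
<script src="js/jquery_ui.js"/>
<script src="js/jqueryfoo.js"/>'''

# \b requires a transition between a word character ([A-Za-z0-9_])
# and a non-word character (or the string edge) on each side of jquery.
pattern = re.compile(r'src="js/([^"]*\bjquery\b[^"]*)"')
matches = pattern.findall(data)

print(matches)  # ['jquery-ui.min.js']
```

The raw string prefix (r'...') matters here: without it, Python would interpret \b as a backspace character instead of passing the word boundary anchor to the regex engine.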

Tags: python, regex

Vivek Sable
