
Python regular expression to match JS or PHP URLs

Tags: python, regex

I tried to match JS and PHP URLs with Python's re module, but the expression below doesn't work. Can anyone help me?

import re, urllib2
response = urllib2.urlopen('https://www.cnn.com')
s = response.read()
p = re.compile(r'^(http|https|//).+?\.(js|php)$')
m = p.findall(s)

for i in m:
    print i

Also, some web pages use // instead of http or https. Is there any way to match those, too?

Jerry asked Dec 02 '25 10:12


1 Answer

You seem to want to match URLs that end with the file extension js or php and that may start with http:, https: or just //.

Use

import re
s = "https://www.cnn.com/1.js!! http://www.cnn.com/2.php; //some.site.com/3.js,"
res = re.findall(r'(?:\bhttps?:)?//\S*\.(?:js|php)\b', s)
print(res)

See the Python demo

Details:

  • (?:\bhttps?:)? - an optional sequence of
    • \b - a leading word boundary
    • https?: - http, 1 or 0 (=optional) s, and a :
  • // - a literal char sequence //
  • \S* - zero or more non-whitespace symbols
  • \. - a dot
  • (?:js|php) - js or php literal char sequences
  • \b - a trailing word boundary
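
To make the breakdown above concrete, here is a minimal sketch that runs the answer's pattern next to the anchored pattern from the question on the same sample string. The helper name find_script_urls and the .json test string are assumptions added for illustration; they are not part of the original answer. The comparison suggests why the question's pattern finds nothing: without re.MULTILINE, ^ and $ only match at the very start and end of the whole input, so a URL embedded in a larger page cannot satisfy both anchors.

```python
import re

# Pattern from the answer: optional "http:" or "https:", a literal "//",
# any run of non-whitespace, then ".js" or ".php" followed by a word boundary.
URL_RE = re.compile(r'(?:\bhttps?:)?//\S*\.(?:js|php)\b')

# Pattern from the question, kept only for comparison.
ANCHORED_RE = re.compile(r'^(http|https|//).+?\.(js|php)$')


def find_script_urls(text):
    """Return every js/php URL found anywhere in text (hypothetical helper)."""
    return URL_RE.findall(text)


sample = "https://www.cnn.com/1.js!! http://www.cnn.com/2.php; //some.site.com/3.js,"

print(find_script_urls(sample))
# ['https://www.cnn.com/1.js', 'http://www.cnn.com/2.php', '//some.site.com/3.js']

# The anchored pattern requires the whole string to be a single URL,
# so it finds nothing in text that contains anything else.
print(ANCHORED_RE.findall(sample))
# []

# The trailing \b keeps ".js" from matching inside a longer extension.
print(find_script_urls("//example.com/data.json"))
# []
```

One caveat with this approach: \S* is greedy, so two URLs that sit in the same unbroken run of non-whitespace characters come back as one match. In Python 3, the page body returned by urllib.request is bytes and needs decoding to str before it is passed to a str pattern like this one.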
Wiktor Stribiżew answered Dec 05 '25 00:12


