I have a string. Let's call it 'test'. I want to test a match for this string, but only using the backref of a regex.
Can I do something like this:
    import re
    for line in f.readlines():
        if '<a href' in line:
            if re.match('<a href="(.*)">', line) == 'test':
                print 'matched!'
This, of course, doesn't seem to work, but I think I might be close. Basically, the question is: how can I get re to return only the backref (the captured group) for comparison?
To test a regular expression, first check it for errors such as unescaped characters or unbalanced parentheses. Then run it against a range of input strings to confirm that it accepts valid strings and rejects invalid ones.
re.match matches only at the beginning of the string.
    import re

    def url_match(line, url):
        match = re.match(r'<a href="(?P<url>[^"]*?)"', line)
        # Compare the captured group, not the match object; return a real bool
        return match is not None and match.group('url') == url
Example usage:

    >>> url_match('<a href="test">', 'test')
    True
    >>> url_match('<a href="test">', 'te')
    False
    >>> url_match('this is a <a href="test">', 'test')
    False
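As for why the comparison in the question never succeeds: re.match returns a match object (or None when nothing matches), not the captured text, so comparing it to a string is always false. A minimal sketch, reusing the pattern from the question; the sample line is only an assumption for illustration:

```python
import re

line = '<a href="test">'
match = re.match(r'<a href="(.*)">', line)

# A match object is never equal to a plain string.
object_equals_string = (match == 'test')

# group(1) returns the text captured by the first parenthesised group.
group_equals_string = (match.group(1) == 'test')

print(object_equals_string, group_equals_string)
```

The named group used in url_match works the same way; match.group('url') is just a more readable spelling of match.group(1) here.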
If the pattern could occur anywhere in the line, use re.search.
    def url_search(line, url):
        match = re.search(r'<a href="(?P<url>[^"]*?)"', line)
        # Compare the captured group, not the match object; return a real bool
        return match is not None and match.group('url') == url
Example usage:

    >>> url_search('<a href="test">', 'test')
    True
    >>> url_search('<a href="test">', 'te')
    False
    >>> url_search('this is a <a href="test">', 'test')
    True
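To tie this back to the loop in the question, here is a rough sketch that scans several lines and keeps those whose first href equals the target. The helper name lines_linking_to and the sample lines are hypothetical, and a list of strings stands in for the open file f:

```python
import re

# Compiled once, since the same pattern is applied to every line.
HREF_RE = re.compile(r'<a href="([^"]*)"')

def lines_linking_to(lines, url):
    """Return the lines whose first <a href="..."> value equals url."""
    hits = []
    for line in lines:
        match = HREF_RE.search(line)
        # Only the captured group is compared with the target string.
        if match is not None and match.group(1) == url:
            hits.append(line)
    return hits

# Illustrative data only; in the question these lines would come from f.
sample = [
    'intro <a href="test">link</a>\n',
    '<a href="other">link</a>\n',
    'no anchor here\n',
]
print(lines_linking_to(sample, 'test'))
```

Iterating over the file object directly (for line in f) would work the same way and avoids loading every line into memory as f.readlines() does.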
N.B.: If you are trying to parse HTML using a regex, read "RegEx match open tags except XHTML self-contained tags" before going any further.
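If a real parser is an option, the standard library's html.parser can do the same check without a regex. This is only a sketch assuming Python 3; the class HrefCollector and the function has_link are names made up for this example:

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.hrefs.append(value)

def has_link(html, url):
    """Return True if any <a> tag in html has exactly this href."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return url in collector.hrefs

print(has_link('this is a <a class="x" href="test">link</a>', 'test'))
```

Unlike the regex versions, this does not depend on href being the first attribute or on the exact spacing inside the tag.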