I am looping over the lines of a text file in a Python script. I want to find an img tag in each line and return the tag as text. When I run the regex with re.match(r'<img.*?>', line), it returns an _sre.SRE_Match object instead. How do I get it to return a string?
import sys
import string
import re

f = open("sample.txt", 'r')
l = open('writetest.txt', 'w')
count = 1
for line in f:
    line = line.rstrip()
    imgtag = re.match(r'<img.*?>', line)
    print("yo it's a {}".format(imgtag))
When run, it prints:
yo it's a None
yo it's a None
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e5e0>
yo it's a None
yo it's a None
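You should call group(0) on the returned match object (documented as re.MatchObject.group(0) in older Python docs). It returns the entire matched substring as a plain string. Like:
imgtag = re.match(r'<img.*?>', line).group(0)
Note that this one-liner raises AttributeError on lines that don't match, because re.match returns None there.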
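A minimal sketch of what a match object gives you, assuming a hypothetical sample line that starts with an img tag (the pattern is the one from the question). Printing the match object shows its repr, while group(0) gives the text itself:

```python
import re

# Hypothetical sample line; the pattern is the one from the question.
line = '<img src="a.png" alt="logo"> followed by text'
match = re.match(r"<img.*?>", line)

# group(0) returns the entire matched substring as a plain str
tag = match.group(0)
print(tag)  # <img src="a.png" alt="logo">

# group() with no argument is equivalent to group(0)
print(tag == match.group())  # True

# start() and end() report where the match sits inside the line
print(match.start(), match.end())  # 0 28
```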
Edit:
You also might be better off doing something like
imgtag = re.match(r'<img.*?>', line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))
to eliminate all the Nones.
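One more thing worth checking: re.match only tries to match at the very start of the line, so an img tag in the middle of a line is reported as None. The sketch below swaps in re.finditer, which scans the whole line and yields every match, so a line with two tags gives two strings. The function name find_img_tags and the sample lines are hypothetical, chosen for illustration. If you only need the first tag per line, re.search(pattern, line) is the single-match equivalent.

```python
import re
from typing import Iterable, Iterator

# Same non-greedy pattern as the question: "<img" up to the first ">".
IMG_PATTERN = re.compile(r"<img.*?>")

def find_img_tags(lines: Iterable[str]) -> Iterator[str]:
    """Yield every <img ...> tag found anywhere in the given lines, as strings."""
    for line in lines:
        # finditer scans the whole line, unlike re.match, which only checks position 0
        for match in IMG_PATTERN.finditer(line.rstrip()):
            yield match.group(0)

# Hypothetical input; an open file object works the same way.
sample = [
    "<p>no image here</p>\n",
    'intro <img src="a.png"> and <img src="b.png">\n',
    '<img src="c.png" alt="c">\n',
]
for tag in find_img_tags(sample):
    print("yo it's a {}".format(tag))
```

Because a file object iterates over its lines, you can pass it directly: `with open("sample.txt") as f:` followed by `for tag in find_img_tags(f):`. Regex is fine for simple, one-tag-per-line markup. For real-world HTML, a parser such as the standard library's html.parser is more robust.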