So I was trying to match each line of a file to a regex and I did the following:
import re
regex='\S+\s+(\S{6})\s+VAR'
with open('/home/jyt109/humsavar.txt') as humsavar:
    for line in humsavar:
        match=regex.search(line)
        print match.group(1)
The expected output is the particular 6 characters in each line, but instead I get the error below:
Traceback (most recent call last):
  File "exercise.py", line 74, in <module>
    match=regex.search(line)
AttributeError: 'str' object has no attribute 'search'
I have found out (from the link below) that to match a regex against each line of a file, the file first has to be turned into a list by file.read():
Match multiline regex in file object
To revisit that post: is there a simpler way to do it (preferably in 1 line instead of 2)?
humsavar=open('/home/jyt109/humsavar.txt')
text=humsavar.read()
Thanks!
I think you may have misunderstood what that link was saying. If matches of your regex can span multiple lines, then you need to read the file using file.read(). If newlines will never be a part of a match, then you can read the file line by line and try to match each line separately.
If you want to check each line separately, you can use file.readlines() to get a list of lines or just iterate over the file object, for example:
# regex must be a compiled pattern (from re.compile), not a plain string
with open('/home/jyt109/humsavar.txt') as f:
    for line in f:
        match = regex.search(line)
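The AttributeError in the question comes from calling .search() on a plain string, so the loop only works once the pattern has been compiled with re.compile(). Here is a minimal sketch using the pattern from the question. The helper name extract_codes and the sample lines are made up for illustration, and the real file layout may differ:

```python
import re

# Pattern from the question, compiled so that it has a .search() method.
REGEX = re.compile(r'\S+\s+(\S{6})\s+VAR')

def extract_codes(lines):
    """Collect the 6-character group from every matching line (hypothetical helper)."""
    codes = []
    for line in lines:
        match = REGEX.search(line)
        if match is not None:  # search() returns None when a line does not match
            codes.append(match.group(1))
    return codes

# Made-up sample data; any iterable of lines works, including an open file object.
sample_lines = [
    "GENE1  P12345  VAR_000001\n",
    "header line without a match\n",
    "GENE2  Q67890  VAR_000002\n",
]
print(extract_codes(sample_lines))
```

Checking for None matters because calling match.group(1) on a line that does not match would raise another AttributeError.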
Assuming you do still want to read the entire file contents at once, you can do that in one line like this:
text = open('/home/jyt109/humsavar.txt').read()
.read() does not turn a file into a list (.readlines() does); instead it puts the entire file into a string.
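To make the difference concrete, here is a small sketch in which io.StringIO stands in for an open text file. The contents are made up for illustration:

```python
import io

# Made-up contents; io.StringIO behaves like a file opened in text mode.
content = u"first line\nsecond line\n"

text = io.StringIO(content).read()        # one string holding the whole file
lines = io.StringIO(content).readlines()  # a list of lines, newlines kept

print(repr(text))
print(lines)
```

Joining the list from readlines() back together gives the same string that read() returns.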
Even then you can still match line by line: if you compile the regex with re.MULTILINE, the anchors ^ and $ match at the start and end of each individual line:
>>> regex = re.compile(r"^Match this regex in each line$", re.MULTILINE)
>>> regex.findall(text)
The result will be a list of all matches.
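As a rough sketch of how this could look with the pattern from the question, the example below anchors it with ^ and runs it over a made-up string standing in for the result of file.read(). It uses [ \t]+ instead of \s+ so that a match cannot run across a newline. That is a choice made for this example, not something taken from the original post:

```python
import re

# Made-up text standing in for the result of file.read().
text = (
    "GENE1  P12345  VAR_000001\n"
    "not a data line\n"
    "GENE2  Q67890  VAR_000002\n"
)

# With re.MULTILINE, ^ matches at the start of every line,
# not only at the start of the whole string.
regex = re.compile(r'^\S+[ \t]+(\S{6})[ \t]+VAR', re.MULTILINE)

# The pattern has exactly one capturing group, so findall()
# returns a list of that group's contents, one item per match.
codes = regex.findall(text)
print(codes)
```

If the pattern had no capturing group, findall() would return the full matched text for each match.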