
Python: regex match inside a for loop that reads a file line by line

So I was trying to match each line of a file to a regex and I did the following:

import re
regex='\S+\s+(\S{6})\s+VAR'
with open('/home/jyt109/humsavar.txt') as humsavar:
    for line in humsavar:
        match=regex.search(line)
        print match.group(1)

The expected output is the particular 6 characters captured from each line, but instead I get the error below:

Traceback (most recent call last):
  File "exercise.py", line 74, in <module>
    match=regex.search(line)
AttributeError: 'str' object has no attribute 'search'

I have found out (from link below) that to match a regex to each line of a file, the file has to be first turned into a list by file.read()

Match multiline regex in file object

Following up on that post: is there a simpler way to do it (preferably in 1 line instead of 2)?

humsavar=open('/home/jyt109/humsavar.txt')
text=humsavar.read()

Thanks!

noqa asked Oct 29 '25 21:10


2 Answers

I think you may have misunderstood what that link was saying. If matches of your regex can span multiple lines, then you need to read the file using file.read(). If newlines will never be a part of a match, then you can read the file line by line and try to match each line separately.

If you want to check each line separately, you can use file.readlines() to get a list of lines or just iterate over the file object, for example:

with open('/home/jyt109/humsavar.txt') as f:
    for line in f:
        match = regex.search(line)
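
Note that regex.search(line) only works when regex is a compiled pattern object. In the question, regex is a plain string, which is where the AttributeError comes from. Here is a minimal sketch of the line-by-line version, assuming Python 3 and using a few made-up sample lines in place of humsavar.txt (sample_lines and extract_codes are illustrative names, not part of the original code):

```python
import re

# Compile once so the pattern object has a .search() method.
regex = re.compile(r'\S+\s+(\S{6})\s+VAR')

# Hypothetical sample lines standing in for the contents of humsavar.txt.
sample_lines = [
    "A1BG      P04217     VAR_018369  p.His52Arg\n",
    "header line without a match\n",
    "A1CF      Q9NQ94     VAR_052201  p.Val555Met\n",
]

def extract_codes(lines):
    """Return the captured 6-character field for every line that matches."""
    codes = []
    for line in lines:
        match = regex.search(line)
        if match is not None:  # search() returns None when nothing matches
            codes.append(match.group(1))
    return codes

codes = extract_codes(sample_lines)
print(codes)
```

With the real file, the open file object can be passed in place of the list, for example extract_codes(f) inside the with block. The None check matters because calling .group(1) on a line that does not match would raise another AttributeError.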

Assuming you do still want to read the entire file contents at once, you do that on one line like this:

text = open('/home/jyt109/humsavar.txt').read()
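
The one-liner above never closes the file explicitly. If that matters, one possible alternative (a sketch assuming Python 3, where pathlib is available) is Path.read_text(), which is still one line and closes the file before returning. The temporary file below is a hypothetical stand-in for /home/jyt109/humsavar.txt so the snippet can run anywhere:

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the real humsavar.txt so the sketch is runnable.
tmp_dir = tempfile.mkdtemp()
path = Path(tmp_dir) / "humsavar.txt"
path.write_text("A1BG      P04217     VAR_018369\n")

# One line; the file is opened, read, and closed inside read_text().
text = path.read_text()

# Equivalent two-line form with an explicit context manager.
with open(path) as f:
    text_again = f.read()
```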

Andrew Clark answered Nov 01 '25 10:11


.read() does not turn a file into a list (.readlines() does); instead it puts the entire file into a string.

But even then you can use a regex: when compiling it with re.MULTILINE, the anchors ^ and $ will match the starts and ends of individual lines:

>>> regex = re.compile(r"^Match this regex in each line$", re.MULTILINE)
>>> regex.findall(text)

The result will be a list of all matches.
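
As a sketch of how this could look with the pattern from the question (assuming Python 3; the sample text is made up and stands in for the result of file.read()):

```python
import re

# Hypothetical sample text standing in for the result of file.read().
text = (
    "A1BG      P04217     VAR_018369  p.His52Arg\n"
    "some header line\n"
    "A1CF      Q9NQ94     VAR_052201  p.Val555Met\n"
)

# ^ anchors each match to the start of a line because of re.MULTILINE.
# [ \t]+ is used instead of \s+ so a match cannot run across a newline.
regex = re.compile(r"^\S+[ \t]+(\S{6})[ \t]+VAR", re.MULTILINE)

# With a single capture group, findall() returns a list of the captured strings.
codes = regex.findall(text)
print(codes)
```

Swapping \s+ for [ \t]+ is a choice made for this sketch, not something the original pattern requires: \s also matches "\n", so on a whole-file string the original pattern could in principle pair the end of one line with the start of the next.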

Tim Pietzcker answered Nov 01 '25 10:11


