 

Python Regex "object has no attribute"

Tags: python, regex

I've been putting together a list of pages that we need to update with new content (we're switching media formats). In the process I'm also cataloging pages that already have the correct new content.

Here's the general idea of what I'm doing:

  1. Iterate through a file structure and get a list of files
  2. For each file, read it into a buffer and use a regex search to match a specific tag
  3. If it matches, test two more regex patterns
  4. Write the resulting matches (one or the other) into a database
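The four steps above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual script: it assumes Python 3, UTF-8 files ending in `.html`, and an SQLite database standing in for "a database". The names `scan_tree`, `classify_embed`, and the `pages` table are hypothetical. The patterns are adapted from the question, with the closing quote moved outside the capture group so the stored URL does not end in `"`, and "one or the other" is interpreted as "the old root is checked first".

```python
import os
import re
import sqlite3

# Patterns adapted from the question; the closing quote sits outside the
# capture group so group(1) is just the URL.
EMBED_RE = re.compile(r"(<embed .*?</embed>)")
OLD_ROOT_RE = re.compile(r'data="(http://.*?/media/.*?)"')
NEW_ROOT_RE = re.compile(r'data="(http://.*?/content/.*?)"')


def classify_embed(embed):
    """Return ("old", url), ("new", url), or None for one <embed> block."""
    for label, pattern in (("old", OLD_ROOT_RE), ("new", NEW_ROOT_RE)):
        m = pattern.search(embed)
        if m is not None:  # re.search returns None when nothing matches
            return label, m.group(1)
    return None


def scan_tree(root, conn):
    """Walk `root`, classify every <embed> in *.html files, store results."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (filename TEXT, root TEXT, url TEXT)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):  # step 1
        for name in filenames:
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:  # step 2
                filebuffer = f.read()
            for embed in EMBED_RE.findall(filebuffer):
                result = classify_embed(embed)  # step 3
                if result is not None:
                    label, url = result
                    conn.execute(  # step 4
                        "INSERT INTO pages VALUES (?, ?, ?)", (path, label, url)
                    )
    conn.commit()
```

It could be called as `scan_tree("path/to/site", sqlite3.connect("pages.db"))`; the key point for this question is that every `search` result is compared with `None` before `group()` is called.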

Everything works until the third regex match, where I get the following error:

'NoneType' object has no attribute 'group'

# only interested in embedded content
pattern = "(<embed .*?</embed>)"

# matches content pointing to our old root
pattern2 = 'data="(http://.*?/media/.*?")'

# matches content pointing to our new root
pattern3 = 'data="(http://.*?/content/.*?")'

matches = re.findall(pattern, filebuffer)
for match in matches:
    if len(match) > 0:

        urla = re.search(pattern2, match)
        if urla.group(1) is not None:
            print filename, urla.group(1)

        urlb = re.search(pattern3, match)
        if urlb.group(1) is not None:
            print filename, urlb.group(1)
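A small Python 3 reproduction makes it clear why the failure shows up at the third match. This is a sketch with a made-up `<embed>` tag: a block that points only at the old `/media/` root satisfies `pattern2`, so `re.search(pattern3, ...)` returns `None`, and calling `.group(1)` on `None` raises the error. It also shows that, as written, `pattern2` captures the closing quote as part of the URL.

```python
import re

pattern2 = 'data="(http://.*?/media/.*?")'
pattern3 = 'data="(http://.*?/content/.*?")'

# Hypothetical embed block that only references the old /media/ root.
match = '<embed data="http://example.com/media/clip.swf"></embed>'

urla = re.search(pattern2, match)
print(urla.group(1))  # the captured text ends with the closing quote

urlb = re.search(pattern3, match)
print(urlb)  # None: there is no /content/ URL in this block

error = None
try:
    urlb.group(1)  # calling a method on None
except AttributeError as exc:
    error = exc  # 'NoneType' object has no attribute 'group'
print(repr(error))
```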

Thank you.

asked Sep 29 '09 08:09 by ives


2 Answers

Your exception means that urla is None. Since urla holds the return value of the re.search call, re.search must have returned None, which is what it does when the pattern does not match anywhere in the string.

So you should test the match object itself before calling group() on it, using:

urla = re.search(pattern2, match)
if urla is not None:
    print filename, urla.group(1)

instead of what you have now.
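Applied to both searches in the question's loop, the check might look like the Python 3 sketch below. `report_urls` is a hypothetical helper name; it returns the pairs instead of printing them so they can be written to a database later. The `len(match) > 0` test is dropped because every string `findall` returns for this pattern contains at least `<embed `.

```python
import re

pattern = "(<embed .*?</embed>)"
pattern2 = 'data="(http://.*?/media/.*?")'
pattern3 = 'data="(http://.*?/content/.*?")'


def report_urls(filename, filebuffer):
    """Return (filename, url) pairs for old-root and new-root embeds."""
    found = []
    for match in re.findall(pattern, filebuffer):
        # Test the match objects, not the result of .group():
        # re.search returns None when the pattern does not match.
        urla = re.search(pattern2, match)
        if urla is not None:
            found.append((filename, urla.group(1)))
        urlb = re.search(pattern3, match)
        if urlb is not None:
            found.append((filename, urlb.group(1)))
    return found
```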

answered Sep 30 '22 21:09 by oggy


The reason for the AttributeError is that re.search and re.match return either a match object or None, and only the match object has a group method. So you need to do:

url = re.search(pattern2, match)
if url is not None:
    print(filename, url.group(0))
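Note that `group(0)` and `group(1)` return different things: `group(0)` is the entire match, including the `data="` prefix, while `group(1)` is only the text captured by the first parenthesized group, which is what the question prints. A quick check with a made-up tag:

```python
import re

pattern2 = 'data="(http://.*?/media/.*?")'
match = '<embed data="http://example.com/media/clip.swf"></embed>'

url = re.search(pattern2, match)
if url is not None:
    whole = url.group(0)  # the entire matched text, starting at data="
    inner = url.group(1)  # only the first capture group
    print(whole)
    print(inner)
```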

P.S. PEP 8 recommends 4 spaces per indentation level. That is not just a matter of taste but good practice: as posted, your code is fairly hard to read.

answered Sep 30 '22 19:09 by SilentGhost