Regex pattern with subpattern exceptions (Python)

I am using BeautifulSoup to extract td (table data) tags from a table. Each td has a class of either 'a', 'u', 'e', 'available-unavailable' or 'unavailable-available'. (Yes, I know, quirky class names, but hey...)

Here's an example:

<tr>
  <td class="u">4</td>
  <td class="unavailable-available">5</td>
  <td class="a">6</td>
  <td class="available-unavailable">7</td>
  <td class="u">8</td>
  ...

I've been working with a line that uses re.compile():

  tab = [int(tag.string) for tag in soup.find('table', {'summary': tableSummary}).findAll("td", attrs={"class": re.compile(r'\Aa')})]

I need to extract all the td's whose class name is either 'a' or 'unavailable-available'. The pattern above also picks up 'available-unavailable', because that name starts with 'a' too. I have been trying some negative-lookahead assertions, but without much luck. I would value help from any regex legends who can produce the correct regex...

timbo asked Jun 16 '26 14:06


1 Answer

table.findAll('td', attrs = {"class":re.compile(r'(^|\s)(a|unavailable-available)($|\s)')})

This matches the start of the string or whitespace, followed by "a" or "unavailable-available", followed by whitespace or the end of the string. So it will match class values such as these:

class="a"
class="a ui-xxx"
class="ui-xxx a"
class="ui-xxx a ui-yyy"
class="unavailable-available"
class="unavailable-available foo"
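
To check the difference between the two patterns without a real page, here is a minimal sketch that runs them against plain class strings using only the re module. The list class_values and the helper matching() are made-up names for illustration, and the snippet does not exercise BeautifulSoup itself, so treat it as a sanity check of the regex rather than of the full findAll call.

```python
import re

# Pattern from the question: anchored only at the start of the string,
# so every class value that begins with "a" matches.
question_pattern = re.compile(r'\Aa')

# Pattern from the answer: the class name must be delimited by
# whitespace or by the start/end of the string on both sides.
answer_pattern = re.compile(r'(^|\s)(a|unavailable-available)($|\s)')

# Hypothetical sample: the class names from the question plus two
# multi-class values like the ones listed in the answer.
class_values = [
    "a",
    "u",
    "e",
    "available-unavailable",
    "unavailable-available",
    "a ui-xxx",
    "ui-xxx a ui-yyy",
]


def matching(pattern, values):
    """Return the values in which pattern.search() finds a match."""
    return [value for value in values if pattern.search(value)]


print(matching(question_pattern, class_values))
# ['a', 'available-unavailable', 'a ui-xxx']
print(matching(answer_pattern, class_values))
# ['a', 'unavailable-available', 'a ui-xxx', 'ui-xxx a ui-yyy']
```

The first pattern lets 'available-unavailable' through, which is the unwanted match from the question. The second rejects it because the "a" is followed by "v" rather than by whitespace or the end of the string.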

Chris Morgan answered Jun 18 '26 04:06


