I am trying to scrape a website and find all the headings of a feed. I am having trouble just getting the text of the a
tag that I need. Here is an example of the html.
<td class="m" id="b1"><a href="/QSYcfT" id="c1" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=BFNH-6K10Ic', 'QSYcfT', this.id); this.blur(); return false;">TF4 - Oreos</a> <a href="#" onClick="return lkP('1', 'QSYcfT');" id="x1"><font class="bp">(0)</font></a>
<td class="m" id="b2"><a href="/zXHNvp" id="c2" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=0vjcGwZGBYI', 'zXHNvp', this.id); this.blur(); return false;">Awesome Game Boy Facts</a> <a href="#" onClick="return lkP('2', 'zXHNvp');" id="x2"><font class="bp">(0)</font></a>
I am trying to get the text for every a
tag with a id of c
and print each on a new line.
My output should look like this.
TF4 - Oreos
Awesome Game Boy Facts
So far I have tried.
soup = bs4.BeautifulSoup(html)
links = soup.find_all('a',{'id' : 'c'})
for link in links:
print link.text
But it doesn't find or print anything?
You can pass a regular expression in place of an attribute value:
links = soup.find_all('a', {'id': re.compile('^c\d+')})
^
means the beginning of a string, \d+
matches one or more digits.
Demo:
>>> import re
>>> from bs4 import BeautifulSoup
>>>
>>> html = """
... <tr>
... <td class="m" id="b1"><a href="/QSYcfT" id="c1" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=BFNH-6K10Ic', 'QSYcfT', this.id); this.blur(); return false;">TF4 - Oreos</a> <a href="#" onClick="return lkP('1', 'QSYcfT');" id="x1"><font class="bp">(0)</font></a></td>
... <td class="m" id="b2"><a href="/zXHNvp" id="c2" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=0vjcGwZGBYI', 'zXHNvp', this.id); this.blur(); return false;">Awesome Game Boy Facts</a> <a href="#" onClick="return lkP('2', 'zXHNvp');" id="x2"><font class="bp">(0)</font></a></td>
... </tr>
... """
>>> soup = BeautifulSoup(html)
>>> links = soup.find_all('a', {'id': re.compile('^c\d+')})
>>> for link in links:
... print link.text
...
TF4 - Oreos
Awesome Game Boy Facts
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With