Extract title tag with BeautifulSoup

I have this:

date = chunk.find_all('a', title=True, class_='tweet-timestamp js-permalink     js-nav js-tooltip')

Which returns this:

<a class="tweet-timestamp js-permalink js-nav js-tooltip" href="/15colleen/status/537395294133313536" title="3:59 PM - 25 Nov 2014"><span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1416959997" data-time-ms="1416959997000">Nov 25</span></a>

Obviously get_text() returns Nov 25, but I want to extract the value of the title attribute, 3:59 PM - 25 Nov 2014.

asked Mar 05 '15 10:03 by DIGSUM
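
The question comes down to one distinction: get_text() collects the text nodes inside a tag, whereas title is an attribute of the <a> tag itself and lives in the tag's attrs dictionary. Below is a minimal sketch that assumes the bs4 package and a shortened copy of the markup above, with the inner span's data-* attributes dropped for brevity:

```python
from bs4 import BeautifulSoup

# Shortened copy of the tweet markup from the question (span attributes trimmed)
html = (
    '<a class="tweet-timestamp js-permalink js-nav js-tooltip" '
    'href="/15colleen/status/537395294133313536" title="3:59 PM - 25 Nov 2014">'
    '<span class="_timestamp js-short-timestamp ">Nov 25</span></a>'
)

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

# get_text() only gathers the text inside the tag...
print(link.get_text())  # Nov 25

# ...while title is stored in the tag's attribute dictionary
print(link.attrs["title"])  # 3:59 PM - 25 Nov 2014
```

link["title"] is shorthand for link.attrs["title"].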



2 Answers

You just need .find(), then index the returned tag with ["title"] to read the attribute:

from bs4 import BeautifulSoup

# html holds the page source, e.g. the tweet markup from the question
soup = BeautifulSoup(html, "html.parser")
print(soup.find("a", attrs={"class": "tweet-timestamp js-permalink js-nav js-tooltip"})["title"])

3:59 PM - 25 Nov 2014
answered Oct 14 '22 22:10 by Padraic Cunningham
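
Indexing a tag with ["title"] raises KeyError if the attribute is missing. On a page with many tweets, tag.get("title") returns None in that case instead. Here is a minimal sketch over a hypothetical two-link fragment, which is not real Twitter output and where the second link deliberately has no title:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two timestamp links, the second one lacks a title
html = """
<a class="tweet-timestamp js-permalink js-nav js-tooltip" title="3:59 PM - 25 Nov 2014">Nov 25</a>
<a class="tweet-timestamp js-permalink js-nav js-tooltip">Nov 24</a>
"""

soup = BeautifulSoup(html, "html.parser")

# A single class name matches any <a> whose class list contains it
links = soup.find_all("a", class_="tweet-timestamp")

# .get() yields None rather than raising KeyError for a missing attribute
titles = [a.get("title") for a in links]
print(titles)  # ['3:59 PM - 25 Nov 2014', None]
```

You can also pass title=True to find_all, as the question does. That skips links without a title entirely instead of yielding None.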


find_all returns a list, so pick the element by its list index and then read its title attribute by key:

>>> from bs4 import BeautifulSoup
>>> s = '<a class="tweet-timestamp js-permalink js-nav js-tooltip" href="/15colleen/status/537395294133313536" title="3:59 PM - 25 Nov 2014"><span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1416959997" data-time-ms="1416959997000">Nov 25</span></a>'
>>> soup = BeautifulSoup(s, "html.parser")
>>> date = soup.find_all('a', title=True, class_='tweet-timestamp js-permalink     js-nav js-tooltip')
>>> date
[<a class="tweet-timestamp js-permalink js-nav js-tooltip" href="/15colleen/status/537395294133313536" title="3:59 PM - 25 Nov 2014"><span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1416959997" data-time-ms="1416959997000">Nov 25</span></a>]
>>> date[0]['title']
'3:59 PM - 25 Nov 2014'
answered Oct 15 '22 00:10 by Avinash Raj
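
Another option is a CSS selector, which puts the class and the "has a title" condition into a single expression. This sketch assumes a bs4 version with select()/select_one() backed by the soupsieve package (bs4 4.7 or later). The selector string is an illustrative choice, not the only way to write it:

```python
from bs4 import BeautifulSoup

html = (
    '<a class="tweet-timestamp js-permalink js-nav js-tooltip" '
    'href="/15colleen/status/537395294133313536" title="3:59 PM - 25 Nov 2014">'
    '<span>Nov 25</span></a>'
)

soup = BeautifulSoup(html, "html.parser")

# <a> tags that carry the tweet-timestamp class AND a title attribute
link = soup.select_one("a.tweet-timestamp[title]")
if link is not None:
    print(link["title"])  # 3:59 PM - 25 Nov 2014

# select() returns a list, much like find_all()
titles = [a["title"] for a in soup.select("a.tweet-timestamp[title]")]
```

Because [title] is part of the selector, every matched tag is guaranteed to have the attribute, so indexing with ["title"] is safe here.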