I want to extract the title and description from the following website:
view-source:http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/
with the following snippet of source code:
<title>Book a Virgin Australia Flight | Virgin Australia
</title>
<meta name="keywords" content="" />
<meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />
I want the title and meta content.
I tried goose, but it does not extract them correctly. Here is my code:
website_title = [g.extract(url).title for url in clean_url_data]
and
website_meta_description=[g.extract(urlw).meta_description for urlw in clean_url_data]
Both results are empty.
You can find your page's meta description within the <head> section of the page's HTML markup. Most CMSs will allow you to edit this markup and change your meta description either directly within the code or via the meta description field within the page's metadata settings.
Meta descriptions (the words you write) and snippets (the words the search engines display in the SERPs) are not the same thing.
BeautifulSoup is one way to solve this.
For the question above, you can use the following code to extract the "description" content:
import requests
from bs4 import BeautifulSoup
url = 'http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/'
response = requests.get(url)
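# Name the parser explicitly so bs4 does not warn about guessing one
soup = BeautifulSoup(response.text, 'html.parser')
metas = soup.find_all('meta')
# Collect the content of every <meta name="description"> tag (print() works on Python 2 and 3)
print([meta.attrs['content'] for meta in metas if meta.attrs.get('name') == 'description'])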
output:
['Search for and book Virgin Australia and partner flights to Australian and international destinations.']
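The question also asks for the title, so here is a minimal sketch that pulls both values from one HTML string. The helper name extract_title_and_description and the sample_html string are assumptions for illustration (the sample is built from the snippet in the question, wrapped in a minimal page); it parses a string rather than fetching the page, so it runs without network access.

```python
from bs4 import BeautifulSoup


def extract_title_and_description(html):
    """Return (title, description) from an HTML string; either may be None."""
    soup = BeautifulSoup(html, "html.parser")
    # Text of <title>, with surrounding whitespace and newlines stripped
    title = soup.title.get_text(strip=True) if soup.title else None
    # First <meta name="description" ...> tag, if the page has one
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta.get("content") if meta else None
    return title, description


# Hypothetical sample page built from the snippet quoted in the question
sample_html = """<html><head>
<title>Book a Virgin Australia Flight | Virgin Australia
</title>
<meta name="keywords" content="" />
<meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />
</head><body></body></html>"""

title, description = extract_title_and_description(sample_html)
print(title)
print(description)
```

To apply this to the list in the question, one option is to call extract_title_and_description(requests.get(url).text) for each url in clean_url_data.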