
How to extract meta description from urls using python?

I want to extract the title and description from the following website:

view-source:http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/

with the following snippet of source code:

<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
        <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />

I want the title and meta content.

I tried goose, but it does not extract them reliably. Here is my code:

website_title = [g.extract(url).title for url in clean_url_data]

and

website_meta_description=[g.extract(urlw).meta_description for urlw in clean_url_data] 

Both results come back empty.

Technologic27 asked Jun 24 '16 09:06




1 Answer

BeautifulSoup is a good fit for this.

For the page in the question, the following code extracts the content of the "description" meta tag:

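import requests
from bs4 import BeautifulSoup

url = 'http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/'
response = requests.get(url)
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(response.text, 'html.parser')

# Collect every <meta> tag, then keep the content of the one named "description"
metas = soup.find_all('meta')

print([meta.attrs['content'] for meta in metas if 'name' in meta.attrs and meta.attrs['name'] == 'description'])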

output:

['Search for and book Virgin Australia and partner flights to Australian and international destinations.']
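
The question also asks for the title. As a rough sketch of pulling both values in one pass, the example below uses only Python 3's standard-library html.parser rather than BeautifulSoup, which makes it easy to try without installing anything. The names HeadMetaParser and extract_title_and_description are made up for this illustration, and the sample HTML is the snippet from the question, so no network request is made. For a real page, the text returned by requests.get(url) would be passed in instead.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Hypothetical helper: collects the <title> text and the content
    attribute of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._title_parts = []
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            # Attribute values can be None, so fall back to an empty string
            name = (attr_map.get("name") or "").lower()
            if name == "description" and self.description is None:
                self.description = attr_map.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it
        if self._in_title:
            self._title_parts.append(data)

    @property
    def title(self):
        text = "".join(self._title_parts).strip()
        return text or None


def extract_title_and_description(html):
    """Return (title, description); either is None when not found."""
    parser = HeadMetaParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.description


# The HTML snippet quoted in the question
sample_html = """<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
        <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />
"""

title, description = extract_title_and_description(sample_html)
print(title)
print(description)
```

If either value still comes back as None for a live URL, it is worth checking whether the server returned the same HTML that the browser's view-source shows, since some sites serve different markup to scripts.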
linpingta answered Oct 17 '22 05:10