
Get meta tag content property with BeautifulSoup and Python

I am trying to use Python and BeautifulSoup to extract the content attribute of the tags below:

<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />

BeautifulSoup loads the page just fine and finds other things (the code below also grabs the article id from the id attribute in the source), but I don't know the correct way to search the HTML for these meta tags. I've tried variations of find and findAll to no avail. The code currently iterates over a list of URLs:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

#importing the libraries
from urllib import urlopen
from bs4 import BeautifulSoup

def get_data(page_no):
    webpage = urlopen('http://superfunevents.com/?p=' + str(i)).read()
    soup = BeautifulSoup(webpage, "lxml")
    for tag in soup.find_all("article") :
        id = tag.get('id')
        print id
    # the hard part that doesn't work - I know this example is well off the mark!
    title = soup.find("og:title", "content")
    print (title.get_text())
    url = soup.find("og:url", "content")
    print (url.get_text())
    # end of problem

for i in range (1,100):
    get_data(i)

If anyone can help me sort out the bit that finds the og:title and og:url content, that'd be fantastic!

the_t_test_1 asked Apr 21 '16 11:04




2 Answers

Try this: loop over every meta tag and check its property attribute.

soup = BeautifulSoup(webpage)
# check every <meta> tag and print the content of the two we want
for tag in soup.find_all("meta"):
    if tag.get("property", None) == "og:title":
        print(tag.get("content", None))
    elif tag.get("property", None) == "og:url":
        print(tag.get("content", None))
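As a rough, self-contained sketch of the same loop, the version below gathers every og: property into a dict instead of printing as it goes. The SAMPLE_HTML string and the collect_og_properties name are made up for illustration (they are not part of the question's code), and the built-in "html.parser" is assumed in place of lxml so that nothing beyond bs4 needs installing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a fetched page.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
<meta name="viewport" content="width=device-width" />
</head><body></body></html>
"""


def collect_og_properties(html):
    """Return a dict mapping each og:* property to its content value."""
    soup = BeautifulSoup(html, "html.parser")
    found = {}
    for tag in soup.find_all("meta"):
        prop = tag.get("property")
        # Skip meta tags with no property attribute or a non-og property.
        if prop and prop.startswith("og:"):
            found[prop] = tag.get("content")
    return found


og = collect_og_properties(SAMPLE_HTML)
print(og["og:title"])
print(og["og:url"])
```

Returning a dict keeps the parsing separate from the printing, which may be handy when the function is called once per URL in a loop like the one in the question.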
Hackaholic answered Sep 22 '22 22:09



Provide the meta tag name as the first argument to find(). Then, use keyword arguments to check the specific attributes:

title = soup.find("meta", property="og:title")
url = soup.find("meta", property="og:url")

print(title["content"] if title else "No meta title given")
print(url["content"] if url else "No meta url given")

The if/else checks are optional if you know that the og:title and og:url meta properties will always be present. Without them, a missing tag makes find() return None, and subscripting None raises a TypeError.
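If the same lookup is needed for several properties, the find() call and the None check could be wrapped in a small helper. This is only a sketch: meta_content is a hypothetical name, the inline HTML is invented sample data, and "html.parser" is assumed as the parser. It passes the attribute through attrs={...}, which should behave the same as the property= keyword used above.

```python
from bs4 import BeautifulSoup


def meta_content(soup, prop, default=None):
    """Return the content of <meta property=prop>, or default if it is absent."""
    tag = soup.find("meta", attrs={"property": prop})
    # find() returns None when nothing matches; Tag.get() covers a
    # meta tag that exists but has no content attribute.
    return tag.get("content", default) if tag is not None else default


# Hypothetical sample markup with og:title present and og:url missing.
html = '<head><meta property="og:title" content="Super Fun Event 1" /></head>'
soup = BeautifulSoup(html, "html.parser")

print(meta_content(soup, "og:title"))
print(meta_content(soup, "og:url", "No meta url given"))
```

Here the first call prints the title and the second falls back to the supplied default, because the sample markup has no og:url tag.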

alecxe answered Sep 26 '22 22:09
