I am trying to use Python and Beautiful Soup to extract the content attribute of the tags below:
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
BeautifulSoup loads the page just fine and finds other things (the code also grabs the article id from the id attribute hidden in the source), but I don't know the correct way to search the HTML for these bits. I've tried variations of find and findAll to no avail. The code currently iterates over a list of URLs:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

#importing the libraries
from urllib import urlopen
from bs4 import BeautifulSoup

def get_data(page_no):
    webpage = urlopen('http://superfunevents.com/?p=' + str(i)).read()
    soup = BeautifulSoup(webpage, "lxml")
    for tag in soup.find_all("article") :
        id = tag.get('id')
        print id
    # the hard part that doesn't work - I know this example is well off the mark!
    title = soup.find("og:title", "content")
    print (title.get_text())
    url = soup.find("og:url", "content")
    print (url.get_text())
    # end of problem

for i in range (1,100):
    get_data(i)
If anyone can help me sort out the bit that finds the og:title and og:url content, that'd be fantastic!
Step-by-step approach.
Step 1: Import the BeautifulSoup module for scraping and the requests module for fetching the page.
Step 2: Request the URL by calling the requests get method.
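The two steps above can be sketched as follows. This is only a minimal sketch, assuming the third-party requests and beautifulsoup4 packages are installed. The helper names fetch_html and parse_html are hypothetical, and "html.parser" is Python's built-in parser, chosen here only to avoid an extra dependency.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Step 2: request the URL with requests.get and return the page source."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def parse_html(html):
    """Turn raw HTML into a searchable BeautifulSoup tree."""
    return BeautifulSoup(html, "html.parser")


# Offline demonstration on the markup from the question; for a live page,
# pass the result of fetch_html(...) to parse_html instead.
sample = '<meta property="og:title" content="Super Fun Event 1" />'
soup = parse_html(sample)
print(soup.find("meta")["content"])
```

Keeping the fetch and the parse in separate functions means the parsing logic can be tried on a saved HTML string without touching the network.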
Try this:
soup = BeautifulSoup(webpage)
# Check every <meta> tag and print the content of the two Open Graph properties
for tag in soup.find_all("meta"):
    if tag.get("property", None) == "og:title":
        print(tag.get("content", None))
    elif tag.get("property", None) == "og:url":
        print(tag.get("content", None))
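If more Open Graph properties are needed than these two, the same loop can collect them all in one pass. The following is a minimal sketch, assuming every tag of interest has a property attribute starting with og:. The function name collect_og_properties and the sample markup are illustrative and not part of the original code.

```python
from bs4 import BeautifulSoup


def collect_og_properties(html):
    """Return a dict mapping each og:* property to its content value."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for tag in soup.find_all("meta"):
        prop = tag.get("property")
        # Skip meta tags without a property attribute or outside the og: namespace.
        if prop and prop.startswith("og:"):
            result[prop] = tag.get("content")
    return result


sample = (
    '<meta charset="utf-8" />'
    '<meta property="og:title" content="Super Fun Event 1" />'
    '<meta property="og:url" '
    'content="http://superfunevents.com/events/super-fun-event-1/" />'
)
og = collect_og_properties(sample)
print(og["og:title"])
print(og["og:url"])
```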
Provide the meta tag name as the first argument to find(). Then, use keyword arguments to check the specific attributes:
title = soup.find("meta", property="og:title")
url = soup.find("meta", property="og:url")

print(title["content"] if title else "No meta title given")
print(url["content"] if url else "No meta url given")
The if/else checks here are optional if you know that the title and url meta properties will always be present.
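When the tags may be missing on some pages, the check can be wrapped in a small helper. This is a sketch only: get_meta_content is a hypothetical name, and it uses the attrs dictionary form of find(), which matches the same tags as the keyword form above.

```python
from bs4 import BeautifulSoup


def get_meta_content(soup, prop, default=None):
    """Return the content of <meta property=prop>, or default if absent."""
    tag = soup.find("meta", attrs={"property": prop})
    if tag is None:
        return default
    return tag.get("content", default)


html = '<meta property="og:title" content="Super Fun Event 1" />'
soup = BeautifulSoup(html, "html.parser")
print(get_meta_content(soup, "og:title"))
print(get_meta_content(soup, "og:url", "No meta url given"))
```

The attrs dictionary is handy for attribute names that are not valid Python keyword arguments, such as names containing a hyphen.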