Get meta tag content property with BeautifulSoup and Python

I am trying to use Python and BeautifulSoup to extract the content attribute of the tags below:

<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />

BeautifulSoup loads the page just fine and finds other things (the code below also grabs the article id from the id attribute hidden in the source), but I don't know the correct way to search the HTML for these bits. I've tried variations of find and findAll to no avail. At present, the code iterates over a list of URLs:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

#importing the libraries
from urllib import urlopen
from bs4 import BeautifulSoup

def get_data(page_no):
    webpage = urlopen('http://superfunevents.com/?p=' + str(i)).read()
    soup = BeautifulSoup(webpage, "lxml")
    for tag in soup.find_all("article") :
        id = tag.get('id')
        print id
    # the hard part that doesn't work - I know this example is well off the mark!
    title = soup.find("og:title", "content")
    print (title.get_text())
    url = soup.find("og:url", "content")
    print (url.get_text())
    # end of problem

for i in range (1,100):
    get_data(i)

If anyone can help me sort out how to find the og:title and og:url content, that'd be fantastic!

852

asked Apr 21 '16 11:04

the_t_test_1

2 Answers

Try this:

soup = BeautifulSoup(webpage)
# Walk every <meta> tag and print the content of the two properties of interest.
for tag in soup.find_all("meta"):
    if tag.get("property", None) == "og:title":
        print(tag.get("content", None))
    elif tag.get("property", None) == "og:url":
        print(tag.get("content", None))
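The same loop can also gather every og: property in one pass instead of printing only two of them. The following is a minimal sketch, not part of the original answer: the helper name collect_og_properties is hypothetical, the sample markup is taken from the question (plus one unrelated meta tag added for illustration), and the built-in "html.parser" is assumed in place of lxml so that no extra dependency is needed.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question; a real page would come from urlopen().
# The viewport tag is an added assumption, included to show it gets skipped.
html = """
<html><head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
<meta name="viewport" content="width=device-width" />
</head><body></body></html>
"""


def collect_og_properties(soup):
    """Return a dict mapping each og:* property name to its content value."""
    og = {}
    for tag in soup.find_all("meta"):
        prop = tag.get("property")  # None for meta tags without a property attribute
        if prop and prop.startswith("og:"):
            og[prop] = tag.get("content")
    return og


soup = BeautifulSoup(html, "html.parser")
og = collect_og_properties(soup)
print(og.get("og:title"))
print(og.get("og:url"))
```

Using dict.get() for the lookups means a page that lacks one of the properties yields None rather than raising an error.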

answered Sep 22 '22 22:09

Hackaholic

Provide the meta tag name as the first argument to find(). Then, use keyword arguments to check the specific attributes:

title = soup.find("meta", property="og:title")
url = soup.find("meta", property="og:url")

print(title["content"] if title else "No meta title given")
print(url["content"] if url else "No meta url given")

The if/else checks here are optional if you know that the title and url meta properties will always be present; find() returns None when no tag matches.
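When the lookup has to be repeated for many pages, the find() call and the None check can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name get_meta_content and its default parameter are made up for illustration, the attribute filter is passed through attrs= (equivalent to the keyword form above), and "html.parser" is used so the example runs without lxml.

```python
from bs4 import BeautifulSoup


def get_meta_content(soup, prop, default=None):
    """Return the content of <meta property=prop>, or default if it is missing."""
    tag = soup.find("meta", attrs={"property": prop})
    if tag is None:
        # find() returns None when no matching tag exists.
        return default
    return tag.get("content", default)


# Sample markup from the question, with og:url deliberately left out.
html = '<head><meta property="og:title" content="Super Fun Event 1" /></head>'
soup = BeautifulSoup(html, "html.parser")

title = get_meta_content(soup, "og:title")
missing = get_meta_content(soup, "og:url", default="No meta url given")
print(title)
print(missing)
```

Inside the question's get_data() function, the two failing lines could then become calls such as get_meta_content(soup, "og:title") and get_meta_content(soup, "og:url").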

114

answered Sep 26 '22 22:09

alecxe

Tags:

python

html

beautifulsoup

web-scraping