Python Beautiful Soup Extracting HTML Meta Data

I am getting some odd behavior that I do not quite understand. I am hoping someone can explain what is going on.

Consider this metadata:

<meta property="og:title" content="This is the Tesla Semi truck">
<meta name="twitter:title" content="This is the Tesla Semi truck">

This line successfully finds ALL "og" properties and returns a list.

opengraphs = doc.html.head.findAll(property=re.compile(r'^og'))

However, this line fails to do the same thing for the twitter cards.

twitterCards = doc.html.head.findAll(name=re.compile(r'^twitter'))

Why does the first line find all the "og" (Open Graph) properties, while the second line fails to find the Twitter cards?
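Here is a minimal reproduction of the behavior. It assumes the two tags sit inside an <html><head> wrapper and uses Python's built-in "html.parser". The sample HTML and variable names are illustrative:

```python
import re
from bs4 import BeautifulSoup

# Illustrative document: the two meta tags from the question inside a <head>
html = """
<html><head>
<meta property="og:title" content="This is the Tesla Semi truck">
<meta name="twitter:title" content="This is the Tesla Semi truck">
</head></html>
"""

doc = BeautifulSoup(html, "html.parser")

# "property" is not a find_all() parameter, so it is treated as an attribute filter
opengraphs = doc.html.head.find_all(property=re.compile(r"^og"))

# "name" IS a find_all() parameter: it filters on the tag name ("meta"), not the attribute
twitter_cards = doc.html.head.find_all(name=re.compile(r"^twitter"))

print(len(opengraphs))     # 1
print(len(twitter_cards))  # 0
```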

Tyler Bell asked Dec 06 '25 09:12


2 Answers

The problem is name=, which has a special meaning: it filters on the tag name, which in your code is meta.

You have to pass "meta" as the tag name and use a dictionary for the "name" attribute.

An example with different items:

from bs4 import BeautifulSoup
import re

data='''
<meta property="og:title" content="This is the Tesla Semi truck">
<meta property="twitter:title" content="This is the Tesla Semi truck">
<meta name="twitter:title" content="This is the Tesla Semi truck">
'''

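head = BeautifulSoup(data, 'html.parser')  # explicit parser avoids bs4's GuessedAtParserWarning

print(head.find_all(property=re.compile(r'^og')))  # OK - "property" is an attribute filter
print(head.find_all(property=re.compile(r'^tw')))  # OK

print(head.find_all(name=re.compile(r'^meta')))    # OK - "name" matches the tag name <meta>
print(head.find_all(name=re.compile(r'^tw')))      # empty - no tag name starts with "tw"

print(head.find_all('meta', {'name': re.compile(r'^tw')}))  # OK - the dict filters on the name attribute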
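name is the first positional parameter of find_all(), and attrs is the second. That means the last line above is the same call as one using name='meta' and attrs={...} by keyword. The following check reuses the sample data with html.parser. The positional and keyword variable names are illustrative:

```python
import re
from bs4 import BeautifulSoup

data = '''
<meta property="og:title" content="This is the Tesla Semi truck">
<meta property="twitter:title" content="This is the Tesla Semi truck">
<meta name="twitter:title" content="This is the Tesla Semi truck">
'''
head = BeautifulSoup(data, 'html.parser')

# Tag name and attribute dict passed positionally ...
positional = head.find_all('meta', {'name': re.compile(r'^tw')})
# ... and the same two filters passed by keyword
keyword = head.find_all(name='meta', attrs={'name': re.compile(r'^tw')})

print(positional == keyword)  # True
print(len(positional))        # 1 - only the tag with name="twitter:title"
```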

furas answered Dec 08 '25 21:12


This is because name is the name of the tag-name argument of find_all(). In this case, BeautifulSoup therefore looks for elements whose tag names start with "twitter", and there are none.

In order to specify that you actually mean an attribute, use:

doc.html.head.find_all(attrs={'name': re.compile(r'^twitter')})
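The sketch below puts that into practice by collecting the Twitter card tags into a dict keyed by their name attribute. The sample HTML (including the twitter:card tag) and the twitter_cards variable are hypothetical, and html.parser is assumed as the parser:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page with an Open Graph tag and two Twitter card tags
html = """
<html><head>
<meta property="og:title" content="This is the Tesla Semi truck">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="This is the Tesla Semi truck">
</head></html>
"""

doc = BeautifulSoup(html, "html.parser")

# attrs= makes "name" an attribute filter instead of a tag-name filter
tags = doc.html.head.find_all(attrs={"name": re.compile(r"^twitter")})

# Map each card name to its content (empty string if content is missing)
twitter_cards = {tag["name"]: tag.get("content", "") for tag in tags}
print(twitter_cards)
# {'twitter:card': 'summary', 'twitter:title': 'This is the Tesla Semi truck'}
```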

Or, via a CSS selector:

doc.html.head.select("[name^=twitter]")

where ^= means "starts with".
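The same attribute selectors can be combined in a selector list to fetch Open Graph and Twitter tags in a single call. This sketch assumes a bs4 version whose select() is backed by Soup Sieve (4.7 and later). The sample HTML and the pairs variable are illustrative:

```python
from bs4 import BeautifulSoup

html = """
<html><head>
<meta property="og:title" content="This is the Tesla Semi truck">
<meta name="twitter:title" content="This is the Tesla Semi truck">
<meta name="description" content="Not an Open Graph or Twitter tag">
</head></html>
"""

doc = BeautifulSoup(html, "html.parser")

# Selector list: Open Graph tags OR Twitter card tags; ^= means "starts with"
selector = 'meta[property^="og:"], meta[name^="twitter:"]'

# Pair each matched tag's key (property or name) with its content
pairs = [
    (tag.get("property") or tag.get("name"), tag.get("content"))
    for tag in doc.html.head.select(selector)
]
for key, value in pairs:
    print(key, "=>", value)
```

The description tag is skipped because it matches neither selector in the list.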

alecxe answered Dec 08 '25 23:12


