Extracting contents from specific meta tags that are not closed using BeautifulSoup

Tags: python, beautifulsoup

I'm trying to parse the content out of specific meta tags. Here's the structure of the meta tags. The first two are self-closed with a trailing slash (/>), but the rest have no closing tags. As soon as I reach the 3rd meta tag, the entire contents between the <head> tags are returned. I've also tried soup.findAll(text=re.compile('keyword')), but that returns nothing because keyword is an attribute of the meta tag, not text.

<meta name="csrf-param" content="authenticity_token"/>
<meta name="csrf-token" content="OrpXIt/y9zdAFHWzJXY2EccDi1zNSucxcCOu8+6Mc9c="/>
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'>
<meta content='en_US' http-equiv='Content-Language'>
<meta content='c2y_K2CiLmGeet7GUQc9e3RVGp_gCOxUC4IdJg_RBVo' name='google-site-verification'>
<meta content='initial-scale=1.0,maximum-scale=1.0,width=device-width' name='viewport'>
<meta content='notranslate' name='google'>
<meta content="Learn about Uber's product, founders, investors and team. Everyone's Private Driver - Request a car from any mobile phone—text message, iPhone and Android apps. Within minutes, a professional driver in a sleek black car will arrive curbside. Automatically charged to your credit card on file, tip included." name='description'>

Here's the code:

import csv
import re
import sys
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req3 = Request("https://angel.co/uber", headers={'User-Agent': 'Mozilla/5.0'})
page3 = urlopen(req3).read()
soup3 = BeautifulSoup(page3)

## This returns the entire web page since the META tags are not closed
desc = soup3.findAll(attrs={"name":"description"})


asked Aug 08 '13 19:08

tcash21

1 Answer

Edited: Added a case-insensitive regex, as suggested by @Albert Chen.

Python 3 Edit:

from bs4 import BeautifulSoup
import re
import urllib.request

page3 = urllib.request.urlopen("https://angel.co/uber").read()
soup3 = BeautifulSoup(page3)

desc = soup3.findAll(attrs={"name": re.compile(r"description", re.I)}) 
print(desc[0]['content'])

Original Python 2 version, although I'm not sure it will work for every page:

from bs4 import BeautifulSoup
import re
import urllib

page3 = urllib.urlopen("https://angel.co/uber").read()
soup3 = BeautifulSoup(page3)

desc = soup3.findAll(attrs={"name": re.compile(r"description", re.I)}) 
print(desc[0]['content'].encode('utf-8'))

Yields:

Learn about Uber's product, founders, investors and team. Everyone's Private Driver - Request a car from any mobile phone—text message, iPhone and Android apps. Within minutes, a professional driver in a sleek black car will arrive curbside. Automatically charged to your credit card on file, tip included.
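
To try the same lookup without hitting the network, here is a minimal sketch that parses an inline sample modeled on the question's markup. It names the parser explicitly ("html.parser"), since BeautifulSoup otherwise picks whichever parser happens to be installed and results can differ between machines. The sample HTML, the helper name meta_content, and the anchored pattern (^...$) are my own assumptions for illustration, not part of the code above.

```python
import re
from bs4 import BeautifulSoup

# Sample markup modeled on the question: one self-closed tag, the rest unclosed.
HTML = """<html><head>
<meta name="csrf-param" content="authenticity_token"/>
<meta content='en_US' http-equiv='Content-Language'>
<meta content='notranslate' name='google'>
<meta content="Learn about Uber's product, founders, investors and team." name='Description'>
<title>Uber</title>
</head><body><p>page body</p></body></html>"""


def meta_content(html, name):
    """Return the content of the first <meta> whose name attribute matches, else None.

    Hypothetical helper; the match is case-insensitive and anchored to the whole name.
    """
    # Explicit parser so the result does not depend on which parsers are installed.
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(r"^%s$" % re.escape(name), re.I)
    # The "name" key in attrs refers to the HTML attribute, not the tag name.
    tag = soup.find("meta", attrs={"name": pattern})
    return tag.get("content") if tag is not None else None


print(meta_content(HTML, "description"))
```

Using find instead of findAll(...)[0] avoids an IndexError when the tag is missing; the helper returns None in that case, so the caller can decide what to do.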


answered Nov 09 '22 07:11

sihrc
