I'm parsing HTML pages with lxml. The pages have meta tags as follows:
<meta property="og:locality" content="Detroit" />
<meta property="og:country-name" content="USA" />
How can I use lxml to find the value of the og:locality
meta tag on each page, efficiently?
I've currently got the following, which just manually matches up meta tags by property:
    for meta in doc3.cssselect('meta'):
        prop = meta.get('property')
        if prop == 'og:locality':
            locality = meta.get('content')
But it doesn't feel very efficient.
lxml provides a very simple and powerful API for parsing XML and HTML. It supports one-step parsing as well as step-by-step parsing using an event-driven API (currently only for XML).
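As a rough sketch, the event-driven (step-by-step) XML API looks like this; the XML snippet and the item tag below are placeholders for illustration, not part of the question:

    import io
    from lxml import etree

    # Step-by-step parsing with iterparse: elements are handed back as their
    # closing tags are seen, so a large file never has to sit fully in memory.
    xml = b"<feed><item>one</item><item>two</item></feed>"
    for event, element in etree.iterparse(io.BytesIO(xml), events=('end',), tag='item'):
        print(element.text)   # 'one', then 'two'
        element.clear()       # release the element once it has been processed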
lxml is a Python library for handling XML and HTML documents, and it is also widely used for web scraping. There are plenty of off-the-shelf XML parsers, but when you need more control or better performance than a generic parser offers, lxml comes into play.
To save users from having to choose a parser library in advance, lxml can interface with the parsing capabilities of BeautifulSoup through the lxml.html.soupparser module. It provides three main functions: fromstring() and parse() to parse a string or file using BeautifulSoup into an lxml.html document, and convert_tree() to convert an existing BeautifulSoup tree into a list of top-level Elements.
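For example, a minimal sketch of the BeautifulSoup-backed parser; this assumes BeautifulSoup (bs4) is installed, and the markup is just the example tag from the question:

    from lxml.html import soupparser  # requires BeautifulSoup (bs4) to be installed

    # Parse possibly broken markup through BeautifulSoup into lxml elements.
    root = soupparser.fromstring('<meta property="og:locality" content="Detroit" />')
    meta = root.find('.//meta')
    print(meta.get('content'))  # Detroit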
Parsing with lxml.html.fromstring() gives us an object of type HtmlElement. This object has an xpath() method which we can use to query the HTML document, giving us a structured way to extract information from it.
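A minimal sketch, using the two meta tags from the question as the input:

    from lxml import html

    page = '''
    <html><head>
    <meta property="og:locality" content="Detroit" />
    <meta property="og:country-name" content="USA" />
    </head><body></body></html>
    '''

    doc = html.fromstring(page)            # returns an HtmlElement
    print(doc.xpath('//meta/@property'))   # ['og:locality', 'og:country-name']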
You could use this XPath selector: //meta[@property='og:locality']/@content
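Applied to a parsed document (doc3 in the question, or any HtmlElement), the /@content step returns the attribute values directly, so no manual loop over the meta tags is needed:

    from lxml import html

    doc3 = html.fromstring(
        '<head><meta property="og:locality" content="Detroit" />'
        '<meta property="og:country-name" content="USA" /></head>'
    )

    values = doc3.xpath("//meta[@property='og:locality']/@content")
    locality = values[0] if values else None  # guard against pages missing the tag
    print(locality)  # Detroit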