can we use XPath with BeautifulSoup?

People also ask

Can I use XPath with BeautifulSoup?

Nope, BeautifulSoup, by itself, does not support XPath expressions.

Can we use XPath in Python?

XPath Standard Functions. XPath includes over 200 built-in functions. There are functions for string values, numeric values, booleans, date and time comparison, node manipulation, sequence manipulation, and much more. These expressions can be used in JavaScript, Java, XML Schema, Python, and lots of other languages.

Can BeautifulSoup parse HTML?

The HTML content of the webpages can be parsed and scraped with Beautiful Soup.

Nope, BeautifulSoup, by itself, does not support XPath expressions.

An alternative library, lxml, does support XPath 1.0. It has a BeautifulSoup compatible mode where it'll try and parse broken HTML the way Soup does. However, the default lxml HTML parser does just as good a job of parsing broken HTML, and I believe is faster.

Once you've parsed your document into an lxml tree, you can use the .xpath() method to search for elements.

try:
    # Python 2
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
from lxml import etree

url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
response = urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
tree.xpath(xpathselector)

There is also a dedicated lxml.html() module with additional functionality.

Note that in the above example I passed the response object directly to lxml, as having the parser read directly from the stream is more efficient than reading the response into a large string first. To do the same with the requests library, you want to set stream=True and pass in the response.raw object after enabling transparent transport decompression:

import lxml.html
import requests

url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
response = requests.get(url, stream=True)
response.raw.decode_content = True
tree = lxml.html.parse(response.raw)

Of possible interest to you is the CSS Selector support; the CSSSelector class translates CSS statements into XPath expressions, making your search for td.empformbody that much easier:

from lxml.cssselect import CSSSelector

td_empformbody = CSSSelector('td.empformbody')
for elem in td_empformbody(tree):
    # Do something with these table cells.

Coming full circle: BeautifulSoup itself does have very complete CSS selector support:

for cell in soup.select('table#foobar td.empformbody'):
    # Do something with these table cells.

I can confirm that there is no XPath support within Beautiful Soup.

As others have said, BeautifulSoup doesn't have xpath support. There are probably a number of ways to get something from an xpath, including using Selenium. However, here's a solution that works in either Python 2 or 3:

from lxml import html
import requests

page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.content)
#This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
#This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')

print('Buyers: ', buyers)
print('Prices: ', prices)

I used this as a reference.

BeautifulSoup has a function named findNext from current element directed childern,so:

father.findNext('div',{'class':'class_value'}).findNext('div',{'id':'id_value'}).findAll('a')

Above code can imitate the following xpath:

div[class=class_value]/div[id=id_value]

Related questions
                            
                                Python/postgres/psycopg2: getting ID of row just inserted
                            
                                Doing something before program exit
                            
                                How to plot a histogram using Matplotlib in Python with a list of data?
                            
                                How to convert a dictionary to query string in Python?
                            
                                Inserting a Python datetime.datetime object into MySQL
                            
                                Selecting pandas column by location
                            
                                Replace console output in Python
                            
                                List of tuples to dictionary [duplicate]
                            
                                How to add conda environment to jupyter lab
                            
                                How to generate a random number with a specific amount of digits?
                            
                                Hello World in Python [duplicate]
                            
                                Python: Importing urllib.quote
                            
                                jsonify a SQLAlchemy result set in Flask [duplicate]
                            
                                URL query parameters to dict python
                            
                                How to check BLAS/LAPACK linkage in NumPy and SciPy?
                            
                                Python time measure function
                            
                                cartesian product in pandas
                            
                                How to create full compressed tar file using Python?
                            
                                How to get everything after last slash in a URL?
                            
                                Set order of columns in pandas dataframe

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

can we use XPath with BeautifulSoup?

Tags:

python

beautifulsoup

urllib

web-scraping

xpath

People also ask

Recent Activity

Donate For Us