I've been learning about web scraping using BeautifulSoup in Python recently, but earlier today I was advised to consider using XPath expressions instead.
How do XPath and BeautifulSoup differ in the way they work?
I have used both BeautifulSoup and lxml, and based on experience I lean towards lxml (see the performance comparison linked here). One thing to be wary of when using BeautifulSoup is the explicit selection of a parser: the default parser chosen for you may parse results incorrectly without any warning, which can lead to nightmares (my experience here).
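To show what explicit parser selection looks like, here is a minimal sketch (the markup is made up for the example). Naming the parser pins the behavior, rather than letting bs4 pick whichever backend happens to be installed:

```python
from bs4 import BeautifulSoup

html = "<html><body><p class='a'>first</p><p>second</p></body></html>"

# Explicitly naming the parser ("html.parser" ships with the standard
# library) makes the result reproducible across machines; leaving it out
# lets bs4 silently pick a different backend depending on what is installed.
soup = BeautifulSoup(html, "html.parser")
texts = [p.get_text() for p in soup.find_all("p")]
print(texts)  # ['first', 'second']
```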
Having said that, I often find it easier to write a bs4 snippet than the corresponding lxml one.
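To illustrate the difference, here is the same extraction written both ways; the markup is invented for the example:

```python
from bs4 import BeautifulSoup
from lxml import html as lxml_html

doc = "<div><a href='/a'>one</a><a href='/b'>two</a></div>"

# bs4: method calls that read almost like prose
soup = BeautifulSoup(doc, "html.parser")
bs4_links = [a["href"] for a in soup.find_all("a")]

# lxml: a single XPath expression does the same job
tree = lxml_html.fromstring(doc)
lxml_links = tree.xpath("//a/@href")

print(bs4_links, lxml_links)  # both ['/a', '/b']
```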
I would suggest bs4: its usage and docs are friendlier, which will save you time and build confidence, and that matters when you are teaching yourself string manipulation.
However, in practice it demands a strong CPU. I once scraped with no more than 30 connections on my 1-core VPS, and the CPU usage of the Python process stayed at 100%. It could have been the result of a bad implementation, but after I changed everything to re.compile, the performance issue was gone.
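A hedged sketch of that regex fallback: for markup that is simple and regular, a precompiled pattern can pull out fields without building a parse tree at all. (This is fragile on real-world HTML and is shown only to illustrate the trade-off; the pattern and document are made up.)

```python
import re

# Compile once, reuse across every page scraped; re.compile avoids
# re-parsing the pattern on each call.
LINK_RE = re.compile(r'<a href="([^"]+)">([^<]+)</a>')

doc = '<a href="/a">one</a> <a href="/b">two</a>'
pairs = LINK_RE.findall(doc)
print(pairs)  # [('/a', 'one'), ('/b', 'two')]
```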
As for performance: regex > lxml >> bs4. As for getting things done: no difference.