
What pure Python library should I use to scrape a website?

I currently have some Ruby code used to scrape some websites. I was using Ruby because at the time I was using Ruby on Rails for a site, and it just made sense.

Now I'm trying to port this over to Google App Engine, and keep getting stuck.

I've ported Python Mechanize to work with Google App Engine, but it doesn't support DOM inspection with XPath.

I've tried the built-in ElementTree, but it choked on the first HTML blob I gave it when it ran into '&mdash;'.

Do I keep trying to hack ElementTree in there, or do I try to use something else?

Thanks, Mark

MStodd asked Dec 19 '25 02:12


1 Answer

Beautiful Soup.
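
To make that concrete, here is a minimal sketch of what it could look like. It assumes the current `bs4` package (installed with `pip install beautifulsoup4`) and Python's built-in `html.parser` backend, so nothing needs compiling. The sample markup and the helper names `elementtree_accepts` and `scrape` are made up for illustration and are not from the question.

```python
import xml.etree.ElementTree as ET

from bs4 import BeautifulSoup  # third-party, pure Python: pip install beautifulsoup4

# Hypothetical sample page containing the entity that tripped up ElementTree.
SAMPLE_HTML = (
    "<html><body>"
    "<h1>Prices &mdash; today</h1>"
    '<a href="/a">A</a><a href="/b">B</a>'
    "</body></html>"
)


def elementtree_accepts(html):
    """Return True if the strict XML parser can parse the markup."""
    try:
        ET.fromstring(html)
        return True
    except ET.ParseError:
        # HTML named entities such as &mdash; are undefined in plain XML.
        return False


def scrape(html):
    """Return the text of the first <h1> and every link target on the page."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.h1.get_text()
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return heading, links


heading, links = scrape(SAMPLE_HTML)
print(elementtree_accepts(SAMPLE_HTML))  # ElementTree rejects the entity
print(heading)
print(links)
```

Note that Beautiful Soup does not offer XPath. You navigate with `find`, `find_all`, or CSS selectors via `select` instead, which covers most scraping jobs.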

S.Lott answered Dec 20 '25 17:12


