HTML Scraping in PHP [duplicate]

Question

I've been doing some HTML scraping in PHP using regular expressions. This works, but the result is finicky and fragile. Has anyone used any packages that provide a more robust solution? A config-driven solution would be ideal, but I'm not picky.

Espo · Accepted Answer

I would recommend PHP Simple HTML DOM Parser after you have scraped the HTML from the page. It supports invalid HTML, and provides a very easy way to handle HTML elements.
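
To illustrate what parsing invalid HTML buys you over regular expressions, here is a minimal sketch. Note that it does not use PHP Simple HTML DOM Parser itself: so that it runs without any download, it swaps in PHP's bundled DOM extension (`DOMDocument::loadHTML`), which is also lenient about broken markup. The sample markup and the `$links` variable are made up for the example.

```php
<?php
// Illustrative input (an assumption, not from a real page):
// the <li> and <p> tags are never closed, so this is invalid HTML.
$html = '<html><body><ul>'
      . '<li><a href="/one">One</a>'
      . '<li><a href="/two">Two</a>'
      . '</ul><p>An unclosed paragraph</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);   // lenient HTML parser, repairs the tree
libxml_clear_errors();

// Walk the repaired tree instead of pattern-matching on raw text.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[$a->getAttribute('href')] = trim($a->textContent);
}

print_r($links);
```

The point of the sketch is only that the links are read from a parsed tree, so missing closing tags do not break the extraction the way they tend to with a regex.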

John Douthat · Answer

If the page you're scraping is valid X(HT)ML, then any of PHP's built-in XML parsers will do.
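
As a rough sketch of that approach, assuming the page really is well-formed, SimpleXML (one of the built-in parsers) can load the markup and query it with XPath. The markup, the `item` class, and the `$items` variable below are invented for illustration.

```php
<?php
// Illustrative well-formed XHTML; a real page would be fetched first.
// Real XHTML usually declares a default namespace, in which case the
// XPath query needs a prefix set up via registerXPathNamespace().
$xhtml = <<<XML
<html>
  <body>
    <h1>Products</h1>
    <ul>
      <li class="item"><a href="/a">Alpha</a></li>
      <li class="item"><a href="/b">Beta</a></li>
    </ul>
  </body>
</html>
XML;

$page = simplexml_load_string($xhtml);
if ($page === false) {
    // A strict XML parser rejects anything that is not well-formed.
    exit("Input is not well-formed XML\n");
}

// Select the links inside the list items and copy out href and text.
$items = [];
foreach ($page->xpath('//li[@class="item"]/a') as $a) {
    $items[] = ['href' => (string) $a['href'], 'text' => (string) $a];
}

print_r($items);
```

Because the parser is strict, a single unclosed tag makes `simplexml_load_string` return `false`, which is why this route only suits pages known to be valid.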

I haven't had much success with PHP libraries for scraping. If you're adventurous though, you can try simplehtmldom. I'd recommend Hpricot for Ruby or Beautiful Soup for Python, which are both excellent parsers for HTML.

Tags: html, php, screen-scraping

tsellon
