
What PHP web crawler libraries are available?

I'm looking for some robust, well-documented PHP web crawler scripts, perhaps a PHP port of the Java project Apache Nutch (http://wiki.apache.org/nutch/NutchTutorial).

I'm looking for both free and non-free versions.

Jason asked Jan 30 '11 10:01



People also ask

What is a web crawler in PHP?

A web crawler is a program that traverses sites on the Web and collects URLs. Search engines normally use a crawler to find URLs on the Web. Google's original crawler was written in Python; other search engines use different types of crawlers. For web crawling, we have to perform the following steps-

Can PHP be used for web scraping?

Web scraping lets you collect data from web pages across the internet. It is sometimes also called web crawling or web data extraction, although crawling strictly refers to discovering pages by following links. PHP is a widely used back-end scripting language for creating dynamic websites and web applications, and you can implement a web scraper using plain PHP code.
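As a rough illustration of scraping with plain PHP, the sketch below parses an HTML string with the bundled DOM extension and pulls out the heading text. The function name `scrapeHeadings` and the sample markup are hypothetical, chosen only for this example, and the sketch assumes the `dom` extension is enabled (it is in most default PHP builds).

```php
<?php
declare(strict_types=1);

/**
 * Return the text of every <h1> and <h2> in an HTML string, in document order.
 * Hypothetical helper for illustration; not taken from any library.
 */
function scrapeHeadings(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $headings = [];
    foreach ($xpath->query('//h1 | //h2') as $node) {
        $headings[] = trim($node->textContent);
    }
    return $headings;
}

// Sample markup assumed for the example.
$html = '<html><body><h1>Crawlers</h1><p>Intro</p><h2>Libraries</h2></body></html>';
print_r(scrapeHeadings($html));
```

In practice the HTML string would come from an HTTP request (for example via `file_get_contents` or cURL) rather than a literal.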

How do I create a web crawler for my website?

Here are the basic steps to build a crawler: Step 1: Add one or several URLs to the list of URLs to be visited. Step 2: Pop a link from the URLs to be visited and add it to the list of visited URLs. Step 3: Fetch the page's content and scrape the data you're interested in with the ScrapingBot API.
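The steps above can be sketched in plain PHP roughly as follows. This is a minimal sketch under stated assumptions, not a production crawler: the names `extractLinks` and `crawl` are hypothetical, fetching is delegated to a caller-supplied callable instead of the ScrapingBot API, only absolute http(s) links are followed, and there is no politeness delay or robots.txt handling.

```php
<?php
declare(strict_types=1);

/** Collect absolute http(s) links from an HTML string (relative links are skipped). */
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate malformed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if (preg_match('#^https?://#i', $href) === 1) {
            $links[] = $href;
        }
    }
    return $links;
}

/**
 * Breadth-first crawl. $fetch is an assumed callable that returns the page
 * body as a string, or false on failure. Returns URLs in visit order.
 */
function crawl(array $seeds, callable $fetch, int $maxPages = 50): array
{
    $queue = $seeds;   // Step 1: URLs to be visited
    $visited = [];     // URLs already visited, used as a set

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);   // Step 2: pop a link...
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;        // ...and mark it visited

        $html = $fetch($url);         // Step 3: fetch the page's content
        if (!is_string($html)) {
            continue;                 // fetch failed; nothing to scrape
        }
        foreach (extractLinks($html) as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }
    return array_keys($visited);
}
```

For a live crawl, the callable could be as simple as `fn (string $url) => @file_get_contents($url)`; passing it in as a parameter also makes it easy to swap in a canned set of pages for testing.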


2 Answers

Just give Snoopy a try.

Excerpt: "Snoopy is a PHP class that simulates a web browser. It automates the task of retrieving web page content and posting forms, for example."

Mimikry answered Sep 21 '22 17:09



Goutte (https://github.com/fabpot/Goutte) is also a good library, and it is compatible with the PSR-0 standard.

Ajay Patel answered Sep 17 '22 17:09
