Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a PHP equivalent of Perl's WWW::Mechanize?

I'm looking for a library that has functionality similar to Perl's WWW::Mechanize, but for PHP. Basically, it should allow me to submit HTTP GET and POST requests with a simple syntax, and then parse the resulting page and return in a simple format all forms and their fields, along with all links on the page.

I know about CURL, but it's a little too barebones, and the syntax is pretty ugly (tons of curl_foo($curl_handle, ...) statements

Clarification:

I want something more high-level than the answers so far. For example, in Perl, you could do something like:

# navigate to the main page
$mech->get( 'http://www.somesite.com/' ); 

# follow a link that contains the text 'download this'
$mech->follow_link( text_regex => qr/download this/i );

# submit a POST form, to log into the site
$mech->submit_form(
    with_fields      => {
        username    => 'mungo',
        password    => 'lost-and-alone',
    }
);

# save the results as a file
$mech->save_content('somefile.zip');

To do the same thing using HTTP_Client or wget or CURL would be a lot of work, I'd have to manually parse the pages to find the links, find the form URL, extract all the hidden fields, and so on. The reason I'm asking for a PHP solution is that I have no experience with Perl, and I could probably build what I need with a lot of work, but it would be much quicker if I could do the above in PHP.

like image 549
davr Avatar asked Oct 13 '08 21:10

davr


4 Answers

SimpleTest's ScriptableBrowser can be used independendly from the testing framework. I've used it for numerous automation-jobs.

like image 193
troelskn Avatar answered Nov 06 '22 14:11

troelskn


I feel compelled to answer this, even though its an old post... I've been working with PHP curl a lot and it is not as good anywhere near comparable to something like WWW:Mechanize, which I am switching to (I think I am going to go with the Ruby language implementation).. Curl is outdated as it requires too much "grunt work" to automate anything, the simpletest scriptable browser looked promising to me but in testing it, it won't work on most web forms I try it on... honestly, I think PHP is lacking in this category of scraping, web automation so its best to look at a different language, just wanted to post this since I have spent countless hours on this topic and maybe it will save someone else some time in the future.

like image 3
Rick Avatar answered Nov 06 '22 14:11

Rick


It's 2016 now and there's Mink. It even supports different engines from headless pure-PHP "browser" (without JavaScript), over Selenium (which needs a browser like Firefox or Chrome) to a headless "browser.js" in NPM, which DOES support JavaScript.

like image 3
mbirth Avatar answered Nov 06 '22 14:11

mbirth


Try looking in the PEAR library. If all else fails, create an object wrapper for curl.

You can so something simple like this:

class curl {
    private $resource;

    public function __construct($url) {
        $this->resource = curl_init($url);
    }

    public function __call($function, array $params) {
        array_unshift($params, $this->resource);
        return call_user_func_array("curl_$function", $params);
    }
}
like image 1
2 revs Avatar answered Nov 06 '22 14:11

2 revs