
Why is CodeIgniter's Curl library slower than using Curl in plain PHP?

Recently I moved my Curl scraping code to CodeIgniter. I'm using the Curl CI library from http://philsturgeon.co.uk/code/codeigniter-curl. I put the scraping process in a controller and found that it runs slower than the version I built in plain PHP.

CodeIgniter took 12 seconds to output the result, whereas plain PHP takes only 6 seconds. Both timings include parsing with the HTML DOM parser.

Here's my Curl code in CodeIgniter:

function curl($url, $postdata=false)
{
  $agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";

  $this->curl->create($url);
  $this->curl->ssl(false);
  $options = array(
    'URL'             => $url,
    'HEADER'          => 0,
    'AUTOREFERER'     => true,
    'FOLLOWLOCATION'  => true,
    'TIMEOUT'         => 60,
    'RETURNTRANSFER'  => 1,
    'USERAGENT'       => $agent,
    'COOKIEJAR'       => dirname(__FILE__) . "/cookie.txt",
    'COOKIEFILE'      => dirname(__FILE__) . "/cookie.txt",
  );

  if($postdata)
  {
    $this->curl->post($postdata, $options);
  }
  else
  {
    $this->curl->options($options);
  }

  return $this->curl->execute();
}

Non-CodeIgniter (plain PHP) code:

function curl($url, $binary = false, $post = false, $cookie = false)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    // User agent and cookie storage are only set when $cookie is requested
    if ($cookie) {
        $agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
        curl_setopt($ch, CURLOPT_USERAGENT, $agent);
        curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . "/cookie.txt");
        curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . "/cookie.txt");
    }

    if ($binary) {
        curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    }

    if ($post) {
        // Build the POST body as key=value pairs joined by '&'
        $post_array_string1 = ''; // initialize before appending to avoid an undefined-variable notice
        foreach ($post as $key => $value) {
            $post_array_string1 .= $key . '=' . $value . '&';
        }
        $post_array_string1 = rtrim($post_array_string1, '&');

        // Send the request as a POST with the assembled body
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $post_array_string1);
    }

    return curl_exec($ch);
}

Does anyone know why the CodeIgniter Curl version is slower? Or could it be the simple_html_dom parser?

asked Oct 16 '12 03:10 by Maia Cube



1 Answer

I'm not sure I know the exact answer for this, but I have a few observations about Curl & CI as I use it extensively.

  1. Check the state of DNS caches/queries.

I noticed a substantial speedup when code was uploaded to a hosted staging server from my dev desktop. It was traced to a DNS issue that was solved by rebooting a bastion host. You can sometimes check this by using IP addresses instead of hostnames.

  2. Phil's 'library' is really just a wrapper.

All he's really done is map CI-style functions to the PHP Curl library. There's almost nothing else going on. I spent some time poking around (I forget why) and it was really unremarkable. That said, there may well be some general CI overhead; you might see what happens in another similar framework (Fuel, Kohana, Laravel, etc.).

  3. Check your reverse lookup.

Some APIs do reverse DNS checks as part of their security scanning. Sometimes hostnames or other headers are badly set in buried configs and can cause real headaches.

  4. Use Chrome's Postman extension to debug REST APIs.

No comment, it's brilliant: https://github.com/a85/POSTMan-Chrome-Extension/wiki. It gives you fine-grained control of the 'conversation'.
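One way to test the DNS theory from the first point, and to separate transfer time from parsing time, is to read the per-phase timings that curl_getinfo() reports after a request. This is only a minimal sketch in plain PHP; the helper names `summarize_curl_timings` and `timed_fetch` are hypothetical, and redirects are deliberately not followed so the phases stay easy to read.

```php
<?php
// Hypothetical helper: turn the array returned by curl_getinfo() into
// per-phase durations in seconds. The keys read here are cumulative
// timestamps measured from the start of the transfer.
function summarize_curl_timings(array $info): array
{
    $dns     = $info['namelookup_time'] ?? 0.0;
    $connect = $info['connect_time'] ?? 0.0;
    $first   = $info['starttransfer_time'] ?? 0.0;
    $total   = $info['total_time'] ?? 0.0;

    return [
        'dns'      => $dns,                          // resolving the hostname
        'connect'  => max(0.0, $connect - $dns),     // TCP connect after DNS
        'wait'     => max(0.0, $first - $connect),   // TLS, request, server response delay
        'download' => max(0.0, $total - $first),     // receiving the body
        'total'    => $total,
    ];
}

// Hypothetical wrapper: fetch a URL and return [body, timing summary].
function timed_fetch(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    $body = curl_exec($ch);
    $info = curl_getinfo($ch);

    return [$body, summarize_curl_timings($info)];
}
```

If `dns` dominates in one environment but not the other, that points at name resolution rather than at CodeIgniter. Wrapping the simple_html_dom parsing step in a pair of microtime(true) calls gives the matching figure for the parser.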
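Because the wrapper mostly forwards options to PHP's Curl functions (second point), it may also be worth checking that both versions in the question actually send the same options. The sketch below compares the option names visible in the two snippets; `diff_curl_options` is a hypothetical helper, and the lists are copied by hand from the question with the `CURLOPT_` prefix dropped. Whether the differences explain the timing gap is untested.

```php
<?php
// Hypothetical helper: report which option names appear in only one list.
function diff_curl_options(array $first, array $second): array
{
    return [
        'only_in_first'  => array_values(array_diff($first, $second)),
        'only_in_second' => array_values(array_diff($second, $first)),
    ];
}

// Keys of the $options array in the question's CodeIgniter function.
$codeigniter = [
    'URL', 'HEADER', 'AUTOREFERER', 'FOLLOWLOCATION', 'TIMEOUT',
    'RETURNTRANSFER', 'USERAGENT', 'COOKIEJAR', 'COOKIEFILE',
];

// Options set unconditionally in the question's plain PHP function.
// SSL options are left out because both versions configure SSL separately.
$plain = [
    'URL', 'HEADER', 'REFERER', 'ENCODING', 'AUTOREFERER',
    'FOLLOWLOCATION', 'TIMEOUT', 'RETURNTRANSFER',
];

$diff = diff_curl_options($codeigniter, $plain);
print_r($diff);
```

By this comparison the plain PHP version sets `ENCODING` (gzip,deflate) and `REFERER`, which the CodeIgniter options array omits, while the CodeIgniter version always sends the user agent and cookie files. Adding `'ENCODING' => 'gzip,deflate'` to the CodeIgniter array, assuming the wrapper accepts it the same way as the other keys, would bring the two requests closer before timing them again.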

answered Nov 10 '22 13:11 by ckm