
How to get page content using cURL?

Tags:

php

curl

People also ask

How do I open a website with curl?

The syntax for the curl command is as follows: curl [options] [URL...]. In its simplest form, when invoked without any options, curl writes the specified resource to standard output. For example, running curl example.com prints the source code of the example.com homepage in your terminal window.

How do you do a get with curl?

To make a GET request using curl, run the curl command followed by the target URL. curl automatically uses the HTTP GET request method unless you pass the -X/--request or -d command-line option.
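
The same default applies in PHP's cURL bindings: a handle performs an HTTP GET unless you set a method-changing option. A minimal sketch (the URL is a placeholder):

    $ch = curl_init("http://www.example.com/");     // no CURLOPT_POST or CURLOPT_CUSTOMREQUEST set, so this is a GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of printing it
    $body = curl_exec($ch);
    curl_close($ch);
    echo $body;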

How do you scrape with curl?

Another way to scrape data with curl is by using the -o flag followed by a filename, e.g. curl -o page.html example.com. This saves the output of your request into the specified file, which is helpful if you're downloading a lot of data or want to view the data offline.
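
The rough PHP equivalent of -o is to stream the response into an open file handle with CURLOPT_FILE. A minimal sketch (the URL and filename are placeholders):

    $fp = fopen("page.html", "w");              // file to save the response into
    $ch = curl_init("http://www.example.com/");
    curl_setopt($ch, CURLOPT_FILE, $fp);        // write the response body to $fp instead of stdout
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);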

Can I use curl in browser?

curl can do almost every HTTP operation and transfer your favorite browser can. It can actually do a lot more than that, but the point here is that you can use curl to reproduce, or script, what you would otherwise have to do manually with a browser.


This is how:

    /**
     * Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
     * array containing the HTTP server response header fields and content.
     */
    function get_web_page( $url )
    {
        $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

        $options = array(
            CURLOPT_CUSTOMREQUEST  => "GET",        // set request type (POST or GET)
            CURLOPT_POST           => false,        // set to GET
            CURLOPT_USERAGENT      => $user_agent,  // set user agent
            CURLOPT_COOKIEFILE     => "cookie.txt", // read cookies from this file
            CURLOPT_COOKIEJAR      => "cookie.txt", // write cookies to this file
            CURLOPT_RETURNTRANSFER => true,         // return web page as a string
            CURLOPT_HEADER         => false,        // don't return headers
            CURLOPT_FOLLOWLOCATION => true,         // follow redirects
            CURLOPT_ENCODING       => "",           // handle all supported encodings
            CURLOPT_AUTOREFERER    => true,         // set referer on redirect
            CURLOPT_CONNECTTIMEOUT => 120,          // timeout on connect (seconds)
            CURLOPT_TIMEOUT        => 120,          // timeout on response (seconds)
            CURLOPT_MAXREDIRS      => 10,           // stop after 10 redirects
        );

        $ch      = curl_init( $url );
        curl_setopt_array( $ch, $options );
        $content = curl_exec( $ch );
        $err     = curl_errno( $ch );
        $errmsg  = curl_error( $ch );
        $header  = curl_getinfo( $ch );
        curl_close( $ch );

        $header['errno']   = $err;
        $header['errmsg']  = $errmsg;
        $header['content'] = $content;
        return $header;
    }

Example

    // Read a web page and check for errors:

    $result = get_web_page( $url );

    if ( $result['errno'] != 0 )
        ... error: bad url, timeout, redirect loop ...

    if ( $result['http_code'] != 200 )
        ... error: no page, no permissions, no service ...

    $page = $result['content'];

For a more realistic approach that emulates human behavior, you may want to add a referer (CURLOPT_REFERER) to your cURL options, and you may also want to enable CURLOPT_FOLLOWLOCATION. And despite what some people claim, cURLing Google results is not impossible: everything that you can do "IRL" with your own browser can be emulated using PHP cURL or libcurl in Python. You just need to do more cURLs to get buff. Then you will see what I mean. :)

  $url = "http://www.google.com/search?q=" . urlencode($strSearch) . "&hl=en&start=0&sa=N";
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_REFERER, 'http://www.example.com/1');
  curl_setopt($ch, CURLOPT_HEADER, 0);
  curl_setopt($ch, CURLOPT_VERBOSE, 0);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
  curl_setopt($ch, CURLOPT_URL, $url); // pass the URL as-is; only the search term is urlencoded above
  $response = curl_exec($ch);
  curl_close($ch);

Note that urlencode() belongs on the individual query value when the URL is built, not on the whole URL passed to CURLOPT_URL, which would mangle the scheme and slashes.


I suppose you have noticed that your link is actually an HTTPS link... It seems that your cURL options do not include any kind of SSL handling; maybe that is your problem. Why don't you try a non-HTTPS link to see what happens (e.g. Google Custom Search Engine)?
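
If SSL verification is the actual failure, a better fix than avoiding HTTPS is to point cURL at a CA certificate bundle instead of disabling verification. A minimal sketch; the bundle path is an assumption, so adjust it to wherever your system keeps CA certificates:

    $ch = curl_init("https://www.example.com/");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true); // verify the peer's certificate
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);    // check the certificate matches the hostname
    curl_setopt($ch, CURLOPT_CAINFO, "/etc/ssl/certs/ca-certificates.crt"); // CA bundle path (assumption)
    $content = curl_exec($ch);
    if ($content === false) {
        echo "cURL error: " . curl_error($ch);
    }
    curl_close($ch);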


Get content with cURL in PHP

This requires that your server supports the cURL extension; enable it in php.ini (not in Apache's httpd.conf) and restart the web server.


    function UrlOpener($url)
    {
        global $output;
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $output = curl_exec($ch);
        curl_close($ch);
        echo $output;
    }

If you want to get the content via Google's cache with cURL, you can use this URL: http://webcache.googleusercontent.com/search?q=cache: followed by your URL. Sample: http://urlopener.mixaz.net/
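
For example, a minimal sketch that fetches Google's cached copy of a page using the UrlOpener() function above (the target URL is a placeholder):

    $target = "http://www.example.com/";
    UrlOpener("http://webcache.googleusercontent.com/search?q=cache:" . urlencode($target));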