Variable response from file_get_contents for 'https://en.wikipedia.org/wiki/Category:Upcoming_singles'

Question

file_get_contents('https://en.wikipedia.org/wiki/Category:Upcoming_singles');

returns a different response (2 products) than visiting the same address in the Chrome web browser (which shows 4 products).

Upon inspection, I suspect this might be related to

Saved in parser cache key with ... timestamp ...

in the HTML returned. The timestamp is older when I use file_get_contents().

Any ideas on how to fetch the latest info using file_get_contents()?

Thank you!

santiagobasulto · Accepted Answer

Assuming file_get_contents is making an HTTP request, it would be good to check which user agent it sends.

I've heard of problems fetching data with some user agents. Take a look at this question.

You can specify other options (including the user agent) by using a stream context:

<?php
// Extra HTTP headers for the request; each header line ends with \r\n
$opts = array(
  'http' => array(
    'method' => "GET",
    'header' => "Accept-language: en\r\n" .
                "Cookie: foo=bar\r\n"
  )
);

$context = stream_context_create($opts);

// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);

Take a look at the file_get_contents docs.

Also, as Jack said, cURL is a better option.
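For comparison, here is a minimal sketch of the same request with cURL. The function names `buildCurlOptions` and `fetchWithCurl` are made up for illustration, and the timeout and redirect settings are assumptions, not something the question requires.

```php
<?php
// Build the cURL options separately so they can be inspected without a network call.
// (Hypothetical helper; the option values are illustrative assumptions.)
function buildCurlOptions(string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HTTPHEADER     => ['Accept-Language: en'],
        CURLOPT_TIMEOUT        => 10,    // give up after 10 seconds
    ];
}

// Fetch $url and return the response body, or null if the request failed.
function fetchWithCurl(string $url, string $userAgent): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, buildCurlOptions($userAgent));
    $body = curl_exec($ch); // string on success, false on failure
    // The handle is released when $ch goes out of scope.
    return $body === false ? null : $body;
}
```

Calling `fetchWithCurl()` with the Wikipedia URL from the question and the user agent string of your choice should return the page HTML, or null on a network error. This needs the PHP curl extension to be enabled.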

EDIT:

You misunderstood me. What you have to add is a different user agent. For example, using the user agent from Mozilla Firefox gets you the 4 results:

<?php

    // Send a browser-like User-Agent header along with the request
    $opts = array(
      'http' => array(
        'method' => "GET",
        'header' => "Accept-language: en\r\n" .
                    "User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; es-AR; rv:1.9.2.23) Gecko/20110921 Ubuntu/10.10 (maverick) Firefox/3.6.23\r\n"
      )
    );

    $context = stream_context_create($opts);

    // Open the file using the HTTP headers set above
    $file = file_get_contents('http://en.wikipedia.org/wiki/Category:Upcoming_singles', false, $context);
    print $file;

But I don't think this is "legal": it's not good to cheat by pretending to be a browser. I think there must be some other user agent that Wikipedia expects outside apps to use when fetching its data.
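As far as I know, Wikimedia's User-Agent policy asks scripts to send a descriptive user agent with contact details instead of imitating a browser. Here is a minimal sketch along those lines. The client name and contact address are placeholders, and `buildContext` and `parserCacheTimestamp` are hypothetical helpers. The regular expression is a guess based on the excerpt quoted in the question: it assumes the comment contains the word "timestamp" followed by 14 digits.

```php
<?php
// Build a stream context that identifies the script honestly.
// (Hypothetical helper; pass your own client name and contact address.)
function buildContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => "Accept-Language: en\r\n" .
                        "User-Agent: {$userAgent}\r\n",
        ],
    ]);
}

// Return the parser cache timestamp found in the HTML, or null if there is none.
// Assumed form: "Saved in parser cache ... timestamp 20120131123456".
function parserCacheTimestamp(string $html): ?string
{
    if (preg_match('/Saved in parser cache.*?timestamp\s+(\d{14})/s', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Placeholder identity, not a real registered client.
$context = buildContext('UpcomingSinglesFetcher/0.1 (contact: you@example.com)');

// Uncomment to perform the actual request:
// $html = file_get_contents('https://en.wikipedia.org/wiki/Category:Upcoming_singles', false, $context);
// echo parserCacheTimestamp($html) ?? 'no timestamp found';
```

Comparing the value from `parserCacheTimestamp()` across different user agents would show whether a given request is being served the older cached copy.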

Tags: php, caching, screen-scraping, wikipedia

Won Jun Bae
