Getting more search results per page via URL

I've been writing a program which extracts data from web searches. To get more data, I'd ideally like to extract more results per query through a script (let's say 100 or so).

My question is, is there a way to modify the URL for Google, Yahoo, or Bing (preference in that order) so that I can get more than 10 results per query?

For Google, appending &num=99 used to work at one point but no longer does :( I also saw a similar suggestion of appending &count=50, but that didn't work on any of the search engines either.

asked Jul 15 '13 by user1319504
2 Answers

The reason num=99 doesn't work for Google is that the num parameter's value isn't used as-is; instead, it's checked against a list of allowed values.

The allowed values are 10, 20, 30, 40, 50, and 100. Any other values for this field are ignored.

For Bing, the parameter is count=## where ## can be anything from 1-100.

For Yahoo, the parameter is n=## where ## can be anything from 1-100.

In most cases, the URL parameter will only work if the user hasn't specified the number of search results to show in the search engine's search settings. Otherwise, the settings cookie will take precedence over the URL parameter.
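Putting those parameters together, here is a minimal sketch (my own illustration, not part of the answer above) of how you might build the query URLs in node; only the parameter names (num, count, n) come from the answer, and the buildSearchUrl helper itself is hypothetical:

// Minimal sketch: build a search URL with the per-engine "results per page" parameter.
function buildSearchUrl(engine, query, results) {
  var q = encodeURIComponent(query);
  switch (engine) {
    case 'google': // only 10, 20, 30, 40, 50 or 100 are honoured
      return 'https://www.google.com/search?q=' + q + '&num=' + results;
    case 'bing':   // 1-100
      return 'https://www.bing.com/search?q=' + q + '&count=' + results;
    case 'yahoo':  // 1-100
      return 'https://search.yahoo.com/search?p=' + q + '&n=' + results;
  }
}

console.log(buildSearchUrl('google', 'example query', 50));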

answered Oct 14 '22 by Steve

I don't know what programming language you're using, but the general idea is to load the Google search page with the proper cookie settings (that is how the preferences are stored at the time of this writing).

You can set and then view cookies in Google Chrome. To avoid unnecessary cookies, start by opening a new incognito window (Ctrl+Shift+N), and navigating to the search settings (https://www.google.com/preferences).

At the time of writing, you will want to check "Never show instant results" and then adjust the "Results per page" slider to whatever value you want. After hitting "Save" at the bottom, you can view your cookies by opening the developer console (Ctrl+Shift+J) and navigating to the Resources tab.

Again, at the time of writing, Google sets two cookies, NID and PREF. PREF is the one we're interested in for changing the number of results. An example of what its value may look like:

ID=8155cce71859f7d0:U=fe6e69e174148b7b:FF=0:LD=en:NR=40:TM=1379366492:LM=1379366586:SG=2:S=FoybwBhek8noyp0t

(This value fetches 40 results per page, as indicated by NR=40.)
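If you capture a different PREF value, one option (a hypothetical helper of my own, not part of this answer) is to rewrite just the NR field before sending the cookie:

// Hypothetical helper: replace an existing NR=<count> field in a raw PREF
// cookie value, or append one if it's missing.
function setResultsPerPage(prefValue, n) {
  if (/(^|:)NR=\d+/.test(prefValue)) {
    return prefValue.replace(/(^|:)NR=\d+/, '$1NR=' + n);
  }
  return prefValue + ':NR=' + n;
}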

With this cookie name (PREF) and a value like the one above, you can send the cookie when requesting a page via wget, curl, etc. In my most recent project related to this, I was using node with the request library.

Here is a snippet showing how you might fetch a Google page with 40 results (modified example from the request documentation):

var request = require('request');

// Build a cookie jar containing the PREF cookie captured above.
var j = request.jar();
var cookie = request.cookie('PREF=ID=8155cce71859f7d0:U=fe6e69e174148b7b:FF=0:LD=en:NR=40:TM=1379366492:LM=1379366586:SG=2:S=FoybwBhek8noyp0t');
j.add(cookie); // newer versions of request use j.setCookie(cookie, url) instead

// Remember to append your query, e.g. ?q=your+search+terms
request({url: 'https://www.google.com/search', jar: j},
  function (error, response, body) {
    // do something with the body (html) of the page!
  });

Or take a look at the man pages for wget / curl. I know that wget provides a --load-cookies flag that you can use.

You can apply this to any other cookie-based website that you need content from. Yahoo! uses cookie-based settings; I'm not sure what Bing uses.

answered Oct 14 '22 by Jay