 

How do I extract the target URL from a Google search result?

I am trying to extract URLs from Google search results. I use Indy IdHTTP to get HTML results from Google, and I use Achmad Z's code for getting the link hrefs from the page. How can I get the real link target for each URL instead of the one that goes through Google's redirector?


I tried that but I get an "Operand no applicable" error in this part of the code:

function ToUTF8Encode(str: string): string;
var
  b: Byte;
begin
  for b in BytesOf(UTF8Encode(str)) do
  begin
    Result := Format('%s%s%.2x', [Result, '%', b]);
  end;
end;

I am using Delphi 7 with Indy 9.00.10. Would updating Indy help?
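The `for b in ...` loop in this snippet is the likely culprit. `for..in` loops were only added in Delphi 2005, and `BytesOf` is not available in Delphi 7 either. The function does not use Indy at all, so updating Indy would not change this. Here is a minimal sketch of the same percent-encoding that should compile in Delphi 7, assuming the goal is simply to write each UTF-8 byte as `%XX`. It replaces `for..in` with an indexed loop over a `UTF8String` and initializes `Result`, which is not guaranteed to start out empty. The console `program` wrapper and the `{$IFDEF FPC}` line are additions that only make the sketch compile on its own.

```pascal
program Utf8PercentEncode;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}  // only needed when compiling with Free Pascal

uses
  SysUtils;

// Percent-encodes every byte of the UTF-8 form of Str as %XX.
// Uses only Delphi 7 features: an indexed loop over a UTF8String
// instead of for..in over BytesOf().
function ToUTF8Encode(const Str: string): string;
var
  Utf8: UTF8String;
  I: Integer;
begin
  Result := '';               // Result is not guaranteed to start empty
  Utf8 := UTF8Encode(Str);    // convert to UTF-8 bytes
  for I := 1 to Length(Utf8) do
    Result := Result + '%' + IntToHex(Ord(Utf8[I]), 2);
end;

begin
  Writeln(ToUTF8Encode('Delphi 7'));  // prints %44%65%6C%70%68%69%20%37
end.
```

`IntToHex` produces upper-case hex digits, which is the form RFC 3986 recommends for percent-encoded octets.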

asked Oct 11 '11 19:10 by Danijel Maksimovic Maxa


2 Answers

In my previous post here I tried to explain why you should use the Google Search API; in this one I'll try to give you an example, hoping it will work in your Delphi 7.

You need SuperObject (a JSON parser for Delphi); I used the latest version available at the time of writing. You also need Indy; ideally, upgrade to the latest version if you can. I used the version shipped with Delphi 2009, but TIdHTTP.Get is such a basic method that it should work in your 9.00.10 version as well.

Now you need a list box and a button on your form, the following code, and a bit of luck (for compatibility :)

You can see how the request URL is built in, for instance, the DxGoogleSearchApi.pas unit mentioned before, but it is best to follow the Google Web Search API reference. DxGoogleSearchApi.pas is also a good source of inspiration, e.g. for fetching several pages of results.
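As a rough illustration of fetching several pages, the hypothetical helper `BuildPageURL` builds one request URL per page using the `rsz` and `start` parameters that also appear in the question author's own request. It assumes that `rsz=large` yields 8 results per page and that `start` is the zero-based index of the first result on a page; check both against the API reference. `BuildPageURL` and `GSA_PageSize` are illustrative names and are not taken from DxGoogleSearchApi.pas.

```pascal
program GsaPageUrls;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}  // only needed when compiling with Free Pascal

uses
  SysUtils;

const
  GSA_Version = '1.0';
  GSA_BaseURL = 'http://ajax.googleapis.com/ajax/services/search/';
  GSA_PageSize = 8;  // assumption: rsz=large returns 8 results per page

// Hypothetical helper: builds the request URL for one page of results.
// EncodedQuery must already be URL-encoded. Assumes start is the
// zero-based index of the first result on the requested page.
function BuildPageURL(const EncodedQuery: string; PageIndex: Integer): string;
begin
  Result := GSA_BaseURL + 'web?v=' + GSA_Version +
    '&rsz=large&start=' + IntToStr(PageIndex * GSA_PageSize) +
    '&q=' + EncodedQuery;
end;

var
  Page: Integer;
begin
  // Print the request URLs for the first three pages of "Delphi" results
  for Page := 0 to 2 do
    Writeln(BuildPageURL('Delphi', Page));
end.
```

Each URL would then be fetched and parsed in the same way as the single request in the code that follows.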

So take the following as a starting point:

uses
  IdHTTP, IdURI, SuperObject;

const
  GSA_Version = '1.0';
  GSA_BaseURL = 'http://ajax.googleapis.com/ajax/services/search/';

procedure TForm1.GoogleSearch(const Text: string);
var
  I: Integer;
  RequestURL: string;
  HTTPObject: TIdHTTP;
  HTTPStream: TMemoryStream;
  JSONResult: ISuperObject;
  JSONResponse: ISuperObject;
begin
  // Build the request URL and let Indy encode it
  RequestURL := TIdURI.URLEncode(GSA_BaseURL + 'web?v=' + GSA_Version + '&q=' + Text);

  HTTPObject := TIdHTTP.Create(nil);
  HTTPStream := TMemoryStream.Create;

  try
    HTTPObject.Get(RequestURL, HTTPStream);
    // Rewind the stream in case Get left its position at the end of the data
    HTTPStream.Position := 0;
    JSONResponse := TSuperObject.ParseStream(HTTPStream, True);

    // The API reports success with responseStatus = 200
    if Assigned(JSONResponse) and (JSONResponse.I['responseStatus'] = 200) then
    begin
      ListBox1.Items.Add('Search time: ' + JSONResponse.S['responseData.cursor.searchResultTime']);
      ListBox1.Items.Add('Fetched count: ' + IntToStr(JSONResponse['responseData.results'].AsArray.Length));
      ListBox1.Items.Add('Total count: ' + JSONResponse.S['responseData.cursor.resultCount']);
      ListBox1.Items.Add('');

      // unescapedUrl holds each result's URL in unescaped form
      for I := 0 to JSONResponse['responseData.results'].AsArray.Length - 1 do
      begin
        JSONResult := JSONResponse['responseData.results'].AsArray[I];
        ListBox1.Items.Add(JSONResult.S['unescapedUrl']);
      end;
    end;

  finally
    HTTPObject.Free;
    HTTPStream.Free;
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  GoogleSearch('Delphi');
end;
answered Nov 15 '22 09:11 by TLama


Here is the answer to my own question; maybe it can help someone. Fetching the web page:

memo1.Lines.Text := idhttp1.Get('http://ajax.googleapis.com/aja...tart=1&rsz=large&q=max');

Extracting the URLs:

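// Returns the text between the first Delim1 found at or after PosStart and
// the next Delim2; PosEnd receives the position just past Delim2.
// Returns '' and leaves PosEnd unchanged when either delimiter is missing.
// PosEx is declared in the StrUtils unit, so StrUtils must be in your uses clause.
function ExtractText(const Str, Delim1, Delim2: string; PosStart: integer; var PosEnd: integer): string;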
var
  pos1, pos2: integer;
begin
  Result := '';
  pos1 := PosEx(Delim1, Str, PosStart);
  if pos1 > 0 then
  begin
    pos2 := PosEx(Delim2, Str, pos1 + Length(Delim1));
    if pos2 > 0 then
    begin
      PosEnd := pos2 + Length(Delim2);
      Result := Copy(Str, pos1 + Length(Delim1), pos2 - (pos1 + Length(Delim1)));
    end;
  end;
end;

And in Button1's OnClick handler, just put:

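procedure TForm1.Button1Click(Sender: TObject);
var
  NextPos: Integer;  // not named Pos, which would hide System.Pos
  JSONText, sText: string;
begin
  JSONText := Memo1.Lines.Text;  // read the memo text once instead of on every call
  sText := ExtractText(JSONText, '"url":"', '","visibleUrl"', 1, NextPos);
  while sText <> '' do
  begin
    Memo2.Lines.Add(sText);
    // Continue searching just past the previous match
    sText := ExtractText(JSONText, '"url":"', '","visibleUrl"', NextPos, NextPos);
  end;
end;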

www.delphi.about.com has nice documentation on string manipulation; Zarko Gajic does a great job on that site :) NOTE: if Google changes the format of its response, this will stop working.

answered Nov 15 '22 09:11 by Danijel Maksimovic Maxa