I am trying to extract URLs from Google search results. I use Indy IdHTTP to get HTML results from Google, and I use Achmad Z's code for getting the link hrefs from the page. How can I get the real link target for each URL instead of the one that goes through Google's redirector?
I tried that but I get an "Operand no applicable" error in this part of the code:
function ToUTF8Encode(str: string): string;
var
b: Byte;
begin
for b in BytesOf(UTF8Encode(str)) do
begin
Result := Format('%s%s%.2x', [Result, '%', b]);
end;
end;
I use Delphi 7 with Indy 9.00.10. Maybe indy update will help ?
In the previous post here I've tried to explain why you should use Google Search API, in this one I'll try to provide you an example with a hope it will work in your Delphi 7.
You need to have the SuperObject
(JSON parser for Delphi), I've used this version
(latest at this time). Then you need Indy; the best would be to upgrade to the latest version if possible. I've used the one shipped with Delphi 2009, but I think the TIdHTTP.Get
method is so important that it must work fine also in your 9.00.10 version.
Now you need a list box and a button on your form, the following piece of code and a bit of luck (for compatibility :)
The URL request building you can see for instance in the DxGoogleSearchApi.pas mentioned before but the best is to follow the Google Web Search API reference
. In DxGoogleSearchApi.pas you can take the inspiration e.g. how to fetch several pages.
So take this as an inspiration
uses
IdHTTP, IdURI, SuperObject;
const
GSA_Version = '1.0';
GSA_BaseURL = 'http://ajax.googleapis.com/ajax/services/search/';
procedure TForm1.GoogleSearch(const Text: string);
var
I: Integer;
RequestURL: string;
HTTPObject: TIdHTTP;
HTTPStream: TMemoryStream;
JSONResult: ISuperObject;
JSONResponse: ISuperObject;
begin
RequestURL := TIdURI.URLEncode(GSA_BaseURL + 'web?v=' + GSA_Version + '&q=' + Text);
HTTPObject := TIdHTTP.Create(nil);
HTTPStream := TMemoryStream.Create;
try
HTTPObject.Get(RequestURL, HTTPStream);
JSONResponse := TSuperObject.ParseStream(HTTPStream, True);
if JSONResponse.I['responseStatus'] = 200 then
begin
ListBox1.Items.Add('Search time: ' + JSONResponse.S['responseData.cursor.searchResultTime']);
ListBox1.Items.Add('Fetched count: ' + IntToStr(JSONResponse['responseData.results'].AsArray.Length));
ListBox1.Items.Add('Total count: ' + JSONResponse.S['responseData.cursor.resultCount']);
ListBox1.Items.Add('');
for I := 0 to JSONResponse['responseData.results'].AsArray.Length - 1 do
begin
JSONResult := JSONResponse['responseData.results'].AsArray[I];
ListBox1.Items.Add(JSONResult.S['unescapedUrl']);
end;
end;
finally
HTTPObject.Free;
HTTPStream.Free;
end;
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
GoogleSearch('Delphi');
end;
Answer to my question , maybe it can help someone: Fetching web page :
memo1.Lines.Text := idhttp1.Get('http://ajax.googleapis.com/aja...tart=1&rsz=large&q=max');
extracting URL's :
function ExtractText(const Str, Delim1, Delim2: string; PosStart: integer; var PosEnd: integer): string;
var
pos1, pos2: integer;
begin
Result := '';
pos1 := PosEx(Delim1, Str, PosStart);
if pos1 > 0 then
begin
pos2 := PosEx(Delim2, Str, pos1 + Length(Delim1));
if pos2 > 0 then
begin
PosEnd := pos2 + Length(Delim2);
Result := Copy(Str, pos1 + Length(Delim1), pos2 - (pos1 + Length(Delim1)));
end;
end;
end;
And on Button1 just put :
procedure TForm1.Button1Click(Sender: TObject);
var Pos: integer;
sText: string;
begin
sText := ExtractText(Memo1.Lines.Text, '"url":"', '","visibleUrl"', 1, Pos);
while sText <> '' do
begin
Memo2.Lines.Add(sText);
sText := ExtractText(Memo1.Lines.Text, '"url":"', '","visibleUrl"', Pos, Pos);
end;
end;
www.delphi.about.com has nice documentation on string manipulation , Zarko Gajic does great job on that site :) NOTE: if google changes it's source this will be useless.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With