Do you know of a web page scraping library for Delphi, like Beautiful Soup or Scrapy for Python?


Web page scraping in Delphi

2 Answers

Well, it's for FreePascal rather than Delphi, since I do not have a recent Delphi version, but porting between the two should not be too difficult.

Anyway, my Internet Tools are probably the best Pascal web scraping library out there.

For example, you can print all links on a page with:

uses simpleinternet, xquery;

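var a: IXQValue;
begin
  // process downloads the page and evaluates the XPath expression against it
  for a in process('http://stackoverflow.com', '//a/@href') do
    writeln(a.toString);  // one href attribute value per line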
end.

They are platform independent; fully support XPath 2, XQuery, CSS 3 selectors (those are not as well tested, though; XPath is better anyway) and pattern matching; parse XML and HTML; and download over HTTP and HTTPS.
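As a rough sketch of what the XPath 2 support allows, the link example can be extended to print each link's text next to its target. It reuses only the `process` call shown earlier; the program name and the `{$mode objfpc}` directive are my own additions, and the expression relies on standard XPath 2.0 behaviour (a function call as the final path step), which I am assuming the library evaluates as specified.

```pascal
program LinkTexts; // hypothetical program name
{$mode objfpc}{$H+}

uses simpleinternet, xquery;

var
  item: IXQValue;
begin
  // XPath 2.0 allows a function call as the last path step, so every
  // <a> element that has an href is mapped to one "text -> href" string.
  for item in process('http://stackoverflow.com',
                      '//a[@href]/concat(normalize-space(.), " -> ", @href)') do
    writeln(item.toString);
end.
```

Doing the formatting inside the query keeps the Pascal side down to a single loop; the same result could also be built in Pascal code from two separate queries.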

117

answered Oct 27 '22 00:10

BeniBela

After the page has loaded in a TWebBrowser component, query the TWebBrowser.Document property for the IHTMLDocument2 interface; you can then enumerate the elements.

You can use getElementById, getElementsByTagName, or getElementsByName (in the MSHTML unit these are declared on IHTMLDocument3), for example:

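// requires MSHTML in the uses clause
var
  Elem: IHTMLElement;
begin
  // getElementById is a method of IHTMLDocument3, so query for that interface
  Elem := (WebBrowser1.Document as IHTMLDocument3).getElementById('myid');
end;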

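Alternatively, get the full HTML text and process it any way you want, for example:

sourceHTML := WebBrowser.Document as IHTMLDocument2;
// innerHTML returns the markup inside <body>; htmlText is a string variable
htmlText := sourceHTML.body.innerHTML;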
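The element enumeration mentioned above could look roughly like the sketch below, which collects the `href` of every link in the loaded page. `ListLinks` and its parameters are hypothetical names of mine; the interfaces come from the MSHTML unit. It assumes the page has already finished loading, and the unit names may need a `Winapi.` or `System.` prefix in newer Delphi versions.

```pascal
uses
  SysUtils, Classes, Variants, MSHTML, SHDocVw;

// Hypothetical helper: adds the href of every link in the document
// currently loaded in Browser to Output.
procedure ListLinks(Browser: TWebBrowser; Output: TStrings);
var
  Doc: IHTMLDocument2;
  Links: IHTMLElementCollection;
  Elem: IHTMLElement;
  I: Integer;
begin
  // Document is nil until a page has been loaded, so bail out in that case
  if not Supports(Browser.Document, IHTMLDocument2, Doc) then
    Exit;

  // The links collection holds the <a> and <area> elements that have an href
  Links := Doc.links;
  for I := 0 to Links.length - 1 do
  begin
    Elem := Links.item(I, 0) as IHTMLElement;
    // getAttribute returns an OleVariant; VarToStr maps Null to ''
    Output.Add(VarToStr(Elem.getAttribute('href', 0)));
  end;
end;
```

A typical call would be `ListLinks(WebBrowser1, Memo1.Lines)` from the browser's OnDocumentComplete handler, so that the document is available when the loop runs.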

answered Oct 27 '22 00:10

Leonardo Gregianin


Tags: web-scraping, delphi

philnext
