Good day! I'm using Delphi XE and Indy TIdHTTP. Using the Get method, I retrieve a remote directory listing, and I need to parse it: get the list of files with their sizes and timestamps, and distinguish files from subdirectories. Is there a good routine to do that? Thank you in advance! Vojtech
Here is the sample:
<head>
<title>127.0.0.1 - /</title>
</head>
<body>
<H1>127.0.0.1 - /</H1><hr>
<pre>
Mittwoch, 30. März 2011 12:01 <dir> <A HREF="/SubDir/">SubDir</A><br />
Mittwoch, 9. Februar 2005 17:14 113 <A HREF="/file.txt">file.txt</A><br />
</pre>
<hr>
</body>
Given the code sample, I guess the fastest way to parse it would be like this:
1. Extract the <pre>...</pre> block containing all the listing lines. Should be easy.
2. Load the text between <pre> and </pre> into a TStringList. Each line is a file or folder, and the format is very simple.

This should give you a good start and an idea of how to do it using the DOM:
uses
  MSHTML,
  ActiveX,
  ComObj;

// Loads an HTML string into an (empty) MSHTML document.
procedure DocumentFromString(Document: IHTMLDocument2; const S: WideString);
var
  v: OleVariant;
begin
  v := VarArrayCreate([0, 0], varVariant);
  v[0] := S;
  Document.Write(PSafeArray(TVarData(v).VArray));
  Document.Close;
end;

// Collapses every run of the character C into a single C.
function StripMultipleChar(const S: string; const C: Char): string;
begin
  Result := S;
  while Pos(C + C, Result) <> 0 do
    Result := StringReplace(Result, C + C, C, [rfReplaceAll]);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  Document: IHTMLDocument2;
  Elements: IHTMLElementCollection;
  Element: IHTMLElement;
  I: Integer;
  Line: string;
begin
  Document := CreateComObject(CLASS_HTMLDocument) as IHTMLDocument2;
  DocumentFromString(Document, '<head>...'); // your HTML here
  Elements := Document.all.tags('A') as IHTMLElementCollection;
  for I := 0 to Elements.length - 1 do
  begin
    Element := Elements.item(I, '') as IHTMLElement;
    Memo1.Lines.Add('A HREF=' + Element.getAttribute('HREF', 2));
    Memo1.Lines.Add('A innerText=' + Element.innerText);
    // 'beforeBegin' returns the text immediately before the element
    Line := (Element as IHTMLElement2).getAdjacentText('beforeBegin');
    // Line => "Mittwoch, 30. März 2011 12:01 <dir>" OR:
    // Line => "Mittwoch, 9. Februar 2005 17:14 113"...
    // I don't know what the actual delimiter is:
    // it could be [space] or [tab], so we need to normalize the Line.
    // If it's tabs, it's easier, because the timestamps also contain spaces.
    Line := Trim(Line);
    Line := StripMultipleChar(Line, #32); // collapse runs of spaces
    Line := StripMultipleChar(Line, #9); // collapse runs of tabs
    // TODO: ParseLine (from right to left)
    Memo1.Lines.Add(Line);
    Memo1.Lines.Add('-------------');
  end;
end;
Output:
A HREF=/SubDir/
A innerText=SubDir
Mittwoch, 30. März 2011 12:01 <dir>
-------------
A HREF=/file.txt
A innerText=file.txt
Mittwoch, 9. Februar 2005 17:14 113
-------------
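The ParseLine step left as a TODO above could be sketched roughly as follows. This is only a minimal sketch, assuming the normalized lines look exactly like the output shown (the last token is either <dir> or a plain byte count without thousands separators). The TListingEntry record and the ParseLine signature are hypothetical names made up for illustration, and the timestamp is kept as raw text because its format depends on the server's locale.

```delphi
program ParseLineDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical record for illustration; not part of the code above.
  TListingEntry = record
    TimestampText: string; // raw, locale-dependent timestamp text
    IsDirectory: Boolean;
    Size: Int64;           // -1 for directories or unparsable sizes
  end;

// Splits a normalized line at its last space or tab: the token on the right
// is either "<dir>" or the size, everything on the left is the timestamp.
function ParseLine(const Line: string; out Entry: TListingEntry): Boolean;
var
  P: Integer;
  Tail: string;
begin
  Result := False;
  Entry.TimestampText := '';
  Entry.IsDirectory := False;
  Entry.Size := -1;

  P := LastDelimiter(#32#9, Line); // index of the last space or tab, 0 if none
  if P = 0 then
    Exit;

  Tail := Copy(Line, P + 1, MaxInt);
  Entry.TimestampText := Trim(Copy(Line, 1, P - 1));

  if SameText(Tail, '<dir>') then
  begin
    Entry.IsDirectory := True;
    Result := True;
  end
  else
  begin
    Result := TryStrToInt64(Tail, Entry.Size);
    if not Result then
      Entry.Size := -1; // keep the "unknown" marker on failure
  end;
end;

var
  E: TListingEntry;
begin
  if ParseLine('Mittwoch, 9. Februar 2005 17:14 113', E) then
    Writeln(E.TimestampText, ' | ', E.IsDirectory, ' | ', E.Size);
end.
```

Working from the right avoids having to know how many spaces the timestamp itself contains. In the sample, the directory link also ends with a slash (/SubDir/), which may serve as a second hint, but that is an observation from this one listing rather than a guarantee.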
EDIT:
I have simplified the StripMultipleChar implementation. I believe the former version was better optimized for speed, but since the lines are very short, there will be no significant difference in performance.
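For comparison, the string-based suggestion from the first answer (take the <pre>...</pre> block and load it into a TStringList) might look roughly like this. It is a sketch under assumptions: ExtractPreLines is a hypothetical helper name, the <pre> tag is assumed to carry no attributes, and each entry is assumed to sit on its own line as in the sample. The resulting lines still contain the <A HREF=...> markup and the trailing <br />, which would have to be parsed separately.

```delphi
program PreBlockDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, StrUtils;

// Copies the text between the first <pre> and the following </pre> into
// Lines, one listing entry per line. Returns False if no block is found.
function ExtractPreLines(const Html: string; Lines: TStrings): Boolean;
const
  OpenTag = '<pre>';
  CloseTag = '</pre>';
var
  Lower: string;
  StartPos, EndPos: Integer;
begin
  Result := False;
  Lines.Clear;

  // LowerCase only touches ASCII letters, so character positions in the
  // lowercased copy match those in the original string.
  Lower := LowerCase(Html);

  StartPos := Pos(OpenTag, Lower);
  if StartPos = 0 then
    Exit;
  Inc(StartPos, Length(OpenTag));

  EndPos := PosEx(CloseTag, Lower, StartPos);
  if EndPos = 0 then
    Exit;

  // Assigning Text splits the block on its line breaks.
  Lines.Text := Trim(Copy(Html, StartPos, EndPos - StartPos));
  Result := True;
end;

var
  Lines: TStringList;
  I: Integer;
begin
  Lines := TStringList.Create;
  try
    // Shortened stand-in for the page text returned by TIdHTTP.Get.
    if ExtractPreLines(
      '<body><pre>' + sLineBreak +
      'Mittwoch, 9. Februar 2005 17:14 113 <A HREF="/file.txt">file.txt</A><br />' + sLineBreak +
      '</pre></body>', Lines) then
      for I := 0 to Lines.Count - 1 do
        Writeln(Lines[I]);
  finally
    Lines.Free;
  end;
end.
```

This avoids the COM dependency on MSHTML, at the cost of being tied to the exact markup the server happens to emit.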