I need to go through a HTML string and replace characters with 0 (zero), except tags, spaces and line breaks. I created this code bellow, but it is too slow. Please, can someone help me to make it faster (optimize)?
procedure TForm1.btn1Click(Sender: TObject);
var
Txt: String;
Idx: Integer;
Tag: Boolean;
begin
Tag := False;
Txt := mem1.Text;
For Idx := 0 to Length(Txt) - 1 Do
Begin
If (Txt[Idx] = '<') Then
Tag := True Else
If (Txt[Idx] = '>') Then
Begin
Tag := False;
Continue;
end;
If Tag Then Continue;
If (not (Txt[Idx] in [#10, #13, #32])) Then
Txt[Idx] := '0';
end;
mem2.Text := Txt;
end;
The HTML text will never have "<" or ">" outside tags (in the middle of text), so I do not need to worry about this.
Thank you!
That looks pretty straightforward. It's hard to be sure without profiling the code against the data you're using, (which is always a good idea; if you need to optimize Delphi code, try running it through Sampling Profiler first to get an idea where you're actually spending all your time,) but if I had to make an educated guess, I'd guess that your bottleneck is in this line:
Txt[Idx] := '0';
As part of the compiler's guarantee of safe copy-on-write semantics for the string
type, every write to an individual element (character) of a string involves a hidden call to the UniqueString
routine. This makes sure that you're not changing a string that something else, somewhere else, holds a reference to.
In this particular case, that's not necessary, because you got the string fresh in the start of this routine and you know it's unique. There's a way around it, if you're careful.
CLEAR AND UNAMBIGUOUS WARNING: Do not do what I'm about to explain without making sure you have a unique string first! The easiest way to accomplish this is to call UniqueString
manually. Also, do not do anything during the loop that could assign this string to any other variable. While we're doing this, it's not being treated as a normal string. Failure to heed this warning can cause data corruption.
OK, now that that's been explained, you can use a pointer to access the characters of the string directly, and get around the compiler's safeguards, like so:
procedure TForm1.btn1Click(Sender: TObject);
var
Txt: String;
Idx: Integer;
Tag: Boolean;
current: PChar; //pointer to a character
begin
Tag := False;
Txt := mem1.Text;
UniqueString(txt); //very important
if length(txt) = 0 then
Exit; //If you don't check this, the next line will raise an AV on a blank string
current := @txt[1];
dec(current); //you need to start before element 1, but the compiler won't let you
//assign to element 0
For Idx := 0 to Length(Txt) - 1 Do
Begin
inc(current); //put this at the top of the loop, to handle Continue cases correctly
If (current^ = '<') Then
Tag := True Else
If (current^ = '>') Then
Begin
Tag := False;
Continue;
end;
If Tag Then Continue;
If (not (current^ in [#10, #13, #32])) Then
current^ := '0';
end;
mem2.Text := Txt;
end;
This changes the metaphor. Instead of indexing into the string as an array, we're treating it like a tape, with the pointer as the head, moving forward one character at a time, scanning from beginning to end, and changing the character under it when appropriate. No redundant calls to UniqueString
, and no repeatedly calculating offsets, which means this can be a lot faster.
Be very careful when using pointers like this. The compiler's safety checks are there for a good reason, and using pointers steps outside of them. But sometimes, they can really help speed things up in your code. And again, profile before trying anything like this. Make sure that you know what's slowing things down, instead of just thinking you know. If it turns out to be something else that's running slow, don't do this; find a solution to the real problem instead.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With