
Why is a carriage return/line feed (CR LF) not handled properly by TPerlRegEx when specified as the Replacement?

Tags:

delphi

I am trying to replace spaces with a new line using the TPerlRegEx class.

with RegExp do
begin
  Subject:=Memo1.Lines.Text;
  RegEx:=' ';
  Replacement:='\r\n';
  ReplaceAll;
  Memo1.Lines.Text:=Subject;
end;

The problem is that it treats the \r\n replacement as literal text.

Joacim Andersson asked Jan 06 '13 15:01


2 Answers

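Use #13#10 (the CR and LF character constants) as the replacement. Delphi string literals have no C-style escapes, so if you want to keep writing \r\n you have to translate the escapes yourself before assigning Replacement: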

program Project29;

{$APPTYPE CONSOLE}

uses
  SysUtils, PerlRegEx;

var RegEx: TPerlRegEx;

function CStyleEscapes(const InputText:string):string;
var i,j: Integer;

begin
  SetLength(Result, Length(InputText));
  i := 1; // input cursor
  j := 1; // output cursor
  while i <= Length(InputText) do
    if InputText[i] = '\' then
      if i = Length(InputText) then
        begin
          // Erroneous escape: a trailing backslash with nothing after it; keep it as-is
          Result[j] := '\';
          Inc(i);
          Inc(j);
        end
      else
        begin
          case InputText[i+1] of
            'r', 'R': Result[j] := #13;
            'n', 'N': Result[j] := #10;
            't', 'T': Result[j] := #9;
            '\':
              begin
                Result[j] := '\';
                Inc(j);
                Result[j] := '\';
              end;
            else
              begin
                Result[j] := '\';
                Inc(j);
                Result[j] := InputText[i+1];
              end;
          end;
          Inc(i,2);
          Inc(j);
        end
    else
      begin
        Result[j] := InputText[i];
        Inc(i);
        Inc(j);
      end;
  SetLength(Result, j-1);
end;

begin
  RegEx := TPerlRegEx.Create;
  try

    RegEx.RegEx := ' ';
    RegEx.Replacement := CStyleEscapes('\t\t\t');
    RegEx.Subject := 'FirstLine SecondLine';
    RegEx.ReplaceAll;
    WriteLn(RegEx.Subject);

    ReadLn;

  finally RegEx.Free;
  end;
end.
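
When the replacement is known at compile time, as in the question, the conversion function is not needed at all. A minimal sketch of the direct fix, reusing the same console setup as above (the program name is only illustrative):

```pascal
program CrLfReplacement;

{$APPTYPE CONSOLE}

uses
  SysUtils, PerlRegEx;

var
  RegEx: TPerlRegEx;

begin
  RegEx := TPerlRegEx.Create;
  try
    RegEx.RegEx := ' ';
    // #13 is CR and #10 is LF; these are real characters, not escape text
    RegEx.Replacement := #13#10;
    RegEx.Subject := 'FirstLine SecondLine';
    RegEx.ReplaceAll;
    // Each space in the subject has been replaced by a CR LF pair
    WriteLn(RegEx.Subject);
    ReadLn;
  finally
    RegEx.Free;
  end;
end.
```

The sLineBreak constant from the System unit can be used in place of #13#10 if the platform's default line break is what you want.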

Cosmin Prund answered Nov 09 '22 05:11


I really wanted to know why it doesn't do the matching as expected.

Processing of \ escape sequences in the Replacement text is performed in TPerlRegEx.ComputeReplacement. If you take a look at the code you will find that there are no sequences that yield the carriage return and line feed characters. In fact ComputeReplacement is all about back references.

The matching phase of the regex is performed by the PCRE code, but the replacement phase is pure Pascal code. It is easy enough to inspect that code and see that it does not do what you expect.

The conclusion is that you cannot specify the characters you want using escape sequences. I think you will need to devise your own rules for escaping non-printable characters and apply those rules in an OnReplace event handler.
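
A rough sketch of the OnReplace approach follows. It is deliberately simplified: instead of a full set of escaping rules, the handler always substitutes a CR LF pair. TReplaceHandler and HandleReplace are hypothetical names, and the handler signature (in particular the PCREString type of ReplaceWith) is an assumption that may differ between TPerlRegEx versions, so check the declaration of the event type in your copy of the unit.

```pascal
program OnReplaceSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, PerlRegEx;

type
  // Hypothetical helper class: OnReplace is a method pointer ("of object"),
  // so the handler has to be a method of some object.
  TReplaceHandler = class
    // Assumed signature; the string type of ReplaceWith may differ by version.
    procedure HandleReplace(Sender: TObject; var ReplaceWith: PCREString);
  end;

procedure TReplaceHandler.HandleReplace(Sender: TObject;
  var ReplaceWith: PCREString);
begin
  // Discard the computed replacement text and use a real CR LF pair.
  // Custom escaping rules would be applied to ReplaceWith here instead.
  ReplaceWith := #13#10;
end;

var
  RegEx: TPerlRegEx;
  Handler: TReplaceHandler;

begin
  Handler := TReplaceHandler.Create;
  try
    RegEx := TPerlRegEx.Create;
    try
      RegEx.OnReplace := Handler.HandleReplace;
      RegEx.RegEx := ' ';
      RegEx.Subject := 'FirstLine SecondLine';
      RegEx.ReplaceAll;
      WriteLn(RegEx.Subject);
      ReadLn;
    finally
      RegEx.Free;
    end;
  finally
    Handler.Free;
  end;
end.
```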

David Heffernan answered Nov 09 '22 05:11