Regular expressions - C# behaves differently than Perl / Python

Tags: c#, regex
Under Python:

ttsiod@elrond:~$ python
>>> import re
>>> a='This is a test'
>>> re.sub(r'(.*)', 'George', a)
'George'

Under Perl:

ttsiod@elrond:~$ perl
$a="This is a test";
$a=~s/(.*)/George/;
print $a;
(Ctrl-D)

George

Under C#:

using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Text.RegularExpressions;

namespace IsThisACsharpBug
{
  class Program
  {
    static void Main(string[] args)
    {
        var matchPattern = "(.*)";
        var replacePattern = "George";
        var newValue = Regex.Replace("This is nice", matchPattern, replacePattern);
        Console.WriteLine(newValue);
    }
  }
}

Unfortunately, C# prints:

$ csc regexp.cs
Microsoft (R) Visual C# 2008 Compiler version 3.5.30729.5420
for Microsoft (R) .NET Framework version 3.5
Copyright (C) Microsoft Corporation. All rights reserved.

$ ./regexp.exe 
GeorgeGeorge

Is this a bug in C#'s regular expression library? Why does it print "George" twice, when Perl and Python print it only once?

asked Aug 31 '11 09:08 by ttsiodras


1 Answer

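In your example, the difference seems to be in the semantics of the 'replace' function rather than in the regular expression processing itself.

.NET's Regex.Replace does a "global" replace, i.e. it replaces all matches rather than just the first one. The pattern (.*) matches twice here: first the whole string, then the empty string left at the end, so "George" is inserted twice.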
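To see where the second replacement comes from, it helps to list every match of (.*) in the input. The following is a minimal sketch in Python, chosen only because it is easy to run; the helper name list_matches is made up for illustration and is not part of any library. It reports two matches: one covering the whole string and one empty match at the very end, which is what a replace-all then substitutes.

```python
import re

def list_matches(pattern, text):
    """Return (start, end, matched_text) for every match of pattern in text."""
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]

if __name__ == "__main__":
    # Expect one match for the full string and one empty match at the end.
    for start, end, matched in list_matches(r"(.*)", "This is nice"):
        print(start, end, repr(matched))
```

The empty match at the end is legal because .* is allowed to match zero characters.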

Global Replace in Perl

(notice the small 'g' at the end of the =~s line)

$a="This is a test";
$a=~s/(.*)/George/g;
print $a;

which produces

GeorgeGeorge

Single Replace in .NET

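// requires: using System; using System.Text.RegularExpressions;
var re = new Regex("(.*)");
var replacePattern = "George";
// The third argument (count) limits the number of replacements to 1.
var newValue = re.Replace("This is nice", replacePattern, 1);
Console.WriteLine(newValue);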

which produces

George

since it stops after the first replacement.
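For comparison, here is a sketch of the same idea in Python, assuming the standard re module. re.sub also replaces every match unless count is given, and on Python 3.7 and later the unrestricted call from the question returns 'GeorgeGeorge' as well, because empty matches adjacent to a previous match are now replaced too. Two ways to get a single replacement are shown: limiting the count, and using (.+) so that the empty match at the end cannot occur. The function names are illustrative only.

```python
import re

def replace_first(pattern, replacement, text):
    # count=1 stops after the first match, like the count argument in .NET.
    return re.sub(pattern, replacement, text, count=1)

def replace_non_empty(replacement, text):
    # (.+) needs at least one character, so there is no empty match at the end.
    return re.sub(r"(.+)", replacement, text)

if __name__ == "__main__":
    print(replace_first(r"(.*)", "George", "This is nice"))
    print(replace_non_empty("George", "This is nice"))
```

Which variant fits depends on intent: count=1 keeps the original pattern unchanged, while (.+) changes what the pattern is allowed to match.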

answered Oct 05 '22 23:10 by Grynn