System.Uri.ToString behaviour change after VS2012 install

Tags: c#, .net, vb.net

After installing VS2012 Premium on a dev machine, a unit test failed, so the developer fixed the issue. When the changes were pushed to TeamCity, the unit test failed there. The project has not changed other than the solution file being upgraded to be compatible with VS2012; it still targets .NET Framework 4.0.

I've isolated the problem to Unicode characters being escaped when calling Uri.ToString(). The following code reproduces the behavior.

Imports NUnit.Framework

<TestFixture()>
Public Class UriTest

    <Test()>
    Public Sub UriToStringUrlDecodes()
        Dim uri = New Uri("http://www.example.org/test?helloworld=foo%B6bar")

        Assert.AreEqual("http://www.example.org/test?helloworld=foo¶bar", uri.ToString())
    End Sub

End Class

Running this in VS2010 on a machine that does not have VS2012 installed succeeds; running it in VS2010 on a machine with VS2012 installed fails. Both machines use the latest versions of NCrunch and NUnit from NuGet.

[Screenshots of the test results: on a machine without VS2012 installed, and on a machine with VS2012 installed]

The messages from the failed assert are:

  Expected string length 46 but was 48. Strings differ at index 42.
  Expected: "http://www.example.org/test?helloworld=foo¶bar"
  But was:  "http://www.example.org/test?helloworld=foo%B6bar"
  -----------------------------------------------------^

The MSDN documentation for Uri.ToString() in both .NET 4 and .NET 4.5 says it should not encode this character, which suggests the old behavior is the correct one. It describes the return value as:

A String instance that contains the unescaped canonical representation of the Uri instance. All characters are unescaped except #, ?, and %.

After installing VS2012, however, that Unicode character is escaped.

The file version of System.dll on the machine with VS2012 is 4.0.30319.17929.

The file version of System.dll on the build server is 4.0.30319.236.
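To confirm which System.dll a test run actually loads, a small console program can print the runtime version and the file version of the assembly that contains System.Uri next to the Uri.ToString() output. This is a minimal sketch: the class and method names are hypothetical, and it assumes the assembly containing Uri is the one worth comparing (System.dll on .NET Framework; other runtimes may put Uri in a different assembly).

```csharp
using System;
using System.Diagnostics;

// Hypothetical diagnostic helper: reports which assembly version backs System.Uri.
// A project targeting 4.0 runs against whatever 4.x System.dll is installed.
public static class RuntimeDiagnostics
{
    // File version of the assembly that defines System.Uri, e.g. "4.0.30319.236".
    public static string UriAssemblyFileVersion()
    {
        string location = typeof(Uri).Assembly.Location;
        return FileVersionInfo.GetVersionInfo(location).FileVersion;
    }

    public static void Main()
    {
        Console.WriteLine("CLR version:        " + Environment.Version);
        Console.WriteLine("Uri assembly:       " + UriAssemblyFileVersion());

        // The same input as the failing test, printed so the behavior sits next to the version.
        var uri = new Uri("http://www.example.org/test?helloworld=foo%B6bar");
        Console.WriteLine("Uri.ToString():     " + uri.ToString());
    }
}
```

Running the same binary on the dev machine and on the build server puts the version difference and the behavior difference side by side.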

Setting aside the merits of using uri.ToString(), what we are testing, and any potential workaround: can anyone explain why this behavior seems to have changed, or is this a bug?

Edit: here is the C# version:

using System;
using NUnit.Framework;

namespace SystemUriCSharp 
{
    [TestFixture]
    public class UriTest
    {

        [Test]
        public void UriToStringDoesNotEscapeUnicodeCharacters()
        {
            var uri = new Uri(@"http://www.example.org/test?helloworld=foo%B6bar");

            Assert.AreEqual(@"http://www.example.org/test?helloworld=foo¶bar", uri.ToString());
        }

    }
}

After a bit of further investigation: if I target .NET 4.0 or .NET 4.5 the tests fail; if I switch to .NET 3.5 they succeed.

Asked Aug 17 '12 10:08 by Chris Diver



2 Answers

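Some changes were introduced in .NET Framework 4.5, which is installed along with VS2012 and which is also (to the best of my knowledge) a so-called "in-place upgrade". This means it actually replaces the .NET Framework 4 assemblies on that machine, so a project that still targets 4.0 runs against the 4.5 versions. That is consistent with System.dll reporting 4.0.30319.17929 on the VS2012 machine.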

Furthermore, there are breaking changes documented for System.Uri. One of them says that Unicode normalization form C (NFC) will no longer be performed on non-host portions of URIs. I am not sure whether this applies to your case, but it could serve as a good starting point for investigating the error.
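To make the NFC breaking change concrete, this sketch applies String.Normalize with NormalizationForm.FormC to a decomposed "é". It only illustrates what NFC does to a string and does not touch Uri; the class name is hypothetical. Note that ¶ (U+00B6) has no decomposition, so NFC by itself leaves it unchanged.

```csharp
using System;
using System.Text;

// Illustrates Unicode normalization form C (NFC): a base letter followed by a
// combining mark is composed into a single precomposed code point where one exists.
public static class NfcDemo
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: two UTF-16 code units.
    public const string Decomposed = "e\u0301";

    public static string ToNfc(string s) => s.Normalize(NormalizationForm.FormC);

    public static void Main()
    {
        string composed = ToNfc(Decomposed);

        // U+00E9 LATIN SMALL LETTER E WITH ACUTE: one code unit.
        Console.WriteLine($"{Decomposed.Length} -> {composed.Length}"); // prints "2 -> 1"
        Console.WriteLine(composed == "\u00E9");                        // prints "True"

        // The pilcrow has no decomposition, so NFC leaves it as-is.
        Console.WriteLine(ToNfc("\u00B6") == "\u00B6");                 // prints "True"
    }
}
```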

Answered Oct 06 '22 10:10 by Fredrik Mörk



The change fixes problems in earlier .NET versions, which have now been made more compliant with the standards. %B6 is the single-byte value of ¶ (U+00B6), i.e. its ISO-8859-1 encoding, but according to the standards percent-encoded octets in a URI must be UTF-8, meaning it should be %C2%B6. Because %B6 on its own is not valid UTF-8, it is now correctly left escaped instead of being decoded.
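The byte-level difference can be checked with System.Text.Encoding. This sketch (the class and helper names are hypothetical) percent-encodes ¶ both as UTF-8 and as its single raw byte, then shows that a lone 0xB6 byte does not decode as UTF-8.

```csharp
using System;
using System.Text;

// Compares the single byte 0xB6 (the code point value of U+00B6 PILCROW SIGN,
// also its ISO-8859-1 encoding) with its two-byte UTF-8 encoding.
public static class PilcrowEncoding
{
    // Percent-encodes every byte as %XX with uppercase hex digits.
    public static string PercentEncode(byte[] bytes)
    {
        var sb = new StringBuilder();
        foreach (byte b in bytes)
            sb.Append('%').Append(b.ToString("X2"));
        return sb.ToString();
    }

    public static void Main()
    {
        const char pilcrow = '\u00B6'; // ¶

        byte[] utf8 = Encoding.UTF8.GetBytes(pilcrow.ToString());
        Console.WriteLine(PercentEncode(utf8));                    // prints "%C2%B6"
        Console.WriteLine(PercentEncode(new[] { (byte)pilcrow })); // prints "%B6"

        // 0xB6 is a UTF-8 continuation byte with no lead byte, so the default
        // decoder substitutes U+FFFD REPLACEMENT CHARACTER.
        string decoded = Encoding.UTF8.GetString(new byte[] { 0xB6 });
        Console.WriteLine(decoded == "\uFFFD");                     // prints "True"
    }
}
```

A decoder that follows the UTF-8 rule therefore has no valid character to turn %B6 into, which is why it stays escaped.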

More details from the Connect report, quoted verbatim below:

.NET 4.5 has enhanced and more compatible application of RFC 3987 which supports IRI parsing rules for URI's. IRIs are International Resource Identifiers. This allows for non-ASCII characters to be in a URI/IRI string to be parsed.

Prior to .NET 4.5, we had some inconsistent handling of IRIs. We had an app.config entry with a default of false that you could turn on:

which did some IRI handling/parsing. However, it had some problems. In particular it allowed for incorrect percent encoding handling. Percent-encoded items in a URI/IRI string are supposed to be percent-encoded UTF-8 octets according to RFC 3987. They are not interpreted as percent-encoded UTF-16. So, handling “%B6” is incorrect according to UTF-8 and no decoding will occur. The correct UTF-8 encoding for ¶ is actually “%C2%B6”.

If your string was this instead:

        string strUri = @"http://www.example.com/test?helloworld=foo%C2%B6bar";

Then it will get normalized in the ToString() method and the percent-encoding decoded and removed.

Can you provide more information about your application needs and the use of ToString() method? Usually, we recommend the AbsoluteUri property of the Uri object for most normalization needs.

If this issue is blocking your application development and business needs then please let us know via the "netfx45compat at Microsoft dot com" email address.

Thx,

Networking Team
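Following the Networking Team's note, the static escaping helpers on System.Uri can produce and check the UTF-8 form. In this sketch the class name is hypothetical; only the commented results are asserted, while ToString() and AbsoluteUri are printed rather than asserted so their output can be compared across framework versions.

```csharp
using System;

// Applies the System.Uri escaping helpers to the pilcrow example.
public static class UriPilcrowDemo
{
    public const string Utf8Form = "http://www.example.com/test?helloworld=foo%C2%B6bar";

    public static void Main()
    {
        // EscapeDataString percent-encodes the UTF-8 bytes of the character.
        Console.WriteLine(Uri.EscapeDataString("\u00B6"));                             // prints "%C2%B6"

        // UnescapeDataString reverses it for a valid UTF-8 sequence.
        Console.WriteLine(Uri.UnescapeDataString("foo%C2%B6bar") == "foo\u00B6bar"); // prints "True"

        var uri = new Uri(Utf8Form);

        // OriginalString is exactly the string passed to the constructor.
        Console.WriteLine(uri.OriginalString == Utf8Form);                             // prints "True"

        // Printed for comparison only; not asserted.
        Console.WriteLine(uri.ToString());
        Console.WriteLine(uri.AbsoluteUri);
    }
}
```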

Answered Oct 06 '22 11:10 by Chris Diver
