Since strings are immutable in .NET, why are they copied for simple operations such as Substring or Split? For example, by keeping a char[] value, int start and int length, a substring could be created to simply point to an existing string, and we could save the overhead of copying the string for many simple operations. So I wonder, why was the decision made to copy strings for such operations?
For example, was this done to support the current implementation of StringBuilder? Or to avoid keeping a reference to a large char[] when only a few characters are required? Or any other reason you can think of? Can you suggest pros and cons for such a design?
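To make the proposed layout concrete, here is a minimal sketch in Java of a string type that keeps a char[] value, an int start and an int length, so that taking a substring copies nothing. The class name SharedString and the method retainedChars are hypothetical, invented for illustration; this is not the real System.String or java.lang.String.

```java
// Hypothetical sketch of a "shared" immutable string; not a real library type.
final class SharedString {
    private final char[] value; // backing array, never mutated after construction
    private final int start;    // offset of the first character inside value
    private final int length;   // number of characters visible through this instance

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int start, int length) {
        this.value = value;
        this.start = start;
        this.length = length;
    }

    int length() {
        return length;
    }

    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return value[start + index];
    }

    // O(1): no characters are copied; the result aliases the same array.
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > length || begin > end) {
            throw new IndexOutOfBoundsException("begin: " + begin + ", end: " + end);
        }
        return new SharedString(value, start + begin, end - begin);
    }

    // Size of the array this instance keeps reachable, however short the view is.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, start, length);
    }
}
```

The sketch shows both sides of the trade-off raised in the question: substring is constant-time, but a five-character view still keeps the whole original array reachable, which is the "large char[]" concern.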
As mentioned by @cletus and supported by @Jon Skeet, this is more like asking why .NET strings were built differently from Java in this aspect.
That's basically the way that Java works. There are a few benefits of the .NET way, IMO:
A .NET string takes about 20+2*n bytes. In Java you've got the size of the array (12+2*n bytes) and the string itself (24 bytes: object overhead, reference, start and count; it also caches the hash if it's ever calculated). So for an empty string, the .NET version takes about 20 bytes compared with Java's 36. Of course that's the worst case, and it'll only ever be that constant difference - but if you use a lot of independent strings that could end up being significant. More for the garbage collector to look at, too.

Of course, the benefits are in terms of requiring less space when the aliasing above doesn't occur.
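The size comparison can be written out as a small calculation. This is only a sketch that plugs in the approximate figures quoted in this answer (20 bytes of overhead for .NET; 12 for the array plus 24 for the string object in Java); they are not measurements, and real sizes depend on the runtime version and platform. The class and method names are hypothetical.

```java
// Hypothetical helper that encodes the answer's approximate per-string sizes.
final class StringFootprint {
    // .NET: a single object holding the header and the characters inline.
    static long dotNetBytes(int n) {
        return 20L + 2L * n;
    }

    // Java (shared-array layout): a char[] object plus a separate String object.
    static long javaBytes(int n) {
        long arrayBytes = 12L + 2L * n; // array header plus 2 bytes per char
        long stringBytes = 24L;         // object overhead, reference, start, count, cached hash
        return arrayBytes + stringBytes;
    }
}
```

With these figures the gap is a constant 16 bytes per independent string, whatever its length: 20 versus 36 bytes for an empty string, 220 versus 236 bytes for 100 characters.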
In the end it will depend on your usage - the compiler and runtime can't predict which usage pattern is more likely in your exact code.
There may also be interop benefits of the current string representation, but I don't know enough about that to say for sure.
EDIT: I'm not sure why your question has received so many somewhat-hostile answers. It's certainly not a "dumb" way of representing a string, and it clearly works. Fears about data loss and complexity are pretty much just FUD in this case, I believe - the Java string implementation is simple and robust. I personally suspect that the .NET way of doing things is more efficient in most programs, and I suspect MS did research to check that, but there will certainly be situations where the "shared" model works better.