Internal representation of strings in C#

Tags:

string

c#

char

I just want to be sure:

string x = "";
char Char = x[0];  // throws IndexOutOfRangeException: "Index was outside the bounds of the array."

This means that the string is really treated as an array of chars, right? (At least internally.)
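A minimal sketch of what the exception above does and does not show, assuming a recent .NET SDK. `string` exposes a read-only, bounds-checked indexer, so an out-of-range index throws `IndexOutOfRangeException` just as an array would. A string is still not a `char[]`, though. The class and method names (`StringIndexDemo`, `IndexThrows`, `IsCharArray`, `CopyToArray`) are hypothetical, chosen only for illustration.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class StringIndexDemo
{
    // Reports whether s[index] throws. Indexing "" at 0 throws
    // IndexOutOfRangeException, the same exception type an
    // out-of-range array index produces.
    public static bool IndexThrows(string s, int index)
    {
        try
        {
            _ = s[index];
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    // A string is not a System.Array: this runtime type test is false
    // for any string, even though its indexer behaves like an array's.
    public static bool IsCharArray(object value) => value is char[];

    // ToCharArray() copies the characters into a new, independent char[].
    public static char[] CopyToArray(string s) => s.ToCharArray();
}
```

The exception is caught here only to make the behavior observable. In ordinary code, checking `s.Length` before indexing is the idiomatic guard.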

Petr asked Sep 24 '10 14:09


2 Answers

The C# language spec makes no guarantee about the internal representation of a string. However, string does implement an indexer that returns a char for each position in the string.

Edit: To clarify, since a few people have commented: yes, the CLR's implementation of System.String does store its characters in a contiguous, array-like block of memory. However, the language specification says nothing about internal representation, so this could (though it is unlikely to) change. The spec only requires that a string work as a sequence of chars. The only relevant passage in the language spec is in section 1.3:

Character and string processing in C# uses Unicode encoding. The char type represents a UTF-16 code unit, and the string type represents a sequence of UTF-16 code units.
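Because `char` is a UTF-16 code unit rather than a full Unicode character, the indexer and `Length` count code units. The short sketch below uses a character outside the Basic Multilingual Plane, which UTF-16 stores as two code units (a surrogate pair). The class and member names (`Utf16Demo`, `Emoji`, `CodeUnits`, `CodePoints`, `FirstCodePoint`) are hypothetical. `char.IsHighSurrogate`, `char.IsLowSurrogate` and `char.ConvertToUtf32` are standard .NET APIs.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class Utf16Demo
{
    // U+1F600 (GRINNING FACE) lies outside the Basic Multilingual Plane,
    // so UTF-16 encodes it as a surrogate pair: two char code units.
    public const string Emoji = "\U0001F600";

    // Number of UTF-16 code units, which is what Length and the indexer count.
    public static int CodeUnits(string s) => s.Length;

    // Number of Unicode code points: a well-formed surrogate pair counts once.
    public static int CodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // Skip the low surrogate that completes a pair.
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                i++;
            }
            count++;
        }
        return count;
    }

    // Combines the code unit(s) starting at index 0 into a single code point.
    public static int FirstCodePoint(string s) => char.ConvertToUtf32(s, 0);
}
```

For `Emoji`, `CodeUnits` gives 2 while `CodePoints` gives 1, so `Emoji[0]` is only half of the character.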

Additionally, MSDN states:

A string is a sequential collection of Unicode characters that is used to represent text. A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable (that is, it is read-only).
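The two properties MSDN does promise (a sequential collection of `System.Char`, and immutability) can be observed directly. In this sketch, which uses hypothetical names (`SequenceDemo`, `ToCharList`, `WithReplaced`), a string is consumed through `IEnumerable<char>`. The sketch also shows that `Replace` returns a new string instead of modifying the original. None of this reveals how the characters are stored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class SequenceDemo
{
    // string implements IEnumerable<char>, so any string can be passed
    // here and consumed as a sequential collection of System.Char values.
    public static List<char> ToCharList(IEnumerable<char> chars) => chars.ToList();

    // Strings are immutable: Replace returns a new string and leaves the
    // original untouched. (The indexer is get-only, so s[0] = 'x' would
    // not compile.)
    public static string WithReplaced(string s, char oldChar, char newChar) =>
        s.Replace(oldChar, newChar);
}
```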

So here we are talking about the CLR's System.String rather than the language. Even there, though, the documentation does not guarantee an array, only a sequential collection.

A string implemented as a linked list, with an indexer that walks n nodes forward through the list, would satisfy the language requirements. An IList<char> would also satisfy them, and an IList does not have to be array-backed.
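To make the linked-list argument concrete, here is a hypothetical `LinkedText` type. It is not part of .NET and is only a sketch. It stores its UTF-16 code units in a singly linked list, and its read-only indexer walks `index` nodes forward. Callers see the same observable contract as `string`: indexing, `Length`, enumeration as `IEnumerable<char>`, and `IndexOutOfRangeException` for a bad index.

```csharp
#nullable enable
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type, for illustration only: an immutable sequence of
// UTF-16 code units stored in a singly linked list instead of an array.
public sealed class LinkedText : IEnumerable<char>
{
    private sealed class Node
    {
        public readonly char Value;
        public readonly Node? Next;
        public Node(char value, Node? next) { Value = value; Next = next; }
    }

    private readonly Node? _head;

    public int Length { get; }

    public LinkedText(string source)
    {
        // Build the list back to front so each node points at its successor.
        Node? head = null;
        for (int i = source.Length - 1; i >= 0; i--)
        {
            head = new Node(source[i], head);
        }
        _head = head;
        Length = source.Length;
    }

    // Read-only indexer: walks `index` nodes forward, so access is O(n)
    // instead of O(1), but a bad index fails the same way string's does.
    public char this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Length)
            {
                throw new IndexOutOfRangeException();
            }
            Node node = _head!;
            for (int i = 0; i < index; i++)
            {
                node = node.Next!;
            }
            return node.Value;
        }
    }

    public IEnumerator<char> GetEnumerator()
    {
        for (Node? node = _head; node != null; node = node.Next)
        {
            yield return node.Value;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Copies the code units into a real string for display.
    public override string ToString()
    {
        var chars = new char[Length];
        int i = 0;
        foreach (char c in this)
        {
            chars[i++] = c;
        }
        return new string(chars);
    }
}
```

Only the cost of indexing differs from `System.String`: `t[i]` takes O(i) steps here. That cost is one plausible reason real implementations use contiguous storage even though the spec does not demand it.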

David Pfeffer answered Sep 28 '22 04:09


Per @JaredPar elsewhere on this site:

The underlying string you create will also need a contiguous block of memory because it is represented as an array of chars (arrays require contiguous memory).

I'm sure you should not rely on this, since it is not part of the public contract, but if this statement is correct, the implementation is an array. That makes sense to me given what we know about C-style char strings and Microsoft's need to support efficient interop between managed and native code.
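One place where current .NET (Core 2.1 and later) visibly depends on contiguous string storage is `MemoryExtensions.AsSpan(string, int, int)`. A `ReadOnlySpan<char>` always describes a single contiguous region of memory, and `AsSpan` gives a view over the string's characters without copying them. The sketch below uses hypothetical names (`SpanDemo`, `Middle`, `SliceEquals`). Like the quote above, it reflects the runtime's implementation, not anything the C# spec requires.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class SpanDemo
{
    // ReadOnlySpan<char> always describes one contiguous region of memory.
    // AsSpan returns a view over the string's existing characters rather
    // than a copy, which is only possible because the runtime stores them
    // contiguously.
    public static ReadOnlySpan<char> Middle(string s, int start, int length) =>
        s.AsSpan(start, length);

    // Compares a slice with an expected value by content.
    public static bool SliceEquals(string s, int start, int length, string expected) =>
        Middle(s, start, length).SequenceEqual(expected.AsSpan());
}
```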

MSDN says only this, which does not guarantee that the storage is an array.

A string is a sequential collection of Unicode characters that is used to represent text. A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable (that is, it is read-only).

Steve Townsend answered Sep 28 '22 04:09