I have an application that uses a large number of strings, so I have a memory usage problem. I know that one of the best solutions in this case is to use a database, but I cannot use one for the moment, so I am looking for other solutions.
In C#, strings are stored as UTF-16, which means that most of my strings take twice the memory they would in UTF-8. So I decided to store them as byte arrays of UTF-8 data. But to my surprise, this solution took twice as much memory as plain strings in my application.
So I ran some simple tests, but I would like the opinion of experts to be sure.
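To make the UTF-16 versus UTF-8 comparison concrete, here is a minimal sketch that compares the size of the character data in both encodings. It counts payload bytes only and ignores per-object overhead; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

static class EncodingSizeSketch
{
    // Bytes of character data when the text is kept as a .NET string (UTF-16).
    public static int Utf16PayloadBytes(string s)
    {
        return Encoding.Unicode.GetByteCount(s);
    }

    // Bytes of character data when the same text is encoded as UTF-8.
    public static int Utf8PayloadBytes(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    static void Main()
    {
        string ascii = "hello world";
        Console.WriteLine(Utf16PayloadBytes(ascii)); // 22
        Console.WriteLine(Utf8PayloadBytes(ascii));  // 11

        string euro = "\u20AC"; // the euro sign, outside ASCII
        Console.WriteLine(Utf16PayloadBytes(euro));  // 2
        Console.WriteLine(Utf8PayloadBytes(euro));   // 3
    }
}
```

The halving only holds for ASCII text. For characters outside ASCII the saving shrinks or even reverses, as the euro sign shows.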
Test 1 : Fixed length strings allocation
var stringArray = new string[10000];
var byteArray = new byte[10000][];
var Sb = new StringBuilder();
var utf8 = Encoding.UTF8;
var stringGen = new Random(561651);
for (int i = 0; i < 10000; i++) {
    for (int j = 0; j < 10000; j++) {
        Sb.Append((stringGen.Next(90)+32).ToString());
    }
    stringArray[i] = Sb.ToString();
    byteArray[i] = utf8.GetBytes(Sb.ToString());
    Sb.Clear();
}
GC.Collect();
GC.WaitForFullGCComplete(5000);
Memory Usage
00007ffac200a510 1 80032 System.Byte[][]
00007ffac1fd02b8 56 152400 System.Object[]
000000bf7655fcf0 303 3933750 Free
00007ffac1fd5738 10004 224695091 System.Byte[]
00007ffac1fcfc40 10476 449178396 System.String
As we can see, the byte arrays take half the memory of the strings; no real surprise here.
Test 2 : Random size string allocation (with a realistic length)
var stringArray = new string[10000];
var byteArray = new byte[10000][];
var Sb = new StringBuilder();
var utf8 = Encoding.UTF8;
var lengthGen = new Random(2138784);
for (int i = 0; i < 10000; i++) {
    for (int j = 0; j < lengthGen.Next(100); j++) {
        Sb.Append(i.ToString());
        stringArray[i] = Sb.ToString();
        byteArray[i] = utf8.GetBytes(Sb.ToString());
    }
    Sb.Clear();
}
GC.Collect();
GC.WaitForFullGCComplete(5000);
Memory Usage
00007ffac200a510 1 80032 System.Byte[][]
000000be2aa8fd40 12 82784 Free
00007ffac1fd02b8 56 152400 System.Object[]
00007ffac1fd5738 9896 682260 System.Byte[]
00007ffac1fcfc40 10368 1155110 System.String
The strings take a little less than twice the memory of the byte arrays. With shorter strings I was expecting a greater overhead for strings, but the opposite seems to be true. Why?
Test 3 : String model corresponding to my application
var stringArray = new string[10000];
var byteArray = new byte[10000][];
var Sb = new StringBuilder();
var utf8 = Encoding.UTF8;
var lengthGen = new Random();
for (int i = 0; i < 10000; i++) {
    if (i % 2 == 0) {
        for (int j = 0; j < lengthGen.Next(100000); j++) {
            Sb.Append(i.ToString());
            stringArray[i] = Sb.ToString();
            byteArray[i] = utf8.GetBytes(Sb.ToString());
            Sb.Clear();
        }
    } else {
        stringArray[i] = Sb.ToString();
        byteArray[i] = utf8.GetBytes(Sb.ToString());
        Sb.Clear();
    }
}
GC.Collect();
GC.WaitForFullGCComplete(5000);
Memory Usage
00007ffac200a510 1 80032 System.Byte[][]
00007ffac1fd02b8 56 152400 System.Object[]
00007ffac1fcfc40 5476 198364 System.String
00007ffac1fd5738 10004 270075 System.Byte[]
Here the strings take much less memory than the byte arrays. This can be surprising, but I suppose that empty strings are referenced only once. Is that right? But I don't know if this can explain all of that huge difference. Is there any other reason? What is the best solution?
This can be surprising, but I suppose that empty strings are referenced only once.
Yes, an empty StringBuilder returns string.Empty as its result. The code snippet below prints True:
var sb = new StringBuilder();
Console.WriteLine(object.ReferenceEquals(sb.ToString(), string.Empty));
But I don't know if this can explain all that huge difference.
Yes, this perfectly explains it. You are saving on 5,000 string objects. The difference in bytes is roughly 270,000 - (198,000 / 2), so about 170 kB. Dividing by 5,000 gives about 34 bytes per object, which is roughly the fixed overhead of an empty array object (header, type pointer, and length field) rather than any payload.
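The per-object figure estimated above can be checked empirically. The sketch below measures the average number of bytes allocated for each distinct empty byte array. It assumes a runtime where GC.GetAllocatedBytesForCurrentThread is available and reasonably precise; the class and method names are hypothetical.

```csharp
using System;

static class EmptyArrayCostSketch
{
    // Average bytes allocated per `new byte[0]`, measured on the current thread.
    public static double BytesPerEmptyArray(int count)
    {
        var keep = new byte[count][]; // holder allocated before measuring starts
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < count; i++)
        {
            keep[i] = new byte[0]; // one distinct heap object per slot
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(keep); // keep the arrays reachable until after the measurement
        return (after - before) / (double)count;
    }

    static void Main()
    {
        // 5000 matches the number of empty entries in Test 3.
        Console.WriteLine(BytesPerEmptyArray(5000));
    }
}
```

On a 64-bit runtime I would expect a value in the low tens of bytes, but treat the number as something to measure on your own setup, not as a guarantee.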
What is the best solution?
Do the same thing: make yourself a private static readonly empty array, and use it each time that you get string.Empty from sb.ToString():
private static readonly byte[] EmptyBytes = new byte[0];
...
else
{
    stringArray[i] = Sb.ToString();
    if (stringArray[i] == string.Empty) {
        byteArray[i] = EmptyBytes;
    } else {
        byteArray[i] = utf8.GetBytes(Sb.ToString());
    }
    Sb.Clear();
}
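If the same check is needed in several places, it could be wrapped in a small helper. This is only a sketch of one way to package the idea: Utf8Store, Encode, and Decode are hypothetical names, and the helper assumes its inputs are never null.

```csharp
using System;
using System.Text;

static class Utf8Store
{
    // One shared instance for every empty value, mirroring string.Empty.
    // Sharing is safe because a zero-length array has no elements to mutate.
    private static readonly byte[] EmptyBytes = new byte[0];

    // Encodes a non-null string as UTF-8, reusing the shared array when empty.
    public static byte[] Encode(string s)
    {
        return s.Length == 0 ? EmptyBytes : Encoding.UTF8.GetBytes(s);
    }

    // Decodes UTF-8 bytes back to a string, returning string.Empty when empty.
    public static string Decode(byte[] bytes)
    {
        return bytes.Length == 0 ? string.Empty : Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        byte[] a = Encode("");
        byte[] b = Encode(string.Empty);
        Console.WriteLine(object.ReferenceEquals(a, b)); // True
        Console.WriteLine(Decode(Encode("abc")));        // abc
    }
}
```

With this in place, the loop body reduces to byteArray[i] = Utf8Store.Encode(stringArray[i]); and all empty entries point at the same array object.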