I am currently working on a very large legacy application which handles a large amount of string data gathered from various sources (IE, names, identifiers, common codes relating to the business etc). This data alone can take up to 200 meg of ram in the application process.
A colleague of mine mentioned one possible strategy to reduce the memory footprint (as a lot of the individual strings are duplicate across the data sets), would be to "cache" the recurring strings in a dictionary and re-use them when required. So for example…
public class StringCacher()
{
public readonly Dictionary<string, string> _stringCache;
public StringCacher()
{
_stringCache = new Dictionary<string, string>();
}
public string AddOrReuse(string stringToCache)
{
if (_stringCache.ContainsKey(stringToCache)
_stringCache[stringToCache] = stringToCache;
return _stringCache[stringToCache];
}
}
Then to use this caching...
public IEnumerable<string> IncomingData()
{
var stringCache = new StringCacher();
var dataList = new List<string>();
// Add the data, a fair amount of the strings will be the same.
dataList.Add(stringCache.AddOrReuse("AAAA"));
dataList.Add(stringCache.AddOrReuse("BBBB"));
dataList.Add(stringCache.AddOrReuse("AAAA"));
dataList.Add(stringCache.AddOrReuse("CCCC"));
dataList.Add(stringCache.AddOrReuse("AAAA"));
return dataList;
}
As strings are immutable and a lot of internal work is done by the framework to make them work in a similar way to value types i'm half thinking that this will just create a copy of each the string into the dictionary and just double the amount of memory used rather than just pass a reference to the string stored in the dictionary (which is what my colleague is assuming).
So taking into account that this will be run on a massive set of string data...
Is this going to save any memory, assuming that 30% of the string values will be used twice or more?
Is the assumption that this will even work correct?
This is essentially what string interning is, except you don't have to worry how it works. In your example you are still creating a string, then comparing it, then leaving the copy to be disposed of. .NET will do this for you in runtime.
See also String.Intern
and Optimizing C# String Performance (C Calvert)
If a new string is created with code like (
String goober1 = "foo"; String goober2 = "foo";
) shown in lines 18 and 19, then the intern table is checked. If your string is already in there, then both variables will point at the same block of memory maintained by the intern table.
So, you don't have to roll your own - it won't really provide any advantage. EDIT UNLESS: your strings don't usually live for as long as your AppDomain - interned strings live for the lifetime of the AppDomain, which is not necessarily great for GC. If you want short lived strings, then you want a pool. From String.Intern
:
If you are trying to reduce the total amount of memory your application allocates, keep in mind that interning a string has two unwanted side effects. First, the memory allocated for interned String objects is not likely be released until the common language runtime (CLR) terminates. The reason is that the CLR's reference to the interned String object can persist after your application, or even your application domain, terminates. ...
EDIT 2 Also see Jon Skeets SO answer here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With