In some situations involving immutable objects, it will be possible for many distinct objects to come into existence which are semantically identical. A simple example would be reading many lines of text from a file into strings. From the program's perspective, the fact that two lines have the same sequence of characters would be "coincidence", but from the programmer's perspective a large amount of duplication may be expected. If many string instances are identical, changing the references to those distinct instances into references to a single instance will save memory, and will also facilitate comparisons between them (if two string references point to the same string, there's no need to do a character-by-character comparison to determine that they are identical).
For some scenarios, the system-provided string interning facility may be useful. It has, however, a couple of severe limitations:
If there existed a true WeakDictionary<ImmutableClassType, ImmutableClassType>
(for each element, the key and value would be identical), code could do something like:
if (theDict.TryGetValue(myString, ref internedString))
myString = internedString;
else
theDict[myString] = myString;
Unfortunately, I am unaware of any built-in WeakDictionary<keyType, valType>
class in .net. Further, it would seem wasteful to generate a weak reference for each item's key and value, when both references would always point to the same thing.
I've read some about ConditionalWeakTable
, and that sounds like an interesting class, but I don't think it would be usable here, since the goal is to be able to take one instance and find another independent instance which is semantically equivalent.
For situations where instances of a class will have a well-defined lifetime, it may be reasonable to use a conventional Dictionary
to merge identical instances. In many cases, however, it may be difficult to know when such a Dictionary
should be abandoned or the items within it cleaned out. A WeakReference
-based interning collection would avoid such issues. Does such a thing exist, or could it be readily implemented?
Addendum
As svick noted, a Dictionary<WeakReference, WeakReference>
would be somewhat problematical as there would be no practical way to define an IEqualityComparer
which would have a live WeakReference
return the GetHashCode
value of its target, and have a dead one continue to return that value. One could define a struct which would contain an integer target-hashcode value (set in the constructor), and whose own GetHashCode
would return that integer. A slight improvement might be to use a ConditionalWeakTable
to link the target of the WeakReference
to a finalizable object which could be used to enqueue table items for deletion.
I'm not sure what the proper balance is between trying to eagerly cleaning out the dictionary, versus taking a somewhat more passive approach (e.g. perform a sweep when adding an item if there's been at least one GC since the last sweep, and the number of items added since the last sweep exceeds the number of items that survived it). Sweeping through everything in the dictionary isn't going to be free, but ConditionalWeakTable probably isn't going to be free either.
PPS
Another notion I was thinking of, but I figured it probably wouldn't be as useful as a weak-interning approach, would be to have a logically-immutable type hold a mutable "timestamp" value, and have a comparison method which accepts its arguments by ref
. If two different instances are found to be equal, their timestamp values would be examined. If both zero, they would be assigned consecutive negative numbers from a global counter (-1, -2, -3, etc.). The parameter which had (or was assigned) the lower timestamp value would then be replaced by the other.
Using such an approach, if many object instances were repeatedly compared against each other, many of the references would be replaced with references to "older" instances. Depending upon usage patterns, this may result in most of the identical object instances being merged without the use of any sort of interning dictionary. Applying such an approach with nested objects, however, would require that "immutable" objects allow nested-object references to be mutated to point to other supposedly-identical nested objects. This should be fine if "supposedly-identical" objects always are, but could cause rather bizarre misbehavior if not.
The only real disadvantage of immutable classes is that they require a separate object for each distinct value. Creating these objects can be costly, especially if they are large.
The common use case for the immutable methods is a collection that is initialized from known values, and that never changes. Also consider using these methods if your data changes infrequently. For optimal performance, the immutable collections store a data set that never changes.
Immutable objects can be useful in multi-threaded applications. Multiple threads can act on data represented by immutable objects without concern of the data being changed by other threads. Immutable objects are therefore considered more thread-safe than mutable objects.
An immutable object cannot be garbage collected. String is immutable.
You could create something like Dictionary<WeakReference, WeakReference>
with a custom equality comparer and prune those that are no longer alive at appropriate times. One problem with that is how to remove a dead WeakRefrence
from the dictionary, because you can't remove it by reference equality (remember, you have to use custom equality comparer) or index. Possible solutions are creating a type that inherits from WeakReference
and has correct hash code even if the reference is dead. Or you could wrap it in a custom struct
.
But as you said, it would be nice if each reference was removed from the dictionary immediately after it died. I think the only way to do this is to somehow use finalizers. But if you don't want to (or can't) modify the type in the dictionary, this will get quite complicated.
The basic idea is that you will have the same dictionary of weak references as above (the caveats about how to remove items still apply), but you also attach a helper object with a finalizer to each item in the dictionary using ConditionalWeakTable
. And in that finalizer, you will remove the item from the dictionary. Because of how ConditionalWeakTable
works, if an item in the dictionary gets GCed, the attached object will too, which means its finalizer will be called and so the item will be removed from the dictionary
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With