Why are mutable strings slower than immutable strings?
EDIT:
import UserString

def test():
    s = UserString.MutableString('Python')
    for i in range(3):
        s[0] = 'a'

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()

13.5236170292
import UserString

def test():
    s = UserString.MutableString('Python')
    s = 'abcd'
    for i in range(3):
        s = 'a' + s[1:]

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()

6.24725079536
import UserString

def test():
    s = UserString.MutableString('Python')
    for i in range(3):
        s = 'a' + s[1:]

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()

38.6385951042
I think it is obvious why I put s = UserString.MutableString('Python') in the second test (so that all three tests pay the same construction cost).
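For comparison only (this variant is not part of the original timings), here is a sketch of the usual Python 2 list-of-characters idiom for piecewise string building, written in the same shape as the tests above:

import UserString

def test():
    # Kept so the construction cost matches the other tests above.
    s = UserString.MutableString('Python')
    # The common "mutable string" idiom: mutate a list of characters,
    # then join them back into an ordinary immutable str at the end.
    chars = list('Python')
    for i in range(3):
        chars[0] = 'a'
    s = ''.join(chars)

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()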
In a hypothetical language that offers both mutable and immutable, but otherwise equivalent, string types (I can't really think of one offhand -- e.g., Python and Java both have immutable strings only, plus other ways to build one through mutation, which add indirection and therefore can of course slow things down a bit;-), there's no real reason for any performance difference -- for example, in C++, using a std::string and a const std::string interchangeably would, I expect, cause no performance difference (admittedly a compiler might be able to optimize code using the latter better by counting on the immutability, but I don't know of any real-world ones that actually perform such theoretically possible optimizations;-).
Having immutable strings may and does in fact allow very substantial optimizations in Java and Python. For example, if the strings get hashed, the hash can be cached, and will never have to be recomputed (since the string can't change) -- that's especially important in Python, which uses hashed strings (for look-ups in sets and dictionaries) so lavishly and even "behind the scenes". Fresh copies never need to be made "just in case" the previous one has changed in the meantime -- references to a single copy can always be handed out systematically whenever that string is required. Python also copiously uses "interning" of (some) strings, potentially allowing constant-time comparisons and many other similarly fast operations -- think of it as one more way, a more advanced one to be sure, to take advantage of strings' immutability to cache more of the results of operations often performed on them.
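A small sketch of the two optimizations just mentioned, interning and hash caching, as they can be observed in CPython 2 -- the exact behavior is an implementation detail, so treat this as illustrative rather than guaranteed:

# Interning: equal strings can be folded into one shared object, so a
# comparison can become a pointer check rather than a character-by-character scan.
a = intern('some_longish_identifier')
b = intern(''.join(['some_', 'longish_identifier']))  # built at run time, then interned
print a is b          # True -- both names refer to the same interned object

# Hash caching: a str computes its hash once and stores it, which is safe
# only because the string can never change afterwards.
from timeit import Timer
big = 'x' * 10 ** 7
t = Timer('hash(big)', 'from __main__ import big')
print t.timeit(1)     # first call scans all ten million characters
print t.timeit(1)     # later calls just return the cached value, near-instant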
That's not to say that a given compiler is going to take advantage of all possible optimizations, of course. For example, when a slice of a string is requested, there is no real need to make a new object and copy the data over -- the new slice might refer to the old one with an offset (and an independently stored length), potentially a great optimization for big strings out of which many slices are taken. Python doesn't do that because, unless particular care is taken in memory management, this might easily result in the "big" string being all kept in memory when only a small slice of it is actually needed -- but it's a tradeoff that a different implementation might definitely choose to perform (with that burden of extra memory management, to be sure -- more complex, harder-to-debug compiler and runtime code for the hypothetical language in question).
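A rough illustration of that tradeoff in CPython 2, where buffer() stands in for the hypothetical offset-based slice (it is a zero-copy view, not what str slicing actually does):

big = 'x' * 10 ** 7           # a 10 MB string

piece = big[:5]               # real str slicing: copies 5 characters into a new object
view = buffer(big, 0, 5)      # zero-copy view: no copy, but it holds a reference to 'big'

print piece                   # xxxxx
print str(view)               # xxxxx

del big                       # 'piece' lives on independently; 'view' still pins the
                              # entire 10 MB buffer in memory until the view itself dies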
I'm just scratching the surface here -- and many of these advantages would be hard to keep if otherwise interchangeable string types could exist in both mutable and immutable versions (which I suspect is why, to the best of my current knowledge at least, C++ compilers actually don't bother with such optimizations, despite being generally very performance-conscious). But by offering only immutable strings as the primitive, fundamental data type (and thus implicitly accepting some disadvantage when you'd really need a mutable one;-), languages such as Java and Python can clearly gain all sorts of advantages -- performance issues being only one group of them (Python's choice to allow only immutable primitive types to be hashable, for example, is not a performance-centered design decision -- it's more about clarity and predictability of behavior for sets and dictionaries!-).
I don't know whether they are really that much slower, but immutable strings often make it easier to reason about a program, because the state of the object/string can't change. That's the most important property of immutability to me.
Furthermore, you might expect immutable strings to be faster because they carry less state (nothing about them can change), which can mean lower memory consumption and fewer CPU cycles.
I also found this interesting article while googling, which I would like to quote:
knowing that a string is immutable makes it easy to lay it out at construction time — fixed and unchanging storage requirements