
Why are mutable strings slower than immutable strings?

Tags:

python

Why are mutable strings slower than immutable strings?

EDIT:

import UserString

def test():
    # mutable string: overwrite the first character in place on each iteration
    s = UserString.MutableString('Python')
    for i in range(3):
        s[0] = 'a'

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()
# output: 13.5236170292



import UserString

def test():
    # the MutableString is created but immediately replaced by a plain str,
    # so the loop below rebuilds an immutable string on each iteration
    s = UserString.MutableString('Python')
    s = 'abcd'
    for i in range(3):
        s = 'a' + s[1:]

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()
# output: 6.24725079536


import UserString

def test():
    # rebuild the MutableString through slicing and concatenation on each iteration
    s = UserString.MutableString('Python')
    for i in range(3):
        s = 'a' + s[1:]

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()
# output: 38.6385951042

I think it is obvious why I put s = UserString.MutableString('Python') in the second test.
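Note that UserString.MutableString exists only in Python 2 (it was deprecated in 2.6 and removed in 3.0) and is a pure-Python wrapper around an ordinary str, which adds overhead of its own. A rough Python 3 sketch of a similar comparison might use bytearray as the mutable stand-in (the function names below are only illustrative, and bytearray is a C-implemented type, so the numbers won't be directly comparable to MutableString):

import timeit

def test_mutable():
    # bytearray used as a mutable stand-in: overwrite the first byte in place
    s = bytearray(b'Python')
    for i in range(3):
        s[0] = ord('a')

def test_immutable():
    # immutable str: each "change" builds a brand-new string object
    s = 'Python'
    for i in range(3):
        s = 'a' + s[1:]

if __name__ == '__main__':
    print(timeit.timeit(test_mutable))
    print(timeit.timeit(test_immutable))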

asked Nov 27 '22 by user422100


2 Answers

In a hypothetical language that offers both mutable and immutable, but otherwise equivalent, string types, there's no real reason for any performance difference. (I can't really think of such a language offhand -- e.g., Python and Java both have only immutable strings, plus other ways to build one through mutation, which add indirection and therefore can of course slow things down a bit;-). For example, in C++ I would expect interchangeably using a std::string or a const std::string to cause no performance difference (admittedly a compiler might be able to optimize code using the latter better by counting on the immutability, but I don't know of any real-world compilers that perform such theoretically possible optimizations;-).

Having immutable strings may and does in fact allow very substantial optimizations in Java and Python. For example, if the strings get hashed, the hash can be cached, and will never have to be recomputed (since the string can't change) -- that's especially important in Python, which uses hashed strings (for look-ups in sets and dictionaries) so lavishly and even "behind the scenes". Fresh copies never need to be made "just in case" the previous one has changed in the meantime -- references to a single copy can always be handed out systematically whenever that string is required. Python also copiously uses "interning" of (some) strings, potentially allowing constant-time comparisons and many other similarly fast operations -- think of it as one more way, a more advanced one to be sure, to take advantage of strings' immutability to cache more of the results of operations often performed on them.
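As a small Python 3 sketch of those two points (the literal string below is just illustrative): sys.intern hands back the canonical copy of equal strings, so comparisons can be reduced to an identity check, and because a str can never change, CPython caches its hash after the first computation.

import sys

a = sys.intern('immutable-example')
b = sys.intern('immutable-example')
print(a is b)        # True: both names refer to the single interned object

# The hash of a str is computed once and cached inside the object, so
# repeated dict/set lookups with the same key object never rehash it,
# and the key comparison can short-circuit on identity.
d = {a: 'cached'}
print(d[b])          # 'cached'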

That's not to say that a given compiler is going to take advantage of all possible optimizations, of course. For example, when a slice of a string is requested, there is no real need to make a new object and copy the data over -- the new slice might refer to the old one with an offset (and an independently stored length), potentially a great optimization for big strings out of which many slices are taken. Python doesn't do that because, unless particular care is taken in memory management, this might easily result in the "big" string being all kept in memory when only a small slice of it is actually needed -- but it's a tradeoff that a different implementation might definitely choose to perform (with that burden of extra memory management, to be sure -- more complex, harder-to-debug compiler and runtime code for the hypothetical language in question).
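That offset-based scheme can be illustrated in today's Python 3 with bytes rather than str (str doesn't expose the buffer protocol, so this is only an analogue of the slice-as-view idea described above, not something the str type itself does):

data = b'y' * 10_000_000

copied = data[:10]              # ordinary slicing copies: a fresh 10-byte object
view = memoryview(data)[:10]    # zero-copy: a view recording an offset and length into data

print(len(copied), len(view))   # 10 10
print(view.obj is data)         # True: the whole 10 MB buffer stays alive as long as the view does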

I'm just scratching the surface here -- and many of these advantages would be hard to keep if otherwise interchangeable string types could exist in both mutable and immutable versions (which I suspect is why, to the best of my current knowledge at least, C++ compilers actually don't bother with such optimizations, despite being generally very performance-conscious). But by offering only immutable strings as the primitive, fundamental data type (and thus implicitly accepting some disadvantage when you'd really need a mutable one;-), languages such as Java and Python can clearly gain all sorts of advantages -- performance issues being only one group of them (Python's choice to allow only immutable primitive types to be hashable, for example, is not a performance-centered design decision -- it's more about clarity and predictability of behavior for sets and dictionaries!-).
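A quick Python 3 illustration of that last point (the values used are just for show): only immutable built-ins are hashable, so only they can serve as dict keys or set members.

d = {}
d['abc'] = 1             # str: immutable, hashable, fine as a key
d[(1, 2)] = 2            # tuple of hashables: also fine
try:
    d[['a', 'b']] = 3    # list: mutable, therefore unhashable
except TypeError as exc:
    print(exc)           # unhashable type: 'list'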

answered Dec 25 '22 by Alex Martelli


I don't know whether mutable strings are really that much slower, but immutable strings often make reasoning about a program easier, because the state of the object/string can't change. That, to me, is the most important property of immutability.

Furthermore, you might expect immutable strings to be faster because they carry less state (nothing about them can change after construction), which can translate into lower memory consumption and fewer CPU cycles.

I also found this interesting article while googling, which I would like to quote:

knowing that a string is immutable makes it easy to lay it out at construction time — fixed and unchanging storage requirements
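That fixed layout can be observed from CPython itself (Python 3; the exact byte counts are implementation details): a str's storage is sized once at construction and never changes, while a growable container such as a list over-allocates so it can absorb future mutations.

import sys

s = 'Python'
print(sys.getsizeof(s))      # fixed when the object is created; never changes afterwards

buf = []
seen_sizes = []
for ch in 'Python' * 50:
    buf.append(ch)
    size = sys.getsizeof(buf)
    if not seen_sizes or size != seen_sizes[-1]:
        seen_sizes.append(size)
print(seen_sizes)            # grows in jumps: the list keeps spare room for future appends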

answered Dec 25 '22 by Alfred