
Python: understanding iterators and `join()` better

The join() method accepts any iterable as its argument. However, I am wondering why, given:

text = 'asdfqwer'

This:

''.join([c for c in text])

Is significantly faster than:

''.join(c for c in text)

The same occurs with long strings (e.g. text * 10000000).

Watching the memory footprint of both executions with long strings, I think they both create one and only one list of chars in memory, and then join them into a string. So I am guessing perhaps the difference is only between how join() creates this list out of the generator and how the Python interpreter does the same thing when it sees [c for c in text]. But, again, I am just guessing, so I would like somebody to confirm/deny my guesses.

Peque asked Sep 08 '15 15:09



1 Answer

The join method makes two passes over its input: one to determine how much memory to allocate for the resulting string object, and a second to perform the actual join. A list can be traversed twice as it is, but a generator object can only be consumed once, so join first has to copy the generator's items into a list of its own. Passing a list skips that copy, which is why it is faster.
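To see why a generator has to be copied before it can be read twice, here is a minimal sketch. It only demonstrates the single-use nature of generator objects; it does not reproduce what join does internally, and the variable names are just for illustration.

```python
text = 'asdfqwer'

# A generator object can be consumed only once.
gen = (c for c in text)
first_pass = list(gen)    # walks the generator to exhaustion
second_pass = list(gen)   # nothing is left to yield

# A list can be traversed as many times as needed.
chars = [c for c in text]
first_list_pass = list(chars)
second_list_pass = list(chars)

print(len(first_pass), len(second_pass))
print(len(first_list_pass), len(second_list_pass))
```

The second pass over the generator comes back empty, so anything that needs two passes must store the items somewhere first.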

A list comprehension is not simply a generator object wrapped in a list, so constructing the list externally is faster than having join create it from a generator object. Generator objects are optimized for memory efficiency, not speed.
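A rough way to check this yourself is to time a list comprehension against a generator expression wrapped in list(). This is only a sketch: the function names are made up for the example, the input size and repetition count are arbitrary, and the absolute numbers will depend on your machine and Python version.

```python
import timeit

text = 'asdfqwer' * 1000

def with_comprehension():
    # The list is built directly by the comprehension.
    return [c for c in text]

def with_wrapped_generator():
    # A generator object is created first, then drained by list().
    return list(c for c in text)

# Both approaches must produce the same list.
assert with_comprehension() == with_wrapped_generator()

for fn in (with_comprehension, with_wrapped_generator):
    seconds = timeit.timeit(fn, number=200)
    print(f'{fn.__name__}: {seconds:.4f} s')
```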

Of course, a string is already an iterable object, so you could just write ''.join(text). (Again, though, this is not as fast as creating the list explicitly from the string.)
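If you want to compare the variants on your own setup, something like the following should do. Treat it as a sketch: the labels, input size, and repetition count are arbitrary choices for the example, and the ordering of the results may differ between Python versions, so measure rather than assume.

```python
import timeit

text = 'asdfqwer' * 1000

# Each variant rebuilds the same string in a different way.
variants = {
    'list comprehension': lambda: ''.join([c for c in text]),
    'generator expression': lambda: ''.join(c for c in text),
    'explicit list(text)': lambda: ''.join(list(text)),
    'string directly': lambda: ''.join(text),
}

timings = {}
for label, fn in variants.items():
    assert fn() == text  # sanity check: every variant yields the original string
    timings[label] = timeit.timeit(fn, number=200)

# Print the variants from fastest to slowest on this machine.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f'{label:22s}{seconds:.4f} s')
```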

chepner answered Oct 16 '22 10:10
