
What is the advantage of iteritems?

I am using Python 2.7.5 @ Mac OS X 10.9.3 with 8GB memory and 1.7GHz Core i5. I have tested time consumption as below.

    d = {i: i*2 for i in xrange(10**7*3)}  # WARNING: it takes time and consumes a lot of RAM

    %time for k in d: k, d[k]
    CPU times: user 6.22 s, sys: 10.1 ms, total: 6.23 s
    Wall time: 6.23 s

    %time for k, v in d.iteritems(): k, v
    CPU times: user 7.67 s, sys: 27.1 ms, total: 7.7 s
    Wall time: 7.69 s

It seems iteritems is slower. I am wondering what is the advantage of iteritems over directly accessing the dict.

Update: a more accurate timing profile:

    In [23]: %timeit -n 5 for k in d: v=d[k]
    5 loops, best of 3: 2.32 s per loop

    In [24]: %timeit -n 5 for k,v in d.iteritems(): v
    5 loops, best of 3: 2.33 s per loop

asked Jun 12 '14 16:06

czheo

2 Answers

To answer your question, we should first dig up some information about how and when iteritems() was added to the API.

The iteritems() method was added in Python 2.2, following the introduction of iterators and generators in the language (see also: What is the difference between dict.items() and dict.iteritems()?). In fact, the method is explicitly mentioned in PEP 234. So it was introduced as a lazy alternative to the already present items().

This followed the same pattern as file.xreadlines() versus file.readlines(): xreadlines() was introduced in Python 2.1 (and was already deprecated in Python 2.3, by the way).

In Python 2.3 the itertools module was added, which introduced lazy counterparts to map, filter, etc.

In other words, at the time there was (and there still is) a strong trend towards laziness of operations. One of the reasons is to improve memory efficiency. Another is to avoid unneeded computation.

I cannot find any reference that says it was introduced to improve the speed of looping over the dictionary. It was simply used to replace calls to items() that didn't actually have to return a list. Note that this includes more use cases than just a simple for loop.

For example in the code:

function(dictionary.iteritems())

you cannot simply use a for loop to replace iteritems() as in your example. You'd have to write a function (or use a genexp, even though they weren't available when iteritems() was introduced, and they wouldn't be DRY...).
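
As a rough sketch of that point, the snippet below hands the lazy stream of pairs straight to functions that accept any iterable. The helper lazy_items and the sample data are hypothetical, not part of any library; the helper is only there so the example runs on both Python 2 (where it calls iteritems()) and Python 3 (where items() is already lazy).

```python
def lazy_items(d):
    """Return a lazy iterable of (key, value) pairs (hypothetical helper).

    Python 2: uses dict.iteritems(). Python 3: dict.items() is already lazy.
    """
    return d.iteritems() if hasattr(d, "iteritems") else d.items()


prices = {"apple": 3, "pear": 5, "plum": 2}  # made-up sample data

# Each call consumes the pairs one at a time; no intermediate list is built.
most_expensive = max(lazy_items(prices), key=lambda kv: kv[1])
inverted = dict((v, k) for k, v in lazy_items(prices))
total = sum(v for _, v in lazy_items(prices))
```

None of these calls could be replaced by "for k in d: d[k]" without wrapping that loop in a function or generator first.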

Retrieving the items from a dict is done pretty often, so it does make sense to provide a built-in method and, in fact, there was one: items(). The problem with items() is that:

- It isn't lazy, so calling it on a big dict can take quite some time, because the whole list of pairs is built up front.
- It takes a lot of memory. It can almost double the memory usage of a program if called on a very big dict that contains most of the objects being manipulated.
- Most of the time the resulting list is iterated only once and then thrown away.

So, when introducing iterators and generators, it was obvious to just add a lazy counterpart. If you need a list of items because you want to index it or iterate more than once, use items(), otherwise you can just use iteritems() and avoid the problems cited above.
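
To illustrate the trade-off, here is a small sketch contrasting an eager list of pairs with a lazy, single-pass iterator. The lazy_items helper is a hypothetical compatibility shim (iteritems() on Python 2, items() on Python 3), and the dict contents are made up.

```python
def lazy_items(d):
    # Hypothetical compatibility helper: iteritems() on Python 2, items() on Python 3.
    return d.iteritems() if hasattr(d, "iteritems") else d.items()


d = {"a": 1, "b": 2, "c": 3}  # made-up sample data

pairs = list(lazy_items(d))   # eager copy, like the list Python 2's items() returns
it = iter(lazy_items(d))      # lazy, single-pass iterator over the same pairs

first_pass = sorted(it)       # consumes the iterator
second_pass = sorted(it)      # nothing left: the iterator is exhausted

indexable = pairs[0]                        # a list supports indexing...
reusable = sorted(pairs) == sorted(pairs)   # ...and repeated iteration
```

So the list costs memory but can be indexed and reused, while the iterator costs almost nothing but can be walked only once.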

The advantages of using iteritems() are the same as using items() versus manually getting the value:

- You write less code, which makes it more DRY and reduces the chances of errors.
- The code is more readable.

Plus the advantages of laziness.

As I already stated, I cannot reproduce your performance results. On my machine iteritems() is always faster than iterating and looking up by key. The difference is negligible anyway, and it's probably due to how the OS handles caching and memory in general. In other words, your point about efficiency isn't a strong argument for or against either alternative.
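
If you want to repeat the comparison outside IPython, something along these lines should work. It is only a sketch: the dict is much smaller than the one in the question, the function names by_key and by_items and the lazy_items helper are my own, and the absolute numbers will differ from machine to machine.

```python
import timeit


def lazy_items(d):
    # Hypothetical compatibility helper: iteritems() on Python 2, items() on Python 3.
    return d.iteritems() if hasattr(d, "iteritems") else d.items()


# Much smaller than the dict in the question, so it runs quickly.
d = dict((i, i * 2) for i in range(10**5))


def by_key():
    # Iterate over the keys and look each value up.
    total = 0
    for k in d:
        total += d[k]
    return total


def by_items():
    # Iterate over (key, value) pairs directly.
    total = 0
    for _, v in lazy_items(d):
        total += v
    return total


# Best of 3 repeats, 10 loops each; taking the minimum reduces noise from the OS.
t_key = min(timeit.repeat(by_key, number=10, repeat=3))
t_items = min(timeit.repeat(by_items, number=10, repeat=3))
print("by key:   %.4f s" % t_key)
print("by items: %.4f s" % t_items)
```

Both functions compute the same sum, so any difference in the timings comes from the iteration style alone.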

Given equal performance on average, use the most readable and concise alternative: iteritems(). This discussion is similar to asking "why use a foreach when you can just loop by index with the same performance?". The importance of foreach isn't that you iterate faster, but that you avoid writing boilerplate code and improve readability.

I'd like to point out that iteritems() was in fact removed in Python 3. This was part of the "cleanup" of that version. Python 3's items() method is (mostly) equivalent to Python 2's viewitems() method (which is actually a backport, if I'm not mistaken...).

This version is lazy (and thus provides a replacement for iteritems()) and also has further functionality, such as "set-like" operations (for example, finding the common items between dicts in an efficient way). So in Python 3 the reasons to use items() instead of manually retrieving the values are even more compelling.
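
A short Python 3 sketch of those set-like operations follows. The two dicts are made-up sample data; on Python 2.7 the same operations should be available through viewitems() and viewkeys().

```python
# Python 3: items() and keys() return views that support set-like operations.
a = {"x": 1, "y": 2, "z": 3}   # made-up sample data
b = {"x": 1, "y": 20, "w": 4}

common_items = a.items() & b.items()   # (key, value) pairs present in both dicts
common_keys = a.keys() & b.keys()      # keys present in both dicts
only_in_a = a.items() - b.items()      # pairs in a that are not in b

# Views are live: they reflect later changes to the dict without being rebuilt.
view = a.items()
a["new"] = 0
sees_update = ("new", 0) in view
```

Note that the set operations on items() require the values to be hashable, since the pairs end up in a set.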

answered Sep 21 '22 20:09

Bakuriu

Using for k,v in d.iteritems() with more descriptive names can make the code in the loop suite easier to read.

answered Sep 22 '22 20:09

wwii
