It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only distinct permutations. Thus the permutations of the list [1, 1, 2]
are usually considered to be [1, 1, 2], [1, 2, 1], [2, 1, 1]. Indeed, the following C++ code prints precisely those three:
int a[] = {1, 1, 2};
do {
    cout << a[0] << " " << a[1] << " " << a[2] << endl;
} while (next_permutation(a, a+3));
On the other hand, Python's itertools.permutations
seems to print something else:
import itertools
for a in itertools.permutations([1, 1, 2]):
    print a
This prints
(1, 1, 2)
(1, 2, 1)
(1, 1, 2)
(1, 2, 1)
(2, 1, 1)
(2, 1, 1)
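(In general a multiset of n elements has n!/(n1!·n2!·…·nk!) distinct permutations, where the ni are the multiplicities of the repeated values; for [1, 1, 2] that is 3!/2! = 3 distinct orderings, whereas treating positions as unique yields all 3! = 6 tuples shown above.)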
As user Artsiom Rudzenka pointed out in an answer, the Python documentation says so:
Elements are treated as unique based on their position, not on their value.
My question: why was this design decision made?
It seems that following the usual convention would give results that are more useful (and indeed it is usually exactly what I want)... or is there some application of Python's behaviour that I'm missing?
[Or is it some implementation issue? The algorithm as in next_permutation
— for instance explained on StackOverflow here (by me) and shown here to be O(1) amortised — seems efficient and implementable in Python, but is Python doing something even more efficient since it doesn't guarantee lexicographic order based on value? And if so, was the increase in efficiency considered worth it?]
I can't speak for the designer of itertools.permutations
(Raymond Hettinger), but it seems to me that there are a couple of points in favour of the design:
First, if you used a next_permutation
-style approach, then you'd be restricted to passing in objects that support a linear ordering. Whereas itertools.permutations
provides permutations of any kind of object. Imagine how annoying this would be:
>>> list(itertools.permutations([1+2j, 1-2j, 2+1j, 2-1j]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers
Second, by not testing for equality on objects, itertools.permutations
avoids paying the cost of calling the __eq__
method in the usual case where it's not necessary.
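One quick way to convince yourself of this is to count the equality tests directly; the Noisy wrapper below is purely illustrative instrumentation, not anything from itertools:

import itertools

class Noisy(object):
    """Instrumented wrapper that counts how often __eq__ is called."""
    eq_calls = 0
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        Noisy.eq_calls += 1
        return self.value == other.value

items = [Noisy(1), Noisy(1), Noisy(2)]
list(itertools.permutations(items))  # generate all 3! permutations
print(Noisy.eq_calls)  # 0 -- the elements are never compared to each other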
Basically, itertools.permutations
solves the common case reliably and cheaply. There's certainly an argument to be made that itertools
ought to provide a function that avoids duplicate permutations, but such a function should be in addition to itertools.permutations
, not instead of it. Why not write such a function and submit a patch?
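For what it's worth, a minimal sketch of such a duplicate-avoiding wrapper might look like the following; unique_permutations is just an illustrative name (not an existing itertools function), and it assumes the elements are hashable:

import itertools

def unique_permutations(iterable):
    """Yield each distinct permutation once, in order of first appearance."""
    seen = set()
    for p in itertools.permutations(iterable):
        if p not in seen:
            seen.add(p)
            yield p

print(list(unique_permutations([1, 1, 2])))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]

Of course this still generates every positional permutation under the hood and remembers all of them, so it is only practical when repeats are rare.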
I'm accepting the answer of Gareth Rees as the most appealing explanation (short of an answer from the Python library designers), namely, that Python's itertools.permutations
doesn't compare the values of the elements. Come to think of it, this is what the question asks about, but I see now how it could be seen as an advantage, depending on what one typically uses itertools.permutations
for.
Just for completeness, I compared three methods of generating all distinct permutations. Method 1, which is very inefficient memory-wise and time-wise but requires the least new code, is to wrap Python's itertools.permutations
, as in zeekay's answer. Method 2 is a generator-based version of C++'s next_permutation
, from this blog post. Method 3 is something I wrote that is even closer to C++'s next_permutation
algorithm; it modifies the list in-place (I haven't made it too general).
def next_permutationS(l):
    n = len(l)
    # Step 1: Find tail
    last = n - 1  # tail is from `last` to end
    while last > 0:
        if l[last-1] < l[last]:
            break
        last -= 1
    # Step 2: Increase the number just before tail
    if last > 0:
        small = l[last-1]
        big = n - 1
        while l[big] <= small:
            big -= 1
        l[last-1], l[big] = l[big], small
    # Step 3: Reverse tail
    i = last
    j = n - 1
    while i < j:
        l[i], l[j] = l[j], l[i]
        i += 1
        j -= 1
    return last > 0
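To enumerate all distinct permutations with it, start from the sorted list and keep advancing until it reports that no next permutation exists (a small usage sketch):

l = sorted([1, 1, 2])
while True:
    print(l)
    if not next_permutationS(l):
        break
# [1, 1, 2]
# [1, 2, 1]
# [2, 1, 1]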
Here are some results. I have even more respect for Python's built-in function now: it's about three to four times as fast as the other methods when the elements are all (or almost all) distinct. Of course, when there are many repeated elements, using it is a terrible idea.
Some results ("us" means microseconds): l m_itertoolsp m_nextperm_b m_nextperm_s [1, 1, 2] 5.98 us 12.3 us 7.54 us [1, 2, 3, 4, 5, 6] 0.63 ms 2.69 ms 1.77 ms [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] 6.93 s 13.68 s 8.75 s [1, 2, 3, 4, 6, 6, 6] 3.12 ms 3.34 ms 2.19 ms [1, 2, 2, 2, 2, 3, 3, 3, 3, 3] 2400 ms 5.87 ms 3.63 ms [1, 1, 1, 1, 1, 1, 1, 1, 1, 2] 2320000 us 89.9 us 51.5 us [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4] 429000 ms 361 ms 228 ms
The code is here if anyone wants to explore.