The docs for itertools.combinations
state:
Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each combination.
[Emphasis mine]
What is the exact guarantee being made here? An empirical check shows that elements are always emitted as if by
for i in range(len(iterable)):
for j in range(i + 1, len(iterable)):
for k in range(j + 1, len(iterable)):
...
yield iterable[i], iterable[j], iterable[k], ...
What is the meaning of "lexicographixal order" in this case? In particular, I believe that the emphasized sentence is crucial, but I am not 100% of what the connection is. I think it means that the lexicographixal order is applied to the indices of the elements regardless of value, but I'd love to have someone confirm that.
What does itertools. combinations() do ? It returns r length subsequences of elements from the input iterable. Combinations are emitted in lexicographic sort order.
combinations() provides us with all the possible tuples a sequence or set of numbers or letters used in the iterator and the elements are assumed to be unique on the basis of there positions which are distinct for all elements. All these combinations are emitted in lexicographical order.
Talking about the time complexity of the combination function, we can say that to generate valid combinations exactly once, it will take O(n C r), and for getting r combinations, it is O(r). Hence, the total time complexity is O( r * [nCr]).
That being said, the iterators from itertools are often significantly faster than regular iteration from a standard Python for loop.
To translate these paragraphs from computer sciencese into English:
Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.
"Lexicographic" here is a mathematical term and does not mean alphabetically, but "in whatever order the lexicon defines".
early 17th century: modern Latin, from Greek lexikon (biblion) ‘(book) of words’, from lexis ‘word’, from legein ‘speak’.
The "lexicon" here being your input. Put plainly, your input defines the order in which the output will be produced. If you want alphabetically sorted output, sort your input.
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each combination.
This just says that combinations
will not look at or care about the actual values themselves, it just combines elements by their position. It won't deduplicate based on the values, it deduplicates combinations of positions. If you want unique combinations, deduplicate your input.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With