I'm creating this array:
A=itertools.combinations(range(6),2)
and I have to manipulate this array with numpy, like:
A.reshape(..
If A is large, the command list(A) is too slow.
Update 1: I've tried hpaulj's solution. In this specific situation it is a little slower. Any idea why?
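import time
import itertools as it
import numpy as np

# Approach 1: build a list of tuples, then convert it to an array
start = time.perf_counter()
A = it.combinations(range(495), 3)
A = np.array(list(A))
print(A)
stop = time.perf_counter()
print(stop - start)

# Approach 2: flatten the combinations and feed them to np.fromiter
start = time.perf_counter()
A = np.fromiter(it.chain(*it.combinations(range(495), 3)), dtype=int).reshape(-1, 3)
print(A)
stop = time.perf_counter()
print(stop - start)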
Results:
[[ 0 1 2]
[ 0 1 3]
[ 0 1 4]
...,
[491 492 494]
[491 493 494]
[492 493 494]]
10.323822
[[ 0 1 2]
[ 0 1 3]
[ 0 1 4]
...,
[491 492 494]
[491 493 494]
[492 493 494]]
12.289898
I'm reopening this because I dislike the linked answer. The accepted answer suggests using
np.array(list(A)) # producing a (15,2) array
But the OP apparently has already tried list(A), and found it to be slow.
Another answer suggests using np.fromiter. But buried in its comments is the note that fromiter only builds a 1d array, so the iterable has to yield scalars rather than tuples.
In [102]: A=itertools.combinations(range(6),2)
In [103]: np.fromiter(A,dtype=int)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-103-29db40e69c08> in <module>()
----> 1 np.fromiter(A,dtype=int)
ValueError: setting an array element with a sequence.
So using fromiter with this itertools iterator requires somehow flattening it.
A quick set of timings suggests that list isn't the slow step. It's converting the list to an array that is slow:
In [104]: timeit itertools.combinations(range(6),2)
1000000 loops, best of 3: 1.1 µs per loop
In [105]: timeit list(itertools.combinations(range(6),2))
100000 loops, best of 3: 3.1 µs per loop
In [106]: timeit np.array(list(itertools.combinations(range(6),2)))
100000 loops, best of 3: 14.7 µs per loop
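For anyone who wants to repeat this comparison outside IPython, here is a rough sketch using the standard timeit module. The helper names (make_iterator, make_list, make_array) and the size n = 60 are my own choices for illustration, and the numbers will differ by machine and NumPy version.

```python
import itertools
import timeit

import numpy as np

n, r = 60, 2  # illustrative size; change to match your own case


def make_iterator():
    # Only creates the iterator; no combinations are produced yet
    return itertools.combinations(range(n), r)


def make_list():
    # Produces every combination as a tuple in a Python list
    return list(itertools.combinations(range(n), r))


def make_array():
    # Same as make_list, plus the list -> ndarray conversion
    return np.array(list(itertools.combinations(range(n), r)))


timings = {}
for func in (make_iterator, make_list, make_array):
    # Best of 3 repeats of 200 calls each, reported per call in seconds
    best = min(timeit.repeat(func, number=200, repeat=3)) / 200
    timings[func.__name__] = best
    print(f"{func.__name__}: {best * 1e6:.1f} µs per call")
```

The difference between make_list and make_array is the cost of the array conversion alone.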
I think the fastest way to use fromiter is to flatten the combinations with an idiomatic use of itertools.chain:
In [112]: timeit np.fromiter(itertools.chain(*itertools.combinations(range(6),2)),dtype=int).reshape(-1,2)
100000 loops, best of 3: 12.1 µs per loop
Not much of a time savings, at least on this small size. (fromiter also takes a count argument, which shaves off another µs.) With a larger case, range(60), fromiter takes half the time of array.
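A minimal sketch of the count variant mentioned above, assuming Python 3.8+ for math.comb. The function name combinations_array is hypothetical. count has to be the total number of scalars, that is the number of combinations times r, so fromiter can allocate the output once instead of growing it.

```python
import itertools
from math import comb

import numpy as np


def combinations_array(n, r):
    """Return all r-combinations of range(n) as a (C(n, r), r) integer array."""
    # Total number of scalars the flattened iterator will yield
    count = comb(n, r) * r
    # chain.from_iterable flattens lazily, without unpacking every tuple as an argument
    flat = itertools.chain.from_iterable(itertools.combinations(range(n), r))
    return np.fromiter(flat, dtype=int, count=count).reshape(-1, r)


pairs = combinations_array(6, 2)
print(pairs.shape)
```

I used chain.from_iterable here rather than chain(*...) because the star form has to materialize all the tuples as arguments before chaining starts.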
A quick search on [numpy] itertools turns up a number of suggestions of pure numpy ways of generating all combinations. itertools is fast for generating pure Python structures, but converting those to arrays is a slow step.
A picky point about the question. A is an iterator, not an array. list(A) does produce a list of tuples, which can be described loosely as an array. But it isn't a np.array, and does not have a reshape method.
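To make that picky point concrete, here is a small sketch. The variable names are mine. It also shows that the itertools object is exhausted after one pass, which is easy to trip over when timing.

```python
import itertools

import numpy as np

A = itertools.combinations(range(6), 2)
has_reshape = hasattr(A, "reshape")  # False: A is an itertools object, not an ndarray

pairs = list(A)      # consumes A and gives a list of 15 tuples
leftover = list(A)   # A is now exhausted, so this is an empty list

arr = np.array(pairs)          # only now is there an ndarray, with shape (15, 2)
reshaped = arr.reshape(3, 10)  # reshape works on the array, not on A or pairs

print(has_reshape, len(pairs), leftover, arr.shape, reshaped.shape)
```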
An alternative way to get every pairwise combination of N elements is to generate the indices of the upper triangle of an (N, N) matrix using np.triu_indices(N, k=1), e.g.:
np.vstack(np.triu_indices(6, k=1)).T
For small arrays, itertools.combinations is going to win, but for large N the triu_indices trick can be substantially quicker:
In [1]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(6), 2)), int)
The slowest run took 10.46 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 4.04 µs per loop
In [2]: %timeit np.array(np.triu_indices(6, 1)).T
The slowest run took 10.97 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 22.3 µs per loop
In [3]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(1000), 2)), int)
10 loops, best of 3: 69.7 ms per loop
In [4]: %timeit np.array(np.triu_indices(1000, 1)).T
100 loops, best of 3: 10.6 ms per loop
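The triu_indices result is a set of index pairs, so it also works for pairing up the elements of an arbitrary 1d array. A short sketch, with a made-up values array, that uses the indices for fancy indexing and checks the result against itertools.combinations:

```python
import itertools

import numpy as np

values = np.array([10, 20, 30, 40])  # hypothetical data, any 1d array would do
n = len(values)

# Row and column indices of the strict upper triangle, in row-major order:
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
i, j = np.triu_indices(n, k=1)

# Use the index pairs to pick out the paired elements
pairs = np.column_stack((values[i], values[j]))

# Reference result from itertools, for comparison only
expected = np.array(list(itertools.combinations(values.tolist(), 2)))
print(np.array_equal(pairs, expected))
```

This only covers pairs (r = 2). Combinations of three or more elements need a different construction.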