Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to turn Numpy array to set efficiently?

Tags:

python

set

numpy

I used:

df['ids'] = df['ids'].values.astype(set)

to turn lists into sets, but the output was a list not a set:

>>> x = np.array([[1, 2, 2.5],[12,35,12]])

>>> x.astype(set)
array([[1.0, 2.0, 2.5],
       [12.0, 35.0, 12.0]], dtype=object)

Is there an efficient way to turn list into set in Numpy?

EDIT 1:
My input is as big as below:
I have 3,000 records. Each has 30,000 ids: [[1,...,12,13,...,30000], [1,..,43,45,...,30000],...,[...]]

like image 691
Alireza Avatar asked Oct 18 '15 08:10

Alireza


People also ask

How can I make NumPy array faster?

The key to making it fast is to use vectorized operations, generally implemented through NumPy's universal functions (ufuncs). This section motivates the need for NumPy's ufuncs, which can be used to make repeated calculations on array elements much more efficient.

Is NumPy array memory efficient?

NumPy uses much less memory to store data The NumPy arrays takes significantly less amount of memory as compared to python lists. It also provides a mechanism of specifying the data types of the contents, which allows further optimisation of the code.

Is appending to NumPy array efficient?

Appending to numpy arrays is very inefficient. This is because the interpreter needs to find and assign memory for the entire array at every single step. Depending on the application, there are much better strategies. If you know the length in advance, it is best to pre-allocate the array using a function like np.

How do you normalize an array in NumPy in Python?

To normalize a 2D-Array or matrix we need NumPy library. For matrix, general normalization is using The Euclidean norm or Frobenius norm. Here, v is the matrix and |v| is the determinant or also called The Euclidean norm. v-cap is the normalized matrix.

Are NumPy arrays more memory efficient than lists?

NumPy arrays are faster and more compact than Python lists. An array consumes less memory and is convenient to use. NumPy uses much less memory to store data and it provides a mechanism of specifying the data types. This allows the code to be optimized even further.


2 Answers

First flatten your ndarray to obtain a single dimensional array, then apply set() on it:

set(x.flatten())

Edit : since it seems you just want an array of set, not a set of the whole array, then you can do value = [set(v) for v in x] to obtain a list of sets.

like image 200
P. Camilleri Avatar answered Sep 22 '22 19:09

P. Camilleri


A couple of earlier 'row-wise' unique questions:

vectorize numpy unique for subarrays

Numpy: Row Wise Unique elements

Count unique elements row wise in an ndarray

In a couple of these the count is more interesting than the actual unique values.

If the number of unique values per row differs, then the result cannot be a (2d) array. That's a pretty good indication that the problem cannot be fully vectorized. You need some sort of iteration over the rows.

like image 38
hpaulj Avatar answered Sep 22 '22 19:09

hpaulj