I used:
df['ids'] = df['ids'].values.astype(set)
to turn lists into sets, but the output was a list, not a set:
>>> x = np.array([[1, 2, 2.5],[12,35,12]])
>>> x.astype(set)
array([[1.0, 2.0, 2.5],
[12.0, 35.0, 12.0]], dtype=object)
Is there an efficient way to turn lists into sets in NumPy?
EDIT 1:
My input is as large as the following:
I have 3,000 records, each with 30,000 ids: [[1,...,12,13,...,30000], [1,..,43,45,...,30000],...,[...]]
The key to making it fast is to use vectorized operations, generally implemented through NumPy's universal functions (ufuncs). This section motivates the need for NumPy's ufuncs, which can be used to make repeated calculations on array elements much more efficient.
NumPy uses much less memory to store data. NumPy arrays take significantly less memory than Python lists. NumPy also provides a mechanism for specifying the data type of the contents, which allows further optimisation of the code.
Appending to NumPy arrays is very inefficient. This is because every append allocates memory for a whole new array and copies the existing data into it. Depending on the application, there are much better strategies. If you know the length in advance, it is best to pre-allocate the array using a function like np.
To normalize a 2D array or matrix we need the NumPy library. For a matrix, the usual normalization uses the Frobenius norm, which is the matrix analogue of the Euclidean norm. The normalized matrix is v-cap = v / |v|, where v is the matrix, |v| is its norm, and v-cap is the normalized matrix.
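A minimal sketch of that normalization, assuming the Frobenius norm is the one intended. The sample matrix and the names `norm` and `v_hat` are made up for illustration.

```python
import numpy as np

# Hypothetical sample matrix, chosen so the norm is easy to check by hand.
v = np.array([[3.0, 0.0], [0.0, 4.0]])

# With no ord/axis arguments, np.linalg.norm returns the Frobenius norm
# of a 2-D array: sqrt(3**2 + 4**2) = 5.
norm = np.linalg.norm(v)

# Dividing by the norm gives a matrix whose Frobenius norm is 1.
v_hat = v / norm
```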
NumPy arrays are faster and more compact than Python lists. An array consumes less memory and is convenient to use. NumPy uses much less memory to store data and it provides a mechanism of specifying the data types. This allows the code to be optimized even further.
First flatten your ndarray to obtain a single dimensional array, then apply set() on it:
set(x.flatten())
Edit: since it seems you want an array of sets rather than a single set of the whole array, you can use value = [set(v) for v in x] to obtain a list of sets.
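Both variants can be sketched as follows, using the array from the question. The names `all_values` and `row_sets` are only illustrative.

```python
import numpy as np

x = np.array([[1, 2, 2.5], [12, 35, 12]])

# One set for the whole array: flatten to 1-D first, then build the set.
all_values = set(x.flatten())

# One set per row: iterate over the rows and build a set from each.
row_sets = [set(row) for row in x]
```

For a pandas column of lists like the one in the question, `df['ids'].apply(set)` should give the analogous per-row result. `astype(set)` only casts the array to the generic object dtype and leaves each element as it was.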
A couple of earlier 'row-wise' unique questions:
vectorize numpy unique for subarrays
Numpy: Row Wise Unique elements
Count unique elements row wise in an ndarray
In a couple of these the count is more interesting than the actual unique values.
If the number of unique values per row differs, then the result cannot be a (2d) array. That's a pretty good indication that the problem cannot be fully vectorized. You need some sort of iteration over the rows.
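A rough sketch of that row-wise iteration, assuming `np.unique` applied to each row is acceptable. The sample data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical sample data: rows with 2, 2 and 1 distinct values.
x = np.array([[1, 2, 2], [12, 35, 12], [7, 7, 7]])

# Rows can have different numbers of unique values, so the result is a
# list of 1-D arrays rather than a single 2-D array.
unique_rows = [np.unique(row) for row in x]

# If only the count per row matters, the result is rectangular again.
unique_counts = np.array([len(u) for u in unique_rows])
```

The Python-level loop runs once per row (3,000 times for the input described in the question), while the work within each row is done inside NumPy.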