
Numpy : convert labels into indexes

Is it possible to convert a vector of strings into a vector of integer indices using NumPy?

Suppose I have an array of strings such as ['ABC', 'DEF', 'GHI', 'DEF', 'ABC']. I want to convert it to an array of integers such as [0, 1, 2, 1, 0]. Is this possible using NumPy? I know that pandas has a Series class that can do this, courtesy of this answer. Is there something similar in NumPy?

Edit: np.unique() on its own returns only the unique values. What I'm trying to do is convert the labels in the Iris dataset to indices: 0 for Iris-setosa, 1 for Iris-versicolor and 2 for Iris-virginica. Is there a way to do this using NumPy?

srdg asked May 02 '18 07:05



1 Answer

Use numpy.unique with the parameter return_inverse=True. Note that it handles NaNs differently from pandas (see "factorizing values" in the pandas documentation):

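import numpy as np

L = ['ABC', 'DEF', 'GHI', 'DEF', 'ABC']

# return_inverse=True also returns each element's index into the sorted unique values
print(np.unique(L, return_inverse=True)[1])
[0 1 2 1 0]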
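For the Iris labels mentioned in the question, the same call can be sketched as follows. The sample `labels` array is made up for illustration; the codes follow the sorted order of the unique labels, which here happens to give 0 for Iris-setosa, 1 for Iris-versicolor and 2 for Iris-virginica.

```python
import numpy as np

# Hypothetical sample of Iris labels (not taken from the real dataset order)
labels = np.array([
    "Iris-versicolor",
    "Iris-setosa",
    "Iris-virginica",
    "Iris-setosa",
    "Iris-versicolor",
])

# classes: sorted unique labels; codes: index of each label within classes
classes, codes = np.unique(labels, return_inverse=True)

print(classes)
print(codes)

# Indexing classes with codes reconstructs the original labels
restored = classes[codes]
```

Keeping `classes` around is useful because it is the lookup table for mapping the integer codes back to label names.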

pandas.factorize also works well with a list or an array:

import pandas as pd

print(pd.factorize(L)[0])
[0 1 2 1 0]
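The two approaches agree on the list above only because it already starts in sorted order. A small sketch of where they differ, using made-up inputs: numpy.unique numbers labels by sorted order and gives NaN its own code, while pandas.factorize numbers them by first appearance and, by default, marks NaN with -1.

```python
import numpy as np
import pandas as pd

# Made-up input whose first element is not the smallest label
values = np.array(["DEF", "ABC", "DEF", "GHI"])

np_codes = np.unique(values, return_inverse=True)[1]  # codes by sorted order
pd_codes, pd_uniques = pd.factorize(values)           # codes by first appearance

print(np_codes)
print(pd_codes)

# Float input with a single missing value
with_nan = np.array([1.0, np.nan, 1.0])

np_nan_codes = np.unique(with_nan, return_inverse=True)[1]  # NaN gets a regular code
pd_nan_codes = pd.factorize(with_nan)[0]                    # NaN is coded as -1

print(np_nan_codes)
print(pd_nan_codes)
```

If sorted codes are wanted from pandas as well, `pd.factorize(values, sort=True)` is the option to look at.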
jezrael answered Nov 15 '22 00:11