Extracting and transforming data in numpy

Suppose I have the following 2-D numpy array:

[[1, 3., 'John Doe', 'male', 'doc', '25'],
  ...,
 [9, 6., 'Jane Doe', 'female', 'p', '28']]

I need to extract the data relevant to my task.

Being a novice in numpy and python in general, I would do it in the following manner:

import numpy as np

data = np.array(
[[1, 3., 'John Doe', 'male', 'doc', 25],
 [9, 6., 'Jane Doe', 'female', 'p', 28]]
)

data_tr = np.zeros((data.shape[0], 3))
for i in range(0, data.shape[0]):
    data_tr[i][0] = data[i, 1]
    data_tr[i][1] = 0 if data[i, 3] == 'male' else 1
    data_tr[i][2] = data[i, 5]

And as a result I have the following:

[[  3.,   0.,  25.],
 [  6.,   1.,  28.]]

What I would like to know is whether there is a more efficient or cleaner way to do this.
Can anybody please help me with that?

Dmitry Volkov asked Sep 17 '17 17:09



1 Answer

One approach with column-indexing -

data_tr = np.zeros((data.shape[0], 3))
data_tr[:,[0,2]] = data[:, [1,5]]
data_tr[:,1] = data[:,3]!='male'  # 0 for male, 1 otherwise, as in the question
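
For reference, here is a self-contained sketch of the column-indexing idea, run on the two sample rows from the question. It relies on np.array turning the mixed rows into an array of strings, as it does for the question's data. The helper name extract_features and the explicit astype(float) conversion are illustrative additions, not part of the original answer. The flag column is 0 for 'male' and 1 otherwise, matching the loop in the question.

```python
import numpy as np

def extract_features(data):
    """Hypothetical helper: columns 1 and 5 as floats, plus a 0/1 flag from column 3."""
    out = np.zeros((data.shape[0], 3))
    # Columns 1 and 5 hold numeric strings; convert them explicitly to float.
    out[:, [0, 2]] = data[:, [1, 5]].astype(float)
    # Boolean mask is cast to 0.0 / 1.0 on assignment: 0 for 'male', 1 otherwise.
    out[:, 1] = data[:, 3] != 'male'
    return out

data = np.array(
    [[1, 3., 'John Doe', 'male', 'doc', 25],
     [9, 6., 'Jane Doe', 'female', 'p', 28]]
)
data_tr = extract_features(data)
print(data_tr)
```

The result can be compared against the expected output in the question with data_tr.tolist().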

Note that the step : data_tr[:,[0,2]] = data[:, [1,5]] indexes with lists of columns, so data[:, [1,5]] creates a copy of the selected columns instead of a view into data. Such copies are not very efficient for assignment and extraction. So, you might want to do that in two separate steps with single-column slices, mostly for performance, like so -

data_tr[:,0] = data[:, 1]
data_tr[:,2] = data[:, 5]
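
The copy-versus-view distinction can be checked directly. The sketch below uses a small made-up float array (not the question's data) and np.shares_memory to show that a single-column slice is a view, while indexing with a list of columns produces a new array. Whether the two-step form is measurably faster depends on the array size, so it is worth timing on your own data.

```python
import numpy as np

# Small illustrative array, unrelated to the question's data.
a = np.arange(12.0).reshape(3, 4)

col_view = a[:, 1]        # basic slicing: a view onto a's memory
cols_copy = a[:, [1, 3]]  # list-based (advanced) indexing: a new array

print(np.shares_memory(a, col_view))   # view shares memory with a
print(np.shares_memory(a, cols_copy))  # copy does not

col_view[0] = -1.0      # writes through to a[0, 1]
cols_copy[0, 1] = 99.0  # leaves a untouched
print(a[0])
```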
Divakar answered Oct 21 '22 19:10
