Python: Split NumPy array based on values in the array

I have one big array:

[(1.0, 3.0, 1, 427338.4297000002, 4848489.4332)
 (1.0, 3.0, 2, 427344.7937000003, 4848482.0692)
 (1.0, 3.0, 3, 427346.4297000002, 4848472.7469) ...,
 (1.0, 1.0, 7084, 427345.2709999997, 4848796.592)
 (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351)
 (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)]

I want to split this array into multiple arrays based on the 2nd value in each row (3.0, 3.0, 3.0, ..., 1.0, 1.0, 1.0).

Every time the 2nd value changes, I want a new array, so basically each new array has the same 2nd value. I've looked this up on Stack Overflow and know of the command

np.split(array, number)

but I'm not trying to split the array into a certain number of arrays, but rather by a value. How would I be able to split the array in the way specified above? Any help would be appreciated!

whent1991 asked Aug 06 '15 18:08


1 Answer

You can find the indices where the values change by using numpy.where and numpy.diff on the second column (index 1), then pass those indices to numpy.split:

>>> import numpy as np
>>> arr = np.array([(1.0, 3.0, 1, 427338.4297000002, 4848489.4332),
...  (1.0, 3.0, 2, 427344.7937000003, 4848482.0692),
...  (1.0, 3.0, 3, 427346.4297000002, 4848472.7469),
...  (1.0, 1.0, 7084, 427345.2709999997, 4848796.592),
...  (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351),
...  (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)])
>>> np.split(arr, np.where(np.diff(arr[:,1]))[0]+1)
[array([[  1.00000000e+00,   3.00000000e+00,   1.00000000e+00,
          4.27338430e+05,   4.84848943e+06],
       [  1.00000000e+00,   3.00000000e+00,   2.00000000e+00,
          4.27344794e+05,   4.84848207e+06],
       [  1.00000000e+00,   3.00000000e+00,   3.00000000e+00,
          4.27346430e+05,   4.84847275e+06]]),
 array([[  1.00000000e+00,   1.00000000e+00,   7.08400000e+03,
          4.27345271e+05,   4.84879659e+06],
       [  1.00000000e+00,   1.00000000e+00,   7.08500000e+03,
          4.27352928e+05,   4.84879094e+06],
       [  1.00000000e+00,   1.00000000e+00,   7.08600000e+03,
          4.27359161e+05,   4.84878743e+06]])]
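
The array printed in the question looks like it may be a structured (record) array rather than a plain 2-D array; if so, arr[:,1] would not work and the field has to be selected by name. A minimal sketch under that assumption follows. The dtype string and the default field names f0..f4 are assumptions for illustration, not something the question states:

```python
import numpy as np

# Hypothetical structured array: the dtype and the field names f0..f4
# are assumptions, since the question does not show the real dtype.
rec = np.array(
    [(1.0, 3.0, 1, 427338.4297, 4848489.4332),
     (1.0, 3.0, 2, 427344.7937, 4848482.0692),
     (1.0, 1.0, 7084, 427345.2710, 4848796.5920),
     (1.0, 1.0, 7085, 427352.9278, 4848790.9351)],
    dtype="f8,f8,i8,f8,f8",
)

# Select the second field by name instead of by column index,
# then locate the rows where its value changes.
change_points = np.flatnonzero(np.diff(rec["f1"])) + 1

# np.split cuts the 1-D structured array at those row indices.
groups = np.split(rec, change_points)
```

If the real array uses named fields, replace "f1" with the actual name of the second field.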

Explanation:

First, we fetch the items in the second column:

>>> arr[:,1]
array([ 3.,  3.,  3.,  1.,  1.,  1.])

Now to find out where the items actually change we can use numpy.diff:

>>> np.diff(arr[:,1])
array([ 0.,  0., -2.,  0.,  0.])

Any non-zero entry means that the next item differs from the current one. numpy.where returns the indices of the non-zero entries, and we add 1 to them because each new group starts at the row after the returned index, which is where np.split should cut:

>>> np.where(np.diff(arr[:,1]))[0]+1
array([3])
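
The steps above can be wrapped in a small helper. This is only a sketch: the function name split_on_change and the sample data are made up for illustration, and np.flatnonzero is used as a shorthand for np.where(...)[0]. It assumes a plain 2-D numeric array:

```python
import numpy as np

def split_on_change(arr, col):
    """Split a 2-D array into runs of consecutive rows sharing a value in `col`.

    Hypothetical helper name; assumes `arr` is a numeric 2-D array.
    """
    arr = np.asarray(arr)
    # Non-zero differences mark rows whose successor has a different value;
    # adding 1 gives the index of the first row of each new run.
    boundaries = np.flatnonzero(np.diff(arr[:, col])) + 1
    return np.split(arr, boundaries)

# Made-up sample: the value 3.0 appears in two separate runs.
data = np.array([[1.0, 3.0, 1],
                 [1.0, 3.0, 2],
                 [1.0, 1.0, 7084],
                 [1.0, 3.0, 7085]])
parts = split_on_change(data, 1)
```

Note that this splits every time the value changes, as the question asks, so a value that reappears later (3.0 in the sample) starts a new array rather than being merged with the earlier run.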

Ashwini Chaudhary answered Sep 16 '22 16:09