Flatten numpy array but also keep index of value positions?

I have several 2D numpy arrays (matrices), and for each one I would like to convert it to a vector containing the values of the array and a vector containing each value's row/column index.

For example I might have an array like this:

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

and I basically want the values

[3, 1, 4, 1, 5, 9, 2, 6, 5]

and their position

[[0,0], [0,1], [0,2], [1,0], [1,1], [1,2], [2,0], [2,1], [2,2]]

My end goal is to put these into a pandas DataFrame as columns like this:

V | x | y
--+---+---
3 | 0 | 0
1 | 0 | 1
4 | 0 | 2
1 | 1 | 0
5 | 1 | 1
9 | 1 | 2
2 | 2 | 0
6 | 2 | 1
5 | 2 | 2

where V is the value, x is the row position (index), and y is the column position (index).

I think I can hack something together, but I'm trying to find an efficient way of doing this rather than fumbling around. For example, I know I can get the values using something like x.reshape(x.size, 1) and that I could try to create the index columns from x.shape, but it seems like there should be a better way.

Ellis Valentiner asked Jun 26 '15 19:06


2 Answers

You could also let pandas do the work for you since you'll be using it in a dataframe:

import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
df = pd.DataFrame(x)
# stack the columns so that the column labels become an inner index level
# (row-major order), then reset the index so that both levels become columns.
df = df.stack().reset_index()
df

   level_0  level_1  0
0        0        0  3
1        0        1  1
2        0        2  4
3        1        0  1
4        1        1  5
5        1        2  9
6        2        0  2
7        2        1  6
8        2        2  5

# name the columns (level_0 is the row, level_1 is the column) and put V first
df.columns = ['x', 'y', 'V']
df = df[['V', 'x', 'y']]
df

   V  x  y
0  3  0  0
1  1  0  1
2  4  0  2
3  1  1  0
4  5  1  1
5  9  1  2
6  2  2  0
7  6  2  1
8  5  2  2
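
If you prefer a single expression, the same idea can probably be written as one method chain. This is only a sketch: it assumes the row index should be called x and the column index y, as in the question, and relies on stack() walking the frame row by row.

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# stack() moves the column labels into the index (row-major order),
# rename_axis names the two index levels (row -> 'x', column -> 'y'),
# reset_index(name='V') turns the levels into columns and names the values.
df = (pd.DataFrame(x)
        .stack()
        .rename_axis(['x', 'y'])
        .reset_index(name='V')[['V', 'x', 'y']])
print(df)
```

Naming the index levels before resetting avoids the intermediate level_0/level_1 column names.
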
khammel answered Sep 30 '22 02:09


I don't know if it's most efficient, but numpy.meshgrid is designed for this:

import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
# indexing='ij' makes the first output the row index and the second the column index
rows, cols = np.meshgrid(np.arange(x.shape[0]), np.arange(x.shape[1]), indexing='ij')
table = np.vstack((x.ravel(), rows.ravel(), cols.ravel())).T
print(table)

This produces:

[[3 0 0]
 [1 0 1]
 [4 0 2]
 [1 1 0]
 [5 1 1]
 [9 1 2]
 [2 2 0]
 [6 2 1]
 [5 2 2]]

Then I think df = pandas.DataFrame(table, columns=['V', 'x', 'y']) will give you your desired data frame.
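
As a possible alternative to meshgrid, numpy.indices builds the index grids directly from the array's shape. The sketch below assumes the column names V, x (row) and y (column) from the question, and builds the frame from a dict so each column gets its name up front.

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# np.indices returns one index array per axis, each with the same shape as x
rows, cols = np.indices(x.shape)

# ravel() flattens all three arrays in the same row-major order,
# so each value stays aligned with its row and column index
df = pd.DataFrame({'V': x.ravel(),
                   'x': rows.ravel(),   # row index
                   'y': cols.ravel()})  # column index
print(df)
```
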

rjonnal answered Sep 30 '22 01:09