Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to melt a dataframe while doing some operation?

Tags:

python

pandas

Let's say that I have the following dataframe:

index    K1    K2    D1    D2    D3
N1       0     1     12    4     6
N2       1     1     10    2     7
N3       0     0     3     5     8

Basically, I want to transform this dataframe into the following:

index    COL1   COL2
K1       D1     = 0*12+1*10+0*3
K1       D2     = 0*4+1*2+0*5
K1       D3     = 0*6+1*7+0*8
K2       D1     = 1*12+1*10+0*3
K2       D2     = 1*4+1*2+0*5
K2       D3     = 1*6+1*7+0*8

The content of COL2 is basically the dot product (aka the scalar product) between the vector in index and the one in COL1. For example, let's take the first line of the resulting df. Under index, we have K1 and, under COL1 we have D1. Looking at the first table, we know that K1 = [0,1,0] and D1 = [12,10,3]. The scalar product of these two "vectors" is the value inside COL2 (first line).

I'm trying to find a way of doing this without using a nested loop (because the idea is to make something efficient), however, I don't exactly know how. I tried using the pd.melt() function and, although it gets me closer to what I want, it doesn't exactly get me to where I want. Could you give me a hint?

like image 828
Skyris Avatar asked Oct 14 '20 21:10

Skyris


2 Answers

This is matrix multiplication:

(df[['D1','D2','D3']].T@df[['K1','K2']]).unstack().reset_index()

Output:

  level_0 level_1   0
0      K1      D1  10
1      K1      D2   2
2      K1      D3   7
3      K2      D1  22
4      K2      D2   6
5      K2      D3  13
like image 171
Quang Hoang Avatar answered Sep 21 '22 08:09

Quang Hoang


Building off @QuangHoang's answer, you can see if the numpy .dot matrix multiplication version is a little bit more performant. Since .to_numpy() just creates a matrix of numbers that is not tied to K and D indices, you have to do some manipulation to get it back in the pandas format you want besides purely numbers:

a1 = df[['D1','D2','D3']].T.to_numpy()
a2 = df[['K1','K2']].to_numpy()
df1 = pd.DataFrame(a1.dot(a2)).unstack().reset_index() #see other options below
df1['level_0'] = 'K' + (df1['level_0'] + 1).astype(str)
df1['level_1'] = 'D' + (df1['level_1'] + 1).astype(str)
df1
Out[1]: 
  level_0 level_1   0
0      K1      D1  10
1      K1      D2   2
2      K1      D3   7
3      K2      D1  22
4      K2      D2   6
5      K2      D3  13

Other numpy array options:

df1 = pd.DataFrame(a1 @ a2).unstack().reset_index()
df1 = pd.DataFrame(np.matmul(a1, a2)).unstack().reset_index()

More information (numpy matrix vector multiplication)

like image 23
David Erickson Avatar answered Sep 21 '22 08:09

David Erickson