
create matrix structure using pandas

I have loaded the following CSV file, which contains code and coefficient data, into a dataframe df:

CODE|COEFFICIENT  
A|0.5  
B|0.4  
C|0.3

import pandas as pd
import numpy as np
df = pd.read_csv('cod_coeff.csv', delimiter='|', encoding="utf-8-sig")

giving

  CODE   COEFFICIENT  
0    A       0.5  
1    B       0.4  
2    C       0.3  

From this dataframe, I need to build a final dataframe with a matrix structure, where each cell holds the product of the row and column coefficients:

      A     B     C
A  0.25  0.20  0.15
B  0.20  0.16  0.12
C  0.15  0.12  0.09

I tried np.multiply, but it does not produce this result.

dataviz asked Aug 31 '16 at 03:08



2 Answers

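numpy as a faster alternative, assuming df has already been indexed with df = df.set_index('CODE') so that only the numeric COEFFICIENT column remains: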

pd.DataFrame(np.outer(df, df), df.index, df.index)

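A self-contained sketch of the same idea, for anyone who wants to run it without the CSV file. It assumes the sample data from the question is inlined through io.StringIO, and it passes the coefficient column to np.outer explicitly instead of the whole dataframe; the names csv_text, coeff and result are illustrative only.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question, inlined so the snippet runs without cod_coeff.csv.
csv_text = "CODE|COEFFICIENT\nA|0.5\nB|0.4\nC|0.3\n"
df = pd.read_csv(io.StringIO(csv_text), delimiter="|").set_index("CODE")

# np.outer flattens its inputs, so hand it the single numeric column as a 1-D array.
coeff = df["COEFFICIENT"].to_numpy()

# Label both axes with the codes so the result reads as a CODE x CODE matrix.
result = pd.DataFrame(np.outer(coeff, coeff), index=df.index, columns=df.index)
print(result)
```

Selecting the column by name keeps the call working even if the dataframe later gains other columns.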


Timing

Given sample

[image: timing results for the sample data]

30,000 rows

df = pd.concat([df for _ in range(10000)], ignore_index=True)

[image: timing results for 30,000 rows]
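A rough way to repeat such a comparison yourself is sketched below. It is not the benchmark used for the timings in this answer: it assumes hypothetical random data with only 1,000 rows (the result has n * n cells, so memory grows quickly), and the helper names with_dot and with_outer are made up for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in data: 1,000 random coefficients in a single column.
rng = np.random.default_rng(0)
big = pd.DataFrame({"COEFFICIENT": rng.random(1000)})


def with_dot():
    # Column vector times its transpose, using pandas.
    return big.dot(big.T)


def with_outer():
    # Same product via numpy, then wrapped back into a DataFrame.
    return pd.DataFrame(np.outer(big, big), big.index, big.index)


# Both approaches should agree before their speed is compared.
assert np.allclose(with_dot().to_numpy(), with_outer().to_numpy())

for fn in (with_dot, with_outer):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```

The absolute numbers depend on the machine and on the pandas and numpy versions, so only the relative difference is worth reading.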

piRSquared answered Oct 09 '22 at 11:10


What you want is the product of a column vector and its transpose (the outer product of the coefficients with themselves). Set CODE as the index, transpose with .T, and apply the matrix dot method between the two dataframes.

df = df.set_index('CODE')

df.T
Out[10]: 
CODE             A    B    C
COEFFICIENT    0.5  0.4  0.3

df.dot(df.T)
Out[11]: 
CODE     A     B     C
CODE                  
A     0.25  0.20  0.15
B     0.20  0.16  0.12
C     0.15  0.12  0.09
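As for np.multiply: it multiplies element by element, so two length-3 vectors give three squares, not a 3x3 matrix. The sketch below is only a guess at what went wrong, since the question does not show the attempted call. It builds the sample data by hand, and the names v, squares and matrix are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built directly instead of read from the CSV.
df = pd.DataFrame(
    {"CODE": ["A", "B", "C"], "COEFFICIENT": [0.5, 0.4, 0.3]}
).set_index("CODE")

v = df["COEFFICIENT"].to_numpy()

# Elementwise product of two length-3 vectors: only the squares, shape (3,).
squares = np.multiply(v, v)

# Reshaping one operand into a column, shape (3, 1), lets broadcasting
# expand the product into the full 3x3 matrix.
matrix = np.multiply(v[:, None], v)

result = pd.DataFrame(matrix, index=df.index, columns=df.index)

# Same values as the df.dot(df.T) approach.
assert np.allclose(result.to_numpy(), df.dot(df.T).to_numpy())
print(result)
```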
Zeugma answered Oct 09 '22 at 10:10