How to create a similarity matrix in NumPy/Python?

I have data in a file in the following form:

user_id, item_id, rating
1, abc,5
1, abcd,3
2, abc, 3
2, fgh, 5

So, the matrix I want to form for the above data is the following:

#   item_ids
# abc  abcd  fgh
[[5,    3,    0]  # user_id 1
 [3,    0,    5]] # user_id 2

where missing data is replaced by 0.
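
For reference, here is a minimal sketch of how the matrix above could be built from the rating triples. The helper name `build_rating_matrix` and the in-memory `lines` list are assumptions made for illustration; in practice the lines would come from the file, and users and items are simply sorted to fix the row and column order.

```python
import numpy as np

# Rating triples as they appear in the question (user_id, item_id, rating).
lines = [
    "1, abc,5",
    "1, abcd,3",
    "2, abc, 3",
    "2, fgh, 5",
]

def build_rating_matrix(rows):
    """Return (matrix, user_ids, item_ids); users are rows, items are columns."""
    triples = []
    for row in rows:
        # Strip the inconsistent whitespace around each field.
        user, item, rating = (field.strip() for field in row.split(","))
        triples.append((user, item, float(rating)))

    user_ids = sorted({u for u, _, _ in triples})
    item_ids = sorted({i for _, i, _ in triples})
    user_index = {u: k for k, u in enumerate(user_ids)}
    item_index = {i: k for k, i in enumerate(item_ids)}

    # Missing (user, item) pairs stay at 0, as described in the question.
    matrix = np.zeros((len(user_ids), len(item_ids)))
    for u, i, r in triples:
        matrix[user_index[u], item_index[i]] = r
    return matrix, user_ids, item_ids

ratings, users, items = build_rating_matrix(lines)
print(items)
print(ratings)
```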

From this I want to create both a user-to-user similarity matrix and an item-to-item similarity matrix.

How do I do that?

How to compute cosine similarity matrix of two NumPy arrays?

We can write a function for it. Here is an example:

import numpy as np

def cos_sim_2d(x, y):
    # Scale every row to unit length, then take all pairwise dot products.
    norm_x = x / np.linalg.norm(x, axis=1, keepdims=True)
    norm_y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return np.matmul(norm_x, norm_y.T)
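
As a usage sketch, the same function can be applied to the 2x3 rating matrix from the question: passing the matrix gives user-to-user similarity, and passing its transpose gives item-to-item similarity. The variable names `ratings`, `user_sim` and `item_sim` are chosen here for illustration. Note that a row consisting only of zeros would cause a division by zero in this function.

```python
import numpy as np

def cos_sim_2d(x, y):
    # Scale every row to unit length, then take all pairwise dot products.
    norm_x = x / np.linalg.norm(x, axis=1, keepdims=True)
    norm_y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return np.matmul(norm_x, norm_y.T)

# Rows are users 1 and 2, columns are items abc, abcd, fgh.
ratings = np.array([[5, 3, 0],
                    [3, 0, 5]], dtype=float)

user_sim = cos_sim_2d(ratings, ratings)      # shape (2, 2)
item_sim = cos_sim_2d(ratings.T, ratings.T)  # shape (3, 3)

print(user_sim)
print(item_sim)
```

Items abcd and fgh were never rated by the same user, so their cosine similarity comes out as 0.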

How to create a matrix in NumPy?

A matrix is a two-dimensional array. In NumPy, you can create a two-dimensional array by passing two or more lists (the rows), separated by commas, to the array() function. You can read more about matrices in Matrix Mathematics.

How to create and initialize a matrix in Python?

There are several ways to create and initialize a matrix in Python; some common ones use the numpy module. To create a matrix containing only 0, one solution is to use the numpy function zeros. In some cases it is then useful to add an axis to a matrix A using np.newaxis.
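
A short sketch of the three calls mentioned above, using small made-up arrays (the names `m`, `z`, `v` and `col` are only for illustration):

```python
import numpy as np

# A 2x3 matrix built from two row lists.
m = np.array([[5, 3, 0],
              [3, 0, 5]])

# A 2x3 matrix containing only zeros.
z = np.zeros((2, 3))

# np.newaxis turns a 1-D array of length 3 into a 3x1 column.
v = np.array([1, 2, 3])
col = v[:, np.newaxis]

print(m.shape, z.shape, col.shape)
```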

Technically, this is a math problem rather than a programming problem. I think you are better off using a variance-covariance matrix, or a correlation matrix if the scales of the values are very different. Say, instead of having:

>>> x
array([[5, 3, 0],
       [3, 0, 5],
       [5, 5, 0],
       [1, 1, 7]])

You have:

>>> x
array([[5, 300, 0],
       [3, 0, 5],
       [5, 500, 0],
       [1, 100, 7]])

To get the variance-covariance matrix (shown here for the first x):

>>> np.cov(x)
array([[  6.33333333,  -3.16666667,   6.66666667,  -8.        ],
       [ -3.16666667,   6.33333333,  -5.83333333,   7.        ],
       [  6.66666667,  -5.83333333,   8.33333333, -10.        ],
       [ -8.        ,   7.        , -10.        ,  12.        ]])

Or the correlation matrix:

>>> np.corrcoef(x)
array([[ 1.        , -0.5       ,  0.91766294, -0.91766294],
       [-0.5       ,  1.        , -0.80295507,  0.80295507],
       [ 0.91766294, -0.80295507,  1.        , -1.        ],
       [-0.91766294,  0.80295507, -1.        ,  1.        ]])

Here is how to read it: a diagonal cell, e.g. the (0,0) cell, is the correlation of the 1st vector (row) in x with itself, so it is 1. The other cells, e.g. the (0,1) cell, hold the correlation between two different vectors, here the 1st and 2nd rows of x, which are negatively correlated. Similarly, the 1st and 3rd vectors are positively correlated.

A covariance matrix or a correlation matrix avoids the zero problem pointed out by @Akavall.
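
A sketch of how this maps onto the two matrices asked for: np.cov and np.corrcoef treat each row as a variable by default, so calling them on x compares rows (users), while passing rowvar=False compares columns (items). The snippet also checks, for the two example arrays above, that rescaling one column leaves the item-to-item correlation unchanged while the covariance changes. The variable names are chosen here for illustration.

```python
import numpy as np

x = np.array([[5, 3, 0],
              [3, 0, 5],
              [5, 5, 0],
              [1, 1, 7]], dtype=float)

# Same data with the second column on a 100x larger scale.
x_scaled = x * np.array([1, 100, 1])

user_corr = np.corrcoef(x)                # rows as variables: 4x4
item_corr = np.corrcoef(x, rowvar=False)  # columns as variables: 3x3

item_corr_scaled = np.corrcoef(x_scaled, rowvar=False)
item_cov = np.cov(x, rowvar=False)
item_cov_scaled = np.cov(x_scaled, rowvar=False)

# Correlation between columns ignores the column scale; covariance does not.
print(np.allclose(item_corr, item_corr_scaled))
print(np.allclose(item_cov, item_cov_scaled))
```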

See this question: What's the fastest way in Python to calculate cosine similarity given sparse matrix data?

Having:

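import numpy as np
from sklearn.metrics import pairwise_distances

A = np.array(
    [[0, 1, 0, 0, 1],
     [0, 0, 1, 1, 1],
     [1, 1, 0, 1, 0]])

# Cosine distance is 1 - cosine similarity, so invert it again.
dist_out = 1 - pairwise_distances(A, metric="cosine")
dist_out

Results in: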

array([[ 1.        ,  0.40824829,  0.40824829],
       [ 0.40824829,  1.        ,  0.33333333],
       [ 0.40824829,  0.33333333,  1.        ]])

But that works for a dense matrix. For a sparse matrix you have to develop your own solution.
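
One possible sketch of such a solution, assuming SciPy is available: normalize each row of a scipy.sparse matrix to unit length and multiply the result by its own transpose, so the data never has to be converted to a dense array. The function name `sparse_cosine_similarity` is hypothetical, and all-zero rows are given similarity 0 to every other row as a design choice of this sketch.

```python
import numpy as np
from scipy import sparse

def sparse_cosine_similarity(m):
    """Row-by-row cosine similarity of a sparse matrix, returned as sparse."""
    m = sparse.csr_matrix(m, dtype=float)
    # Euclidean norm of every row, computed without densifying m.
    norms = np.sqrt(np.asarray(m.multiply(m).sum(axis=1)).ravel())
    # Avoid dividing by zero: all-zero rows stay all-zero after scaling.
    norms[norms == 0] = 1.0
    normalized = sparse.diags(1.0 / norms) @ m
    return normalized @ normalized.T

A = sparse.csr_matrix(np.array(
    [[0, 1, 0, 0, 1],
     [0, 0, 1, 1, 1],
     [1, 1, 0, 1, 0]]))

sim = sparse_cosine_similarity(A)
print(sim.toarray())
```

For the example matrix A this reproduces the values from the dense computation above.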
