
Using NumPy to convert user/item ratings into 2-D array

I'm performing some classification using user/item/rating data. My issue is: how do I convert these 3 columns into a matrix with users as rows, items as columns, and the ratings populating the matrix?

User  Item  ItemRating
1     23    3
2     204   4
1     492   2
3     23    4

And so on. I tried using a DataFrame but was getting NULL errors.

user2822055 asked Nov 20 '13 18:11


1 Answer

If I understand you correctly, this is a pivot. With pandas it looks as follows.

Load data:

import pandas as pd
df = pd.read_csv(fname, sep=r'\s+', header=None)  # raw string so \s is not treated as a string escape
df.columns = ['User','Item','ItemRating']

Pivot it:

>>> df
   User  Item  ItemRating
0     1    23           3
1     2   204           4
2     1   492           2
3     3    23           4
>>> df.pivot(index='User', columns='Item', values='ItemRating')
Item  23   204  492
User
1       3  NaN    2
2     NaN    4  NaN
3       4  NaN  NaN
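If a plain NumPy array is what you need in the end, the pivoted frame can be converted directly. The following is a minimal sketch, assuming the four sample rows from the question and that 0 is an acceptable stand-in for a missing rating; the names `wide` and `ratings` are only illustrative.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from a file
df = pd.DataFrame({
    "User": [1, 2, 1, 3],
    "Item": [23, 204, 492, 23],
    "ItemRating": [3, 4, 2, 4],
})

# One row per user, one column per item; missing pairs become NaN
wide = df.pivot(index="User", columns="Item", values="ItemRating")

# Replace NaN with 0 (an assumption about what "no rating" should be),
# then convert to a 2-D integer NumPy array
ratings = wide.fillna(0).to_numpy(dtype=int)

print(ratings)
```

The row and column labels are still available as `wide.index` and `wide.columns` if you need to map array positions back to user and item ids.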

For a NumPy example, let's emulate a file with StringIO:

import numpy as np
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
data ="""1     23    3
2     204   4
1     492   2
3     23    4"""

and load it:

>>> arr = np.genfromtxt(StringIO(data), dtype=int)
>>> arr
array([[  1,  23,  3],
       [  2, 204,  4],
       [  1, 492,  2],
       [  3,  23,  4]])

The pivot is based on this answer:

# Unique user and item ids, plus each record's position among them
rows, row_pos = np.unique(arr[:, 0], return_inverse=True)
cols, col_pos = np.unique(arr[:, 1], return_inverse=True)
# Start from zeros, then write each rating into its (user, item) cell
pivot_table = np.zeros((len(rows), len(cols)), dtype=arr.dtype)
pivot_table[row_pos, col_pos] = arr[:, 2]

and the result:

>>> pivot_table
array([[ 3,  0,  2],
       [ 0,  4,  0],
       [ 4,  0,  0]])
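To see why the fancy-indexed assignment works, it may help to look at what np.unique returns with return_inverse=True. This is a small sketch on the same sample data; it only inspects the intermediate arrays and adds nothing to the approach itself.

```python
import numpy as np

arr = np.array([[1, 23, 3],
                [2, 204, 4],
                [1, 492, 2],
                [3, 23, 4]])

# rows: sorted unique user ids; row_pos: index of each record's user in rows
rows, row_pos = np.unique(arr[:, 0], return_inverse=True)
# cols: sorted unique item ids; col_pos: index of each record's item in cols
cols, col_pos = np.unique(arr[:, 1], return_inverse=True)

print(rows, row_pos)   # [1 2 3] [0 1 0 2]
print(cols, col_pos)   # [ 23 204 492] [0 1 2 0]
```

Each record k therefore lands in cell (row_pos[k], col_pos[k]). If the same user/item pair appears more than once, the later rating silently overwrites the earlier one.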

Note that the results of the two approaches differ: pandas leaves missing user/item pairs as NaN, while the NumPy approach sets them to zero.
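If a zero could be mistaken for a real rating, the NumPy version can mark missing cells with NaN as well. A minimal sketch, assuming a float array is acceptable (NaN cannot be stored in an integer array); `pivot_nan` is just an illustrative name.

```python
import numpy as np

arr = np.array([[1, 23, 3],
                [2, 204, 4],
                [1, 492, 2],
                [3, 23, 4]])

rows, row_pos = np.unique(arr[:, 0], return_inverse=True)
cols, col_pos = np.unique(arr[:, 1], return_inverse=True)

# Float table pre-filled with NaN instead of zeros
pivot_nan = np.full((len(rows), len(cols)), np.nan)
pivot_nan[row_pos, col_pos] = arr[:, 2]

print(pivot_nan)
```

Functions such as np.nanmean can then skip the missing cells when aggregating.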

Select one that suits you better ;)

alko answered Nov 14 '22 21:11