
How to convert dictionary to matrix in python?

Tags: python

I have a dictionary like this:

{device1: (news1, news2, ...), device2: (news2, news4, ...), ...}

How can I convert it into a 2-D 0-1 matrix in Python? The result should look like this:

         news1 news2 news3 news4
device1    1     1     0      0
device2    0     1     0      1
device3    1     0     0      1
Yue Deng asked May 23 '17 02:05


3 Answers

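Here is some code that builds a matrix (a 2-D array) with the numpy package. It reads the rows through an explicit list of names because dictionaries did not guarantee insertion order before Python 3.7, so relying on the dictionary's own order is not safe on older versions. Note that this example assumes each value is already a tuple of 0/1 flags.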

import numpy as np

dataDict = {'device1':(1,1,0,1), 'device2':(0,1,0,1), 'device3':(1,0,0,1)}
orderedNames = ['device1','device2','device3']

dataMatrix = np.array([dataDict[i] for i in orderedNames])

print(dataMatrix)

The output is:

[[1 1 0 1]
 [0 1 0 1]
 [1 0 0 1]]
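
If the values are lists of news names, as in the question, rather than ready-made 0/1 tuples, the flags have to be computed first. The following is a minimal sketch of one way to do that with numpy. The names device_news, ordered_devices, ordered_news and column_of are made up for illustration, and the full list of news columns is assumed to be known in advance.

```python
import numpy as np

# Hypothetical data in the shape described in the question.
device_news = {
    'device1': ('news1', 'news2'),
    'device2': ('news2', 'news4'),
    'device3': ('news1', 'news4'),
}

# Fix the row and column order explicitly so the result is reproducible.
ordered_devices = sorted(device_news)
ordered_news = ['news1', 'news2', 'news3', 'news4']  # assumed full column list
column_of = {name: j for j, name in enumerate(ordered_news)}

# Start from all zeros, then set a 1 wherever a device has a news item.
indicator = np.zeros((len(ordered_devices), len(ordered_news)), dtype=int)
for i, device in enumerate(ordered_devices):
    for news in device_news[device]:
        indicator[i, column_of[news]] = 1

print(indicator)
# [[1 1 0 0]
#  [0 1 0 1]
#  [1 0 0 1]]
```

Row i corresponds to ordered_devices[i] and column j to ordered_news[j], so the labels can be recovered from those two lists.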
Robbie answered Oct 18 '22 03:10


Another option is scikit-learn's DictVectorizer, which converts a list of dictionaries into a feature matrix:

# Load library
from sklearn.feature_extraction import DictVectorizer

# Our data: a list of dictionaries, one per row
data_dict = [{'Red': 2, 'Blue': 4},
             {'Red': 4, 'Blue': 3},
             {'Red': 1, 'Yellow': 2},
             {'Red': 2, 'Yellow': 2}]
# Create DictVectorizer object
dictvectorizer = DictVectorizer(sparse=False)

# Convert dictionary into feature matrix
features = dictvectorizer.fit_transform(data_dict)
print(features)
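# Output:
'''
[[4. 2. 0.]
 [3. 4. 0.]
 [0. 1. 2.]
 [0. 2. 2.]]
'''
# get_feature_names() was removed in scikit-learn 1.2;
# get_feature_names_out() is available since 1.0
print(list(dictvectorizer.get_feature_names_out()))
# Output:
'''
['Blue', 'Red', 'Yellow']
'''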
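
For data shaped like the question's (each key maps to a collection of labels instead of a dictionary of counts), scikit-learn's MultiLabelBinarizer may be a closer fit than DictVectorizer. This is only a sketch under the assumption that the full set of news names is known up front. The variable names are made up for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical data in the shape described in the question.
device_news = {
    'device1': ('news1', 'news2'),
    'device2': ('news2', 'news4'),
    'device3': ('news1', 'news4'),
}

# Fix the row order; each row of the matrix follows this list.
devices = sorted(device_news)

# Passing classes= pins the column order and keeps a column for news3,
# which no device in this sample has.
binarizer = MultiLabelBinarizer(classes=['news1', 'news2', 'news3', 'news4'])
matrix = binarizer.fit_transform([device_news[d] for d in devices])

print(matrix)
print(list(binarizer.classes_))
```

Without the classes argument, the columns would be limited to the labels that actually occur in the data, so news3 would be missing.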
tolgabuyuktanir answered Oct 18 '22 02:10


Adding to this, since I think the previous answers assume your data is structured differently and don't directly address your question.

Assuming I understand your data structure correctly and the row labels of the matrix don't matter:

from sklearn.feature_extraction import DictVectorizer

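# Avoid naming the variable `dict`, which would shadow the built-in type
device_news = {'device1': ['news1', 'news2'],
               'device2': ['news2', 'news4'],
               'device3': ['news1', 'news4']}

restructured = []

for key in device_news:
    data_dict = {}
    for news in device_news[key]:
        data_dict[news] = 1
    # news3 appears for no device, so add it by hand to get its all-zero column
    data_dict['news3'] = 0
    restructured.append(data_dict)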

#restructured should now look like
'''
[{'news1':1, 'news2':1, 'news3':0},
 {'news2':1, 'news4':1, 'news3':0},
 {'news1':1, 'news4':1, 'news3':0}]
'''

dictvectorizer = DictVectorizer(sparse=False)
features = dictvectorizer.fit_transform(restructured)

print(features)

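# Output (DictVectorizer sorts the feature names, so the columns run news1..news4):
'''
[[1. 1. 0. 0.]
 [0. 1. 0. 1.]
 [1. 0. 0. 1.]]
'''
# get_feature_names() was removed in scikit-learn 1.2;
# get_feature_names_out() is available since 1.0
print(list(dictvectorizer.get_feature_names_out()))
# Output:
'''
['news1', 'news2', 'news3', 'news4']
'''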
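
If the row and column labels do matter, for example to print a table like the one in the question, a plain-Python sketch along these lines could work without any extra packages. The helper to_labeled_rows and the column list are hypothetical names introduced here, and the list of news columns is assumed to be known.

```python
# Hypothetical data in the shape described in the question.
device_news = {'device1': ['news1', 'news2'],
               'device2': ['news2', 'news4'],
               'device3': ['news1', 'news4']}


def to_labeled_rows(mapping, columns):
    """Return (row_label, [0/1, ...]) pairs, ordered by sorted row label."""
    return [(key, [int(col in mapping[key]) for col in columns])
            for key in sorted(mapping)]


columns = ['news1', 'news2', 'news3', 'news4']  # assumed full column list
rows = to_labeled_rows(device_news, columns)

# Print a header line, then one labeled line per device.
print(' ' * 8 + ' '.join(columns))
for label, values in rows:
    print(label.ljust(8) + ' '.join(str(v).center(5) for v in values))
```

Keeping each label next to its row avoids having to track the row order separately from the matrix.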
mgrogger answered Oct 18 '22 01:10