I have a dictionary like this:
{device1 : (news1, news2, ...), device2 : (news2, news4, ...), ...}
How can I convert it into a 2-D 0-1 matrix in Python? The result should look like this:
         news1  news2  news3  news4
device1      1      1      0      0
device2      0      1      0      1
device3      1      0      0      1
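Here is some code that creates a matrix (a 2-D array) using the numpy package. It assumes each value in the dictionary is already a tuple of 0/1 flags. Note that the rows are built from an explicit, ordered list of names: before Python 3.7, dictionaries did not guarantee that keys come back in insertion order, and an explicit list makes the row order unambiguous in any version.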
import numpy as np
dataDict = {'device1':(1,1,0,1), 'device2':(0,1,0,1), 'device3':(1,0,0,1)}
orderedNames = ['device1','device2','device3']
dataMatrix = np.array([dataDict[i] for i in orderedNames])
print(dataMatrix)
The output is:
[[1 1 0 1]
 [0 1 0 1]
 [1 0 0 1]]
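The dictionary in the question maps each device to the news names it has seen, not to ready-made 0/1 tuples. As a rough sketch of how those names could be turned into rows first, here is a plain-Python version. The helper name `to_indicator_matrix` and the `all_news` list are my own assumptions for illustration; `all_news` is needed because `news3` is not read by any device and would otherwise get no column.

```python
def to_indicator_matrix(device_news, columns=None):
    """Return (row_names, column_names, rows) for a device -> news mapping."""
    row_names = sorted(device_news)
    if columns is None:
        # Fall back to the news items that actually occur in the data
        columns = sorted({n for items in device_news.values() for n in items})
    rows = []
    for device in row_names:
        seen = set(device_news[device])
        # 1 if this device has the news item, 0 otherwise
        rows.append([1 if name in seen else 0 for name in columns])
    return row_names, list(columns), rows


device_news = {
    'device1': ('news1', 'news2'),
    'device2': ('news2', 'news4'),
    'device3': ('news1', 'news4'),
}
# Assumed full list of columns, including items no device has read
all_news = ['news1', 'news2', 'news3', 'news4']

row_names, column_names, rows = to_indicator_matrix(device_news, all_news)
for name, row in zip(row_names, rows):
    print(name, row)
```

Passing `rows` to `np.array(rows)` then gives a numpy matrix in the same way as the code above.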
Another option is scikit-learn's DictVectorizer, which converts a list of dictionaries into a feature matrix:
# Load library
from sklearn.feature_extraction import DictVectorizer
# Our data: a list of dictionaries, one per row
data_dict = [{'Red': 2, 'Blue': 4},
             {'Red': 4, 'Blue': 3},
             {'Red': 1, 'Yellow': 2},
             {'Red': 2, 'Yellow': 2}]
# Create DictVectorizer object
dictvectorizer = DictVectorizer(sparse=False)
# Convert dictionary into feature matrix
features = dictvectorizer.fit_transform(data_dict)
print(features)
#output
'''
[[4. 2. 0.]
 [3. 4. 0.]
 [0. 1. 2.]
 [0. 2. 2.]]
'''
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(dictvectorizer.get_feature_names_out())
#output
'''
['Blue' 'Red' 'Yellow']
'''
Adding to this, since I think the previous answers assume your data is structured differently and don't directly address your question.
Assuming I understand your data structure correctly and the row labels of the matrix don't really matter:
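from sklearn.feature_extraction import DictVectorizer
# Named device_news rather than dict, which would shadow the built-in type
device_news = {'device1': ['news1', 'news2'],
               'device2': ['news2', 'news4'],
               'device3': ['news1', 'news4']}
restructured = []
for key in device_news:
    data_dict = {}
    for news in device_news[key]:
        data_dict[news] = 1
    # No device has news3, so add it explicitly to get a column for it
    data_dict['news3'] = 0
    restructured.append(data_dict)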
#restructured should now look like
'''
[{'news1': 1, 'news2': 1, 'news3': 0},
 {'news2': 1, 'news4': 1, 'news3': 0},
 {'news1': 1, 'news4': 1, 'news3': 0}]
'''
dictvectorizer = DictVectorizer(sparse=False)
features = dictvectorizer.fit_transform(restructured)
print(features)
#output (columns are sorted by feature name: news1, news2, news3, news4)
'''
[[1. 1. 0. 0.]
 [0. 1. 0. 1.]
 [1. 0. 0. 1.]]
'''
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(dictvectorizer.get_feature_names_out())
#output
'''
['news1' 'news2' 'news3' 'news4']
'''
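The restructuring loop above hard-codes `news3`. If you already know the full list of news items, one way to generalize that step is to start every row with zeros for all of them. This is only a sketch: the helper name `restructure` and the `all_news` list are assumptions of mine, not something given in the question.

```python
def restructure(device_news, all_news):
    """Build one {news_name: 0 or 1} dictionary per device."""
    restructured = []
    for device in device_news:
        # Start with 0 for every known news item so no column goes missing
        row = {name: 0 for name in all_news}
        for news in device_news[device]:
            row[news] = 1
        restructured.append(row)
    return restructured


device_news = {'device1': ['news1', 'news2'],
               'device2': ['news2', 'news4'],
               'device3': ['news1', 'news4']}
# Assumed full list of news items, including ones no device has read
all_news = ['news1', 'news2', 'news3', 'news4']

restructured = restructure(device_news, all_news)
print(restructured)
```

The resulting list can be passed to `DictVectorizer(sparse=False).fit_transform(...)` exactly as in the code above.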