I have a dict of lists, e.g.:
dictionary_test = {'A': ['hello', 'byebye', 'howdy'], 'B': ['bonjour', 'hello', 'ciao'], 'C': ['ciao', 'hello', 'byebye']}
I want to convert it into a boolean affiliation matrix for further analysis, preferably with the dict keys as column names and the list items as row names:
         A  B  C
hello    1  1  1
byebye   1  0  1
howdy    1  0  0
bonjour  0  1  0
ciao     0  1  1
Is it possible to do this in Python (preferably so that I could write the matrix to a .csv file)? I imagine this is something I would have to do with numpy, correct?
An additional problem is that the size of the dictionary is unknown (both the number of keys and the number of elements in lists vary).
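Numpy is not strictly required for this. As a minimal sketch using only the standard library's csv module (the filename affiliation.csv is an arbitrary example), the matrix can be built for any number of keys and any list lengths, then written out with 1/0 values:

```python
import csv

dictionary_test = {'A': ['hello', 'byebye', 'howdy'],
                   'B': ['bonjour', 'hello', 'ciao'],
                   'C': ['ciao', 'hello', 'byebye']}

# Row labels: every distinct list item, in first-seen order
# (dict.fromkeys keeps insertion order and drops duplicates).
rows = list(dict.fromkeys(item for items in dictionary_test.values() for item in items))
columns = list(dictionary_test)

# Convert each list to a set once so membership tests are fast.
members = {key: set(items) for key, items in dictionary_test.items()}

# One row per item, one column per key; 1 if the item is in that key's list.
matrix = [[int(row in members[col]) for col in columns] for row in rows]

# Write a header row, then one labelled row per item.
with open('affiliation.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([''] + columns)
    for label, values in zip(rows, matrix):
        writer.writerow([label] + values)
```

Because rows and columns are derived from the dict itself, nothing about its size has to be known in advance.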
You can use pandas. Here is an example:
>>> import pandas as pd
>>> dictionary_test = {'A': ['hello', 'byebye', 'howdy'], 'B': ['bonjour', 'hello', 'ciao'], 'C': ['ciao', 'hello', 'byebye']}
>>> values = list(set(x for y in dictionary_test.values() for x in y))
>>> data = {}
>>> for key in dictionary_test:
...     data[key] = [value in dictionary_test[key] for value in values]
...
>>> pd.DataFrame(data, index=values)
             A      B      C
ciao     False   True   True
howdy     True  False  False
bonjour  False   True  False
hello     True   True   True
byebye    True  False   True
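To get the 1/0 layout from the question and save it to a .csv file, a minimal sketch is to convert the booleans with astype(int) and call to_csv. The rows are sorted here only so the output is reproducible, and the filename affiliation_matrix.csv is an arbitrary example:

```python
import pandas as pd

dictionary_test = {'A': ['hello', 'byebye', 'howdy'],
                   'B': ['bonjour', 'hello', 'ciao'],
                   'C': ['ciao', 'hello', 'byebye']}

# Distinct list items, sorted so the row order is stable between runs.
values = sorted({x for y in dictionary_test.values() for x in y})
data = {key: [value in dictionary_test[key] for value in values]
        for key in dictionary_test}
df = pd.DataFrame(data, index=values)

# Booleans -> 1/0 to match the matrix in the question.
matrix = df.astype(int)

# The index (the list items) is written as the first column by default.
matrix.to_csv('affiliation_matrix.csv')
```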
If you want the rows in a particular order, just set values manually. Note that a set has no guaranteed order, so the row order above can differ between runs.
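For example, a sketch that keeps the rows in the order the items first appear in the dict (which reproduces the order in the question); any explicit list of row labels could be used for values instead:

```python
import pandas as pd

dictionary_test = {'A': ['hello', 'byebye', 'howdy'],
                   'B': ['bonjour', 'hello', 'ciao'],
                   'C': ['ciao', 'hello', 'byebye']}

# First-seen order: dict.fromkeys keeps insertion order and drops duplicates.
# Replace this with a hand-written list if you need a different order.
values = list(dict.fromkeys(x for y in dictionary_test.values() for x in y))

data = {key: [value in dictionary_test[key] for value in values]
        for key in dictionary_test}
df = pd.DataFrame(data, index=values)
print(df.astype(int))
```

If the DataFrame already exists, df.reindex(order) with a list of row labels reorders it in the same way.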