I have a huge dictionary something like this:
d[id1][id2] = value
example:
books["auth1"]["humor"] = 20
books["auth1"]["action"] = 30
books["auth2"]["comedy"] = 20
and so on..
Each of the "auth" keys can have any set of "genres" associated with it. The value for a keyed item is the number of books that author wrote in that genre.
Now I want to convert it into a matrix, something like:
          "humor"  "action"  "comedy"
"auth1"      20       30         0
"auth2"       0        0        20
How do I do this? Thanks.
pandas does this very well:
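from pandas import DataFrame

books = {}
books["auth1"] = {}
books["auth2"] = {}
books["auth1"]["humor"] = 20
books["auth1"]["action"] = 30
books["auth2"]["comedy"] = 20

# Outer keys become columns, so transpose to get one row per author,
# then replace the missing author/genre pairs (NaN) with 0.
df = DataFrame(books).T.fillna(0)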
The output is:
       action  comedy  humor
auth1      30       0     20
auth2       0      20      0
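If you need the columns in the exact order from the question and integer counts rather than floats, something along these lines should work. This is only a sketch: it assumes the genre list is known up front (the `genres` list here is written by hand, not derived from the data) and uses `DataFrame.from_dict` with `orient="index"` instead of transposing.

```python
import pandas as pd

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# Column order requested in the question (assumed to be known up front).
genres = ["humor", "action", "comedy"]

# Outer keys become rows; missing author/genre pairs start out as NaN.
df = pd.DataFrame.from_dict(books, orient="index")

# Fix row and column order, replace NaN with 0, and restore integer counts.
df = df.reindex(index=sorted(books), columns=genres).fillna(0).astype(int)

matrix = df.to_numpy()  # plain 2-D NumPy array of counts
print(df)
```

`matrix` then holds one row per author and one column per genre, in the order given by `genres`.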
Use a list comprehension to turn a dict into a list of lists and/or a numpy array:
import numpy as np
np.array([[books[author][genre] for genre in sorted(books[author])] for author in sorted(books)])
EDIT
Apparently you have an irregular number of keys in each sub-dictionary. Make a list of all the genres:
genres = ['humor', 'action', 'comedy']
And then iterate over the dictionaries in the normal manner:
import numpy

list_of_lists = []
for author_name, author in sorted(books.items()):
    titles = []
    for genre in genres:
        try:
            titles.append(author[genre])
        except KeyError:
            # This author has no books in this genre.
            titles.append(0)
    list_of_lists.append(titles)

books_array = numpy.array(list_of_lists)
Basically, for each author I try to append the value for every key in genres to a list. If the key is not there, the lookup raises a KeyError; I catch the error and append a 0 to the list instead.
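A variation on the same idea, in case you don't want to type the genre list by hand: collect the genres from the data itself and use `dict.get` with a default of 0 instead of catching the exception. This is a minimal sketch in plain Python; the sorted ordering of authors and genres is my assumption, so reorder as needed.

```python
books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# Collect every genre that appears under any author, in sorted order.
genres = sorted({genre for by_genre in books.values() for genre in by_genre})
authors = sorted(books)

# dict.get returns the default (0) when an author has no entry for a genre.
matrix = [[books[author].get(genre, 0) for genre in genres] for author in authors]

print(genres)
for author, row in zip(authors, matrix):
    print(author, row)
```

The resulting list of lists can be passed to numpy.array exactly as above if you need an array.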