 

create pandas dataframe from dictionary of dictionaries

I have a dictionary of dictionaries of the form:

```python
{'user': {movie: rating}}
```

For example,

```python
{'Jill': {'Avenger: Age of Ultron': 7.0,
          'Django Unchained': 6.5,
          'Gone Girl': 9.0,
          'Kill the Messenger': 8.0},
 'Toby': {'Avenger: Age of Ultron': 8.5,
          'Django Unchained': 9.0,
          'Zoolander': 2.0}}
```

I want to convert this dict of dicts into a pandas DataFrame whose first column is the user name and whose remaining columns are the movie ratings, i.e.:

```
user  Gone_Girl  Horrible_Bosses_2  Django_Unchained  Zoolander  etc.
```

However, some users did not rate every movie, so those movies are missing from that user's inner dictionary. In these cases it would be nice to simply fill the entry with NaN.

Currently, I iterate over the keys, fill a list, and then use that list to create a DataFrame:

```python
import pandas as pd

data = []
for key in movie_user_preferences:
    try:
        data.append((key,
                     movie_user_preferences[key]['Gone Girl'],
                     movie_user_preferences[key]['Horrible Bosses 2'],
                     movie_user_preferences[key]['Django Unchained'],
                     movie_user_preferences[key]['Zoolander'],
                     movie_user_preferences[key]['Avenger: Age of Ultron'],
                     movie_user_preferences[key]['Kill the Messenger']))
    # if the user has no entry for any one of these movies, skip that user
    except KeyError:
        pass

df = pd.DataFrame(data=data,
                  columns=['user', 'Gone_Girl', 'Horrible_Bosses_2', 'Django_Unchained',
                           'Zoolander', 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])
```

But this only gives me a dataframe of users who rated all the movies in the set.

My goal is, first, to build the data list by iterating over the movie labels (rather than the brute-force approach shown above) and, second, to create a DataFrame that includes all users and places null values (NaN) in the cells that have no rating.
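A minimal sketch of the loop-based version, assuming the ratings live in a dict named `movie_user_preferences` shaped like the example above: instead of indexing each movie by hand, iterate over a list of titles and use `dict.get` with a NaN default, so users with missing ratings are kept rather than skipped. The `movies` list and the underscore-style column names are illustrative assumptions, not something pandas requires.

```python
import math

import pandas as pd

movie_user_preferences = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

# Movie labels to iterate over (illustrative); their order sets the column order.
movies = ['Gone Girl', 'Horrible Bosses 2', 'Django Unchained',
          'Zoolander', 'Avenger: Age of Ultron', 'Kill the Messenger']

data = []
for user, ratings in movie_user_preferences.items():
    # dict.get returns NaN for any movie this user did not rate,
    # so every user gets a row instead of being dropped.
    row = [user] + [ratings.get(movie, math.nan) for movie in movies]
    data.append(row)

# Derive names like 'Gone_Girl' from the titles: drop ':' and turn spaces into '_'.
columns = ['user'] + [m.replace(':', '').replace(' ', '_') for m in movies]
df = pd.DataFrame(data, columns=columns)
print(df)
```

Because the titles come from a list, adding or removing a movie only means editing `movies`, not the loop body.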

asked Oct 15 '15 at 20:10 by Feynman27





1 Answer

You can pass the dict of dicts directly to the DataFrame constructor:

```
In [11]: d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0, 'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0}, 'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0, 'Avenger: Age of Ultron': 8.5}}

In [12]: pd.DataFrame(d)
Out[12]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0
```
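With the constructor, the outer keys (users) become the columns and the inner keys (movies) become the row index; any user/movie pair missing from an inner dict is filled with NaN. A short sketch, reusing the same `d`, showing that transposing with `.T` gives one row per user:

```python
import pandas as pd

d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0,
              'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0},
     'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0,
              'Avenger: Age of Ultron': 8.5}}

by_movie = pd.DataFrame(d)  # rows: movies, columns: users
by_user = by_movie.T        # rows: users, columns: movies

# Toby never rated 'Gone Girl', so this cell holds NaN.
print(by_user.loc['Toby', 'Gone Girl'])
```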

Or use the from_dict method; with orient='index', the outer keys (users) become the rows instead of the columns:

```
In [13]: pd.DataFrame.from_dict(d)
Out[13]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0

In [14]: pd.DataFrame.from_dict(d, orient='index')
Out[14]:
      Django Unchained  Gone Girl  Kill the Messenger  Avenger: Age of Ultron  Zoolander
Jill               6.5          9                   8                     7.0        NaN
Toby               9.0        NaN                 NaN                     8.5          2
```
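To reach the layout described in the question (a `user` column followed by one column per movie), one option is to chain `orient='index'` with `reindex` and `reset_index`. A sketch, assuming the same `d`; the `movies` list is an illustrative assumption. Note that `reindex` also adds an all-NaN column for a title nobody rated (here 'Horrible Bosses 2'), which plain `from_dict` would leave out.

```python
import pandas as pd

d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0,
              'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0},
     'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0,
              'Avenger: Age of Ultron': 8.5}}

# Desired column set and order (illustrative); titles absent from d become all-NaN columns.
movies = ['Gone Girl', 'Horrible Bosses 2', 'Django Unchained',
          'Zoolander', 'Avenger: Age of Ultron', 'Kill the Messenger']

df = (pd.DataFrame.from_dict(d, orient='index')  # one row per user
        .reindex(columns=movies)                 # enforce column set and order
        .rename_axis('user')                     # name the index 'user' ...
        .reset_index())                          # ... then turn it into a column
print(df)
```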
answered Oct 11 '22 at 00:10 by Andy Hayden
