Converting a dataframe to dictionary with multiple values

I have a DataFrame like this:

Sr.No   ID       A         B          C         D
 1     Tom     Earth    English      BMW
 2     Tom     Mars     Spanish      BMW       Green          
 3     Michael Mercury  Hindi        Audi      Yellow
 4     John    Venus    Portugese    Mercedes  Blue
 5     John             German       Audi      Red

I am trying to convert this into one dictionary per ID, like:

{'ID': 'Tom', 'A': ['Earth', 'Mars'], 'B': ['English', 'Spanish'], 'C': ['BMW', 'BMW'], 'D': ['Green']},
{'ID': 'Michael', 'A': ['Mercury'], 'B': ['Hindi'], 'C': ['Audi'], 'D': ['Yellow']},
{'ID': 'John', 'A': ['Venus'], 'B': ['Portugese', 'German'], 'C': ['Mercedes', 'Audi'], 'D': ['Blue', 'Red']}

This is roughly the output I want.

I also tried:

df.set_index('ID').to_dict()

but this gives me a dictionary of length 5 instead of 3 (one entry per column rather than one per ID). Any help would be appreciated.
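The length-5 result comes from the default orient='dict', which builds one outer entry per column, not per row. The sketch below rebuilds the sample data (treating the blank cells as missing values entered as None, which is an assumption about the original frame) and shows that the outer keys are the five non-index columns, while repeated IDs collapse inside each inner dict with the last row winning.

```python
import pandas as pd

# Sample data from the question; blank cells are assumed to be missing (None).
df = pd.DataFrame({
    'Sr.No': [1, 2, 3, 4, 5],
    'ID': ['Tom', 'Tom', 'Michael', 'John', 'John'],
    'A': ['Earth', 'Mars', 'Mercury', 'Venus', None],
    'B': ['English', 'Spanish', 'Hindi', 'Portugese', 'German'],
    'C': ['BMW', 'BMW', 'Audi', 'Mercedes', 'Audi'],
    'D': [None, 'Green', 'Yellow', 'Blue', 'Red'],
})

result = df.set_index('ID').to_dict()

# Default orient='dict' gives {column -> {index -> value}},
# so the outer keys are the five remaining columns, not the three IDs.
print(list(result))   # ['Sr.No', 'A', 'B', 'C', 'D']

# Each inner dict is keyed by ID; a repeated ID overwrites the earlier row.
print(result['B'])    # {'Tom': 'Spanish', 'Michael': 'Hindi', 'John': 'German'}
```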

Ronak Shah asked Aug 22 '16 11:08


People also ask

How do you create a dictionary with multiple values per key?

In Python, each dictionary key maps to exactly one value, so to associate several values with a key you store a container object as that value. A list or a tuple both work; a list is the usual choice when values are added over time, because tuples are immutable.
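A minimal sketch of the list-as-value pattern, using collections.defaultdict so that each key gets an empty list the first time it is seen. The (key, value) pairs are hypothetical illustration data.

```python
from collections import defaultdict

# Hypothetical (key, value) pairs where keys repeat.
pairs = [('Tom', 'Earth'), ('Tom', 'Mars'), ('John', 'Venus')]

# defaultdict(list) creates an empty list on first access to a key,
# so every key ends up mapped to one list collecting all of its values.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

print(dict(grouped))  # {'Tom': ['Earth', 'Mars'], 'John': ['Venus']}
```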

Can a dictionary hold multiple values for one key?

Yes, indirectly. Each key still maps to a single value, but that value can be a container, commonly a list, tuple, or set, that holds several items.
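The choice of container matters: a list keeps duplicates and insertion order, while a set keeps only distinct values. A small sketch with a set as the container, using dict.setdefault to create it on first use (the pairs are hypothetical):

```python
# Hypothetical pairs; 'BMW' appears twice for Tom.
pairs = [('Tom', 'BMW'), ('Tom', 'BMW'), ('John', 'Mercedes'), ('John', 'Audi')]

# dict.setdefault inserts an empty set the first time a key appears
# and returns the existing set afterwards; the set drops duplicates.
unique_by_key = {}
for key, value in pairs:
    unique_by_key.setdefault(key, set()).add(value)

print(unique_by_key['Tom'])           # {'BMW'}
print(sorted(unique_by_key['John']))  # ['Audi', 'Mercedes']
```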

How do you convert a DataFrame to a dictionary?

Use the DataFrame.to_dict() method. Its orient parameter defaults to 'dict', which returns {column -> {index -> value}}. Other values change the shape: 'list' gives {column -> [values]}, 'records' gives a list with one {column -> value} dict per row, and 'index' gives {index -> {column -> value}}.
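A small sketch comparing three orient values on a hypothetical two-row frame (illustrative data, not the question's):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['Tom', 'John'], 'A': ['Earth', 'Venus']})

# Default orient='dict': {column -> {index -> value}}
by_column = df.to_dict()
# {'ID': {0: 'Tom', 1: 'John'}, 'A': {0: 'Earth', 1: 'Venus'}}

# orient='list': {column -> [values]}
as_lists = df.to_dict(orient='list')
# {'ID': ['Tom', 'John'], 'A': ['Earth', 'Venus']}

# orient='records': one {column -> value} dict per row
as_records = df.to_dict(orient='records')
# [{'ID': 'Tom', 'A': 'Earth'}, {'ID': 'John', 'A': 'Venus'}]
```

orient='list' is the shape the answers below apply to each ID group.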

How do I convert a two column DataFrame to a dictionary in pandas?

To build a dictionary from two columns, first create a pandas Series that uses the key column as its index and the other column as its values, then call that Series' to_dict() method.
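A minimal sketch of that approach on a hypothetical two-column frame, plus an equivalent that zips the two columns directly:

```python
import pandas as pd

# Hypothetical frame: one column for keys, one for values.
df = pd.DataFrame({'ID': ['Tom', 'Michael', 'John'],
                   'D': ['Green', 'Yellow', 'Blue']})

# Key column becomes the index of a Series of values, then convert.
mapping = df.set_index('ID')['D'].to_dict()
print(mapping)  # {'Tom': 'Green', 'Michael': 'Yellow', 'John': 'Blue'}

# Equivalent result without building an intermediate Series.
same_mapping = dict(zip(df['ID'], df['D']))
```

If a key repeats, both versions keep only the value from the last matching row, which is why grouping is needed for the question above.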


2 Answers

Grouping by 'ID' and applying to_dict(orient='list') to each group comes pretty close:

df.groupby('ID').apply(lambda dfg: dfg.to_dict(orient='list')).to_dict()
Out[25]: 
{'John': {'A': ['Venus', nan],
  'B': ['Portugese', 'German'],
  'C': ['Mercedes', 'Audi'],
  'D': ['Blue', 'Red'],
  'ID': ['John', 'John'],
  'Sr.No': [4, 5]},
 'Michael': {'A': ['Mercury'],
  'B': ['Hindi'],
  'C': ['Audi'],
  'D': ['Yellow'],
  'ID': ['Michael'],
  'Sr.No': [3]},
 'Tom': {'A': ['Earth', 'Mars'],
  'B': ['English', 'Spanish'],
  'C': ['BMW', 'BMW'],
  'D': [nan, 'Green'],
  'ID': ['Tom', 'Tom'],
  'Sr.No': [1, 2]}}

It should just be a matter of formatting the result slightly.
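A sketch of that formatting step, producing the list of dicts from the question. It is an alternative to the apply call above, not part of the original answer: it loops over the groups directly, keeps only columns A to D, and drops missing values (assumed to be the blank cells in the question, entered here as None) so that Tom's D becomes ['Green'].

```python
import pandas as pd

# Sample data from the question; blank cells are assumed to be missing (None).
df = pd.DataFrame({
    'Sr.No': [1, 2, 3, 4, 5],
    'ID': ['Tom', 'Tom', 'Michael', 'John', 'John'],
    'A': ['Earth', 'Mars', 'Mercury', 'Venus', None],
    'B': ['English', 'Spanish', 'Hindi', 'Portugese', 'German'],
    'C': ['BMW', 'BMW', 'Audi', 'Mercedes', 'Audi'],
    'D': [None, 'Green', 'Yellow', 'Blue', 'Red'],
})

value_cols = ['A', 'B', 'C', 'D']
records = []
# sort=False keeps IDs in order of first appearance (Tom, Michael, John).
for person, group in df.groupby('ID', sort=False):
    entry = {'ID': person}
    for col in value_cols:
        # dropna() removes the missing cells before converting to a list.
        entry[col] = group[col].dropna().tolist()
    records.append(entry)

print(records[0])
# {'ID': 'Tom', 'A': ['Earth', 'Mars'], 'B': ['English', 'Spanish'], 'C': ['BMW', 'BMW'], 'D': ['Green']}
```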

Edit: to remove 'ID' from the dictionaries:

df.groupby('ID').apply(lambda dfg: dfg.drop('ID', axis=1).to_dict(orient='list')).to_dict()
Out[5]: 
{'John': {'A': ['Venus', nan],
  'B': ['Portugese', 'German'],
  'C': ['Mercedes', 'Audi'],
  'D': ['Blue', 'Red'],
  'Sr.No': [4, 5]},
 'Michael': {'A': ['Mercury'],
  'B': ['Hindi'],
  'C': ['Audi'],
  'D': ['Yellow'],
  'Sr.No': [3]},
 'Tom': {'A': ['Earth', 'Mars'],
  'B': ['English', 'Spanish'],
  'C': ['BMW', 'BMW'],
  'D': [nan, 'Green'],
  'Sr.No': [1, 2]}}

IanS answered Sep 21 '22 10:09


You can use groupby, apply to_dict with orient='list' to each group, and convert the resulting Series to a dictionary.

df.set_index('Sr.No', inplace=True)
df.groupby('ID').apply(lambda x: x.to_dict('list')).reset_index(drop=True).to_dict()

{0: {'C': ['Mercedes', 'Audi'], 'ID': ['John', 'John'], 'A': ['Venus', nan],  
     'B': ['Portugese', 'German'], 'D': ['Blue', 'Red']}, 
 1: {'C': ['Audi'], 'ID': ['Michael'], 'A': ['Mercury'], 'B': ['Hindi'], 'D': ['Yellow']}, 
 2: {'C': ['BMW', 'BMW'], 'ID': ['Tom', 'Tom'], 'A': ['Earth', 'Mars'], 
     'B': ['English', 'Spanish'], 'D': [nan, 'Green']}}

To leave ID out of the inner dictionaries, select only the value columns before applying:

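# Select the columns with a list; a tuple such as ['A','B','C','D'] raises an error in pandas 2.0+
df.groupby('ID')[['A', 'B', 'C', 'D']].apply(lambda x: x.to_dict('list'))  \
                                 .reset_index(drop=True).to_dict()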
Nickil Maveli answered Sep 20 '22 10:09