I have a DataFrame like this:
Sr.No  ID       A        B          C         D
1      Tom      Earth    English    BMW       NaN
2      Tom      Mars     Spanish    BMW       Green
3      Michael  Mercury  Hindi      Audi      Yellow
4      John     Venus    Portugese  Mercedes  Blue
5      John     NaN      German     Audi      Red
I am trying to convert this into one dictionary per ID, like:
{'ID': 'Tom', 'A': ['Earth', 'Mars'], 'B': ['English', 'Spanish'], 'C': ['BMW', 'BMW'], 'D': ['Green']},
{'ID': 'Michael', 'A': ['Mercury'], 'B': ['Hindi'], 'C': ['Audi'], 'D': ['Yellow']},
{'ID': 'John', 'A': ['Venus'], 'B': ['Portugese', 'German'], 'C': ['Mercedes', 'Audi'], 'D': ['Blue', 'Red']}
This is roughly the output I am after.
I also tried:
df.set_index('ID').to_dict()
but this gives me a dictionary of length 5 instead of 3. Any help would be appreciated.
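For anyone who wants to reproduce this, here is a sketch that builds the sample frame. It assumes the two blank cells in the table (column D for Sr.No 1 and column A for Sr.No 5) are missing values, stored as NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question. The blank cells are assumed to be
# missing values (NaN): column D for Sr.No 1 and column A for Sr.No 5.
df = pd.DataFrame({
    'Sr.No': [1, 2, 3, 4, 5],
    'ID': ['Tom', 'Tom', 'Michael', 'John', 'John'],
    'A': ['Earth', 'Mars', 'Mercury', 'Venus', np.nan],
    'B': ['English', 'Spanish', 'Hindi', 'Portugese', 'German'],
    'C': ['BMW', 'BMW', 'Audi', 'Mercedes', 'Audi'],
    'D': [np.nan, 'Green', 'Yellow', 'Blue', 'Red'],
})
print(df)
```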
In Python, each dictionary key maps to exactly one value, so to associate multiple values with a single key you store them in a container and use that container as the value. Common containers are lists, tuples, and sets; lists are convenient when values are appended one at a time, because tuples are immutable.
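A minimal plain-Python sketch of the list-as-value idea, using a few hypothetical (name, planet) pairs taken from the sample data. It shows both dict.setdefault and collections.defaultdict:

```python
from collections import defaultdict

# Hypothetical (key, value) pairs where the same key appears more than once.
pairs = [('Tom', 'Earth'), ('Tom', 'Mars'), ('Michael', 'Mercury')]

# setdefault inserts an empty list the first time a key is seen,
# then every value is appended, so nothing is overwritten.
planets = {}
for name, planet in pairs:
    planets.setdefault(name, []).append(planet)

# defaultdict(list) creates the empty list automatically on first access.
planets_dd = defaultdict(list)
for name, planet in pairs:
    planets_dd[name].append(planet)

print(planets)  # {'Tom': ['Earth', 'Mars'], 'Michael': ['Mercury']}
```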
To convert a pandas DataFrame to a dictionary, use the to_dict() method. Its orient parameter defaults to 'dict', which returns the DataFrame in the format {column -> {index -> value}}. This is why df.set_index('ID').to_dict() returns 5 entries instead of 3: the outer keys are the five remaining columns (Sr.No, A, B, C, D), not the three IDs.
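The sketch below rebuilds the question's data (blank cells assumed to be NaN) and inspects the default output. Because the 'ID' index contains duplicates, rows with the same ID overwrite each other inside each inner dictionary, so only the last value per ID survives.

```python
import numpy as np
import pandas as pd

# Sample data from the question (blank cells assumed to be NaN).
df = pd.DataFrame({
    'Sr.No': [1, 2, 3, 4, 5],
    'ID': ['Tom', 'Tom', 'Michael', 'John', 'John'],
    'A': ['Earth', 'Mars', 'Mercury', 'Venus', np.nan],
    'B': ['English', 'Spanish', 'Hindi', 'Portugese', 'German'],
    'C': ['BMW', 'BMW', 'Audi', 'Mercedes', 'Audi'],
    'D': [np.nan, 'Green', 'Yellow', 'Blue', 'Red'],
})

# Default orient='dict': {column -> {index -> value}}.
by_column = df.set_index('ID').to_dict()

# The outer keys are the columns, hence length 5.
print(list(by_column))  # ['Sr.No', 'A', 'B', 'C', 'D']

# Inside each column, the later row for a repeated ID wins.
print(by_column['B'])  # {'Tom': 'Spanish', 'Michael': 'Hindi', 'John': 'German'}
```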
To create a dictionary from two columns, first create a pandas Series with the key column as the index and the other column as the values, then call the Series' to_dict() method.
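A short sketch of the two-column recipe on a hypothetical subset of the question's columns. It also shows why the recipe alone does not answer the question: with repeated keys only the last value is kept, whereas grouping first and aggregating with list keeps all of them.

```python
import pandas as pd

# Hypothetical subset of the question's data.
df = pd.DataFrame({
    'Sr.No': [1, 2, 3, 4, 5],
    'ID': ['Tom', 'Tom', 'Michael', 'John', 'John'],
    'B': ['English', 'Spanish', 'Hindi', 'Portugese', 'German'],
})

# Unique keys: key column as index, value column selected, then to_dict().
sr_to_id = df.set_index('Sr.No')['ID'].to_dict()

# Same thing, building the Series explicitly.
sr_to_id_alt = pd.Series(df['ID'].values, index=df['Sr.No']).to_dict()

# Repeated keys: the same recipe keeps only the last value per key.
id_to_last_b = df.set_index('ID')['B'].to_dict()
print(id_to_last_b)  # {'Tom': 'Spanish', 'Michael': 'Hindi', 'John': 'German'}

# Grouping first and aggregating with list keeps every value per key.
id_to_all_b = df.groupby('ID')['B'].agg(list).to_dict()
print(id_to_all_b)
```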
Grouping by 'ID' and applying to_dict to each group with orient='list' comes pretty close:
df.groupby('ID').apply(lambda dfg: dfg.to_dict(orient='list')).to_dict()
Out[25]:
{'John': {'A': ['Venus', nan],
'B': ['Portugese', 'German'],
'C': ['Mercedes', 'Audi'],
'D': ['Blue', 'Red'],
'ID': ['John', 'John'],
'Sr.No': [4, 5]},
'Michael': {'A': ['Mercury'],
'B': ['Hindi'],
'C': ['Audi'],
'D': ['Yellow'],
'ID': ['Michael'],
'Sr.No': [3]},
'Tom': {'A': ['Earth', 'Mars'],
'B': ['English', 'Spanish'],
'C': ['BMW', 'BMW'],
'D': [nan, 'Green'],
'ID': ['Tom', 'Tom'],
'Sr.No': [1, 2]}}
It should just be a matter of formatting the result slightly.
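One possible way to do that final formatting, sketched under the assumption that the blank cells are NaN and should be dropped (as in the desired output, where Tom's 'D' is just ['Green']). Iterating over the groups directly avoids apply, and sort=False keeps the IDs in order of first appearance:

```python
import numpy as np
import pandas as pd

# Sample data from the question (blank cells assumed to be NaN).
df = pd.DataFrame({
    'Sr.No': [1, 2, 3, 4, 5],
    'ID': ['Tom', 'Tom', 'Michael', 'John', 'John'],
    'A': ['Earth', 'Mars', 'Mercury', 'Venus', np.nan],
    'B': ['English', 'Spanish', 'Hindi', 'Portugese', 'German'],
    'C': ['BMW', 'BMW', 'Audi', 'Mercedes', 'Audi'],
    'D': [np.nan, 'Green', 'Yellow', 'Blue', 'Red'],
})

value_cols = ['A', 'B', 'C', 'D']

# One dict per ID: the ID itself plus a list of non-missing values per column.
records = [
    {'ID': name, **{col: group[col].dropna().tolist() for col in value_cols}}
    for name, group in df.groupby('ID', sort=False)
]

for record in records:
    print(record)
```

If you want the original NaN placeholders kept instead, drop the .dropna() call.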
Edit: to remove 'ID' from the dictionaries:
df.groupby('ID').apply(lambda dfg: dfg.drop('ID', axis=1).to_dict(orient='list')).to_dict()
Out[5]:
{'John': {'A': ['Venus', nan],
'B': ['Portugese', 'German'],
'C': ['Mercedes', 'Audi'],
'D': ['Blue', 'Red'],
'Sr.No': [4, 5]},
'Michael': {'A': ['Mercury'],
'B': ['Hindi'],
'C': ['Audi'],
'D': ['Yellow'],
'Sr.No': [3]},
'Tom': {'A': ['Earth', 'Mars'],
'B': ['English', 'Spanish'],
'C': ['BMW', 'BMW'],
'D': [nan, 'Green'],
'Sr.No': [1, 2]}}
You can use groupby with to_dict(orient='list') and convert the resulting Series to a dictionary:
df.set_index('Sr.No', inplace=True)
df.groupby('ID').apply(lambda x: x.to_dict('list')).reset_index(drop=True).to_dict()
{0: {'C': ['Mercedes', 'Audi'], 'ID': ['John', 'John'], 'A': ['Venus', nan],
'B': ['Portugese', 'German'], 'D': ['Blue', 'Red']},
1: {'C': ['Audi'], 'ID': ['Michael'], 'A': ['Mercury'], 'B': ['Hindi'], 'D': ['Yellow']},
2: {'C': ['BMW', 'BMW'], 'ID': ['Tom', 'Tom'], 'A': ['Earth', 'Mars'],
'B': ['English', 'Spanish'], 'D': [nan, 'Green']}}
In order to remove ID, you can also do:
df.groupby('ID')[['A', 'B', 'C', 'D']].apply(lambda x: x.to_dict('list')) \
  .reset_index(drop=True).to_dict()