I have a DataFrame whose columns I labeled a - i. I want to make a dictionary of dictionaries where the outer key is column "a", the inner key is column "d", and the value is column "e". I know how to do this by iterating through each row, but I feel like there is a more efficient way to do this using DataFrame.to_dict(); I just can't figure out how. Maybe DataFrame.groupby could help, but that seems to be used for grouping column or index IDs.
How can I use pandas (or numpy) to create a dictionary of dictionaries efficiently without iterating through each row? I've shown an example of my current method and what the desired output should be below.
#!/usr/bin/python
import numpy as np
import pandas as pd
tmp_array = np.array([['AAA', 86880690, 86914111, '22RV1', 2, 2, 'H', '-'], ['ABA', 86880690, 86914111, 'A549', 2, 2, 'L', '-'], ['AAC', 86880690, 86914111, 'BFTC-905', 3, 3, 'H', '-'], ['AAB', 86880690, 86914111, 'BT-20', 2, 2, 'H', '-'], ['AAA', 86880690, 86914111, 'C32', 2, 2, 'H', '-']])
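# Pass the column names as a flat list; wrapping the list in another list builds MultiIndex columns in recent pandas
DF = pd.DataFrame(tmp_array, columns="a,b,c,d,e,g,h,i".split(","))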
#print(DF)
     a         b         c         d  e  g  h  i
0  AAA  86880690  86914111     22RV1  2  2  H  -
1  ABA  86880690  86914111      A549  2  2  L  -
2  AAC  86880690  86914111  BFTC-905  3  3  H  -
3  AAB  86880690  86914111     BT-20  2  2  H  -
4  AAA  86880690  86914111       C32  2  2  H  -
from collections import defaultdict
D_a_d_e = defaultdict(dict)
# zip works on both Python 2 and 3 (itertools.izip exists only on Python 2)
for a, d, e in zip(DF["a"], DF["d"], DF["e"]):
    D_a_d_e[a][d] = e
#print(D_a_d_e)
#ignore the defaultdict part
defaultdict(<type 'dict'>, {'ABA': {'A549': '2'}, 'AAA': {'22RV1': '2', 'C32': '2'}, 'AAC': {'BFTC-905': '3'}, 'AAB': {'BT-20': '2'}})
I saw this https://stackoverflow.com/questions/28820254/how-to-create-a-pandas-dataframe-using-a-dictionary-in-a-single-column but it was a little different and it also doesn't have an answer.
There's a to_dict method:
In [11]: DF.to_dict()
Out[11]:
{'a': {0: 'AAA', 1: 'ABA', 2: 'AAC', 3: 'AAB', 4: 'AAA'},
'b': {0: '86880690', 1: '86880690', 2: '86880690', 3: '86880690', 4: '86880690'},
'c': {0: '86914111', 1: '86914111', 2: '86914111', 3: '86914111', 4: '86914111'},
'd': {0: '22RV1', 1: 'A549', 2: 'BFTC-905', 3: 'BT-20', 4: 'C32'},
'e': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
'g': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
'h': {0: 'H', 1: 'L', 2: 'H', 3: 'H', 4: 'H'},
'i': {0: '-', 1: '-', 2: '-', 3: '-', 4: '-'}}
In [12]: DF.to_dict(orient="index")
Out[12]:
{0: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': '22RV1', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
1: {'a': 'ABA', 'b': '86880690', 'c': '86914111', 'd': 'A549', 'e': '2', 'g': '2', 'h': 'L', 'i': '-'},
2: {'a': 'AAC', 'b': '86880690', 'c': '86914111', 'd': 'BFTC-905', 'e': '3', 'g': '3', 'h': 'H', 'i': '-'},
3: {'a': 'AAB', 'b': '86880690', 'c': '86914111', 'd': 'BT-20', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
4: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': 'C32', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'}}
With that in mind you can do the groupby:
In [21]: DF.set_index("d").groupby("a")[["e"]].apply(lambda x: x["e"].to_dict())
Out[21]:
a
AAA {'C32': '2', '22RV1': '2'}
AAB {'BT-20': '2'}
AAC {'BFTC-905': '3'}
ABA {'A549': '2'}
dtype: object
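The result above is still a Series whose values are dicts, so one more to_dict() call on it would give the plain dictionary of dictionaries from the question. As a minimal sketch of the same idea without groupby.apply, the comprehension below builds the nested dict directly from the groups. It assumes a recent pandas and a frame whose values are all strings, as in the question; the names df and nested are illustrative only.

```python
import pandas as pd

# Same rows as the question, written as strings to mirror the all-string frame.
rows = [
    ["AAA", "86880690", "86914111", "22RV1", "2", "2", "H", "-"],
    ["ABA", "86880690", "86914111", "A549", "2", "2", "L", "-"],
    ["AAC", "86880690", "86914111", "BFTC-905", "3", "3", "H", "-"],
    ["AAB", "86880690", "86914111", "BT-20", "2", "2", "H", "-"],
    ["AAA", "86880690", "86914111", "C32", "2", "2", "H", "-"],
]
df = pd.DataFrame(rows, columns="a,b,c,d,e,g,h,i".split(","))

# One pass per group (not per row): for each value of "a",
# map column "d" to column "e" with Series.to_dict().
nested = {
    outer_key: group.set_index("d")["e"].to_dict()
    for outer_key, group in df.groupby("a")
}

print(nested)
```

This still loops in Python, but only once per distinct value of "a", which is usually far fewer iterations than one per row.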
That said, you may be able to use a straight up MultiIndex instead of a dictionary of dictionaries:
In [31]: res = DF.set_index(["a", "d"])["e"]
In [32]: res
Out[32]:
a d
AAA 22RV1 2
ABA A549 2
AAC BFTC-905 3
AAB BT-20 2
AAA C32 2
Name: e, dtype: object
It'll work much the same way:
In [33]: res["AAA"]
Out[33]:
d
22RV1 2
C32 2
Name: e, dtype: object
In [34]: res["AAA"]["22RV1"]
Out[34]: '2'
But it will be more space-efficient, and you're still in pandas.
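To illustrate how the MultiIndex Series can stand in for the nested dict, here is a small sketch showing a single-step lookup with a tuple key, a membership test, and a conversion back to a real dict of dicts if one is needed later. It assumes a recent pandas and a cut-down frame with only the three relevant columns; the variable names are illustrative, and the sort_index() call is an addition that is not in the answer above.

```python
import pandas as pd

# Only the three columns that matter for the lookup, as strings like in the question.
df = pd.DataFrame(
    {
        "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
        "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
        "e": ["2", "2", "3", "2", "2"],
    }
)

# Sorting the index is optional here; it keeps lookups on larger data efficient.
res = df.set_index(["a", "d"])["e"].sort_index()

# Look up outer and inner key in one step with a tuple.
value = res.loc[("AAA", "22RV1")]

# Check whether an (outer, inner) pair exists.
has_pair = ("AAA", "C32") in res.index

# If a real dict of dicts is needed later, build it one group at a time.
nested = {
    outer_key: inner.droplevel("a").to_dict()
    for outer_key, inner in res.groupby(level="a")
}

print(value, has_pair, nested)
```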