How to use Pandas to create Dictionary from column entries in DataFrame or np.array

I have a DataFrame whose columns I labeled a through i. I want to build a Dictionary of Dictionaries where the outer key comes from column "a", the inner key from column "d", and the value from column "e". I know how to do this by iterating through each row, but I feel there is a more efficient way using DataFrame.to_dict() that I can't figure out. Maybe DataFrame.groupby could help, but that seems to be meant for grouping column or index IDs.

How can I use pandas (or numpy) to create a Dictionary of Dictionaries efficiently without iterating through each row? I've shown an example of my current method and what the desired output should be below.

#!/usr/bin/python
import numpy as np
import pandas as pd

tmp_array = np.array([['AAA', 86880690, 86914111, '22RV1', 2, 2, 'H', '-'], ['ABA', 86880690, 86914111, 'A549', 2, 2, 'L', '-'], ['AAC', 86880690, 86914111, 'BFTC-905', 3, 3, 'H', '-'], ['AAB', 86880690, 86914111, 'BT-20', 2, 2, 'H', '-'], ['AAA', 86880690, 86914111, 'C32', 2, 2, 'H', '-']])

# Pass a flat list of labels; wrapping the list in another list can produce MultiIndex columns
DF = pd.DataFrame(tmp_array, columns="a,b,c,d,e,g,h,i".split(","))

#print(DF)
     a         b         c         d  e  g  h  i
0  AAA  86880690  86914111     22RV1  2  2  H  -
1  ABA  86880690  86914111      A549  2  2  L  -
2  AAC  86880690  86914111  BFTC-905  3  3  H  -
3  AAB  86880690  86914111     BT-20  2  2  H  -
4  AAA  86880690  86914111       C32  2  2  H  -

from collections import defaultdict

D_a_d_e = defaultdict(dict)
# The built-in zip works on both Python 2 and 3 (itertools.izip is Python 2 only)
for a, d, e in zip(DF["a"], DF["d"], DF["e"]):
    D_a_d_e[a][d] = e

#print(D_a_d_e)
#ignore the defaultdict part

defaultdict(<type 'dict'>, {'ABA': {'A549': '2'}, 'AAA': {'22RV1': '2', 'C32': '2'}, 'AAC': {'BFTC-905': '3'}, 'AAB': {'BT-20': '2'}})

I saw this https://stackoverflow.com/questions/28820254/how-to-create-a-pandas-dataframe-using-a-dictionary-in-a-single-column but it was a little different and it also doesn't have an answer.

asked Nov 12 '15 22:11

O.rka

1 Answer

There's a to_dict method:

In [11]: DF.to_dict()
Out[11]:
{'a': {0: 'AAA', 1: 'ABA', 2: 'AAC', 3: 'AAB', 4: 'AAA'},
 'b': {0: '86880690', 1: '86880690', 2: '86880690', 3: '86880690', 4: '86880690'},
 'c': {0: '86914111', 1: '86914111', 2: '86914111', 3: '86914111', 4: '86914111'},
 'd': {0: '22RV1', 1: 'A549', 2: 'BFTC-905', 3: 'BT-20', 4: 'C32'},
 'e': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
 'g': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
 'h': {0: 'H', 1: 'L', 2: 'H', 3: 'H', 4: 'H'},
 'i': {0: '-', 1: '-', 2: '-', 3: '-', 4: '-'}}

In [12]: DF.to_dict(orient="index")
Out[12]:
{0: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': '22RV1', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
 1: {'a': 'ABA', 'b': '86880690', 'c': '86914111', 'd': 'A549', 'e': '2', 'g': '2', 'h': 'L', 'i': '-'},
 2: {'a': 'AAC', 'b': '86880690', 'c': '86914111', 'd': 'BFTC-905', 'e': '3', 'g': '3', 'h': 'H', 'i': '-'},
 3: {'a': 'AAB', 'b': '86880690', 'c': '86914111', 'd': 'BT-20', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
 4: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': 'C32', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'}}
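
Neither orientation gives the a -> d -> e nesting on its own, because to_dict only maps one level of labels at a time. For comparison, here is a small sketch of two other orient values that pandas supports, "records" and "list". The two-row frame is an assumption made for brevity, not the full data from the question.

```python
import pandas as pd

# Two rows and three columns taken from the question's data, kept small on purpose.
DF = pd.DataFrame({"a": ["AAA", "ABA"], "d": ["22RV1", "A549"], "e": ["2", "2"]})

# One dict per row; the row index is dropped.
records = DF.to_dict(orient="records")

# One list of values per column.
lists = DF.to_dict(orient="list")

print(records)
print(lists)
```

Both results are still flat per row or per column, which is why the grouping step is needed for a nested result.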

With that in mind you can do the groupby:

In [21]: DF.set_index("d").groupby("a")[["e"]].apply(lambda x: x["e"].to_dict())
Out[21]:
a
AAA    {'C32': '2', '22RV1': '2'}
AAB                {'BT-20': '2'}
AAC             {'BFTC-905': '3'}
ABA                 {'A549': '2'}
dtype: object
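
The result above is a Series whose values are dicts, not yet a plain dict of dicts. One way to get the plain structure is a dict comprehension over the groups. This is a sketch rather than the method used above: it loops once per distinct value of "a" instead of once per row, and the name nested is only illustrative.

```python
import pandas as pd

# Columns a, d and e from the question's example data.
DF = pd.DataFrame(
    [
        ["AAA", "22RV1", "2"],
        ["ABA", "A549", "2"],
        ["AAC", "BFTC-905", "3"],
        ["AAB", "BT-20", "2"],
        ["AAA", "C32", "2"],
    ],
    columns=["a", "d", "e"],
)

# groupby yields (key, sub-frame) pairs; each sub-frame becomes an inner {d: e} dict.
nested = {a: dict(zip(group["d"], group["e"])) for a, group in DF.groupby("a")}

print(nested["AAA"])
```

If the same (a, d) pair occurs more than once, the last row wins, which matches the behaviour of the row-by-row loop in the question.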

That said, you may be able to use a straight up MultiIndex instead of a dictionary of dictionaries:

In [31]: res = DF.set_index(["a", "d"])["e"]

In [32]: res
Out[32]:
a    d
AAA  22RV1       2
ABA  A549        2
AAC  BFTC-905    3
AAB  BT-20       2
AAA  C32         2
Name: e, dtype: object

It'll work much the same way:

In [33]: res["AAA"]
Out[33]:
d
22RV1    2
C32      2
Name: e, dtype: object

In [34]: res["AAA"]["22RV1"]
Out[34]: '2'

But it will be more space-efficient, and you're still in pandas.
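
As a sketch of how lookups on such a MultiIndex Series might look in practice, the example below uses .loc with a tuple for a single value and .loc with the outer key for one inner mapping. The sort_index() call and the names value and inner are additions for illustration; sorting is optional here but tends to keep MultiIndex lookups fast.

```python
import pandas as pd

# Columns a, d and e from the question's example data.
DF = pd.DataFrame(
    [
        ["AAA", "22RV1", "2"],
        ["ABA", "A549", "2"],
        ["AAC", "BFTC-905", "3"],
        ["AAB", "BT-20", "2"],
        ["AAA", "C32", "2"],
    ],
    columns=["a", "d", "e"],
)

# Series indexed by (a, d), sorted so the index is in lexicographic order.
res = DF.set_index(["a", "d"])["e"].sort_index()

# Single lookup with both keys at once, avoiding chained indexing.
value = res.loc[("AAA", "22RV1")]

# All inner entries for one outer key, converted to a plain dict only when needed.
inner = res.loc["AAA"].to_dict()

print(value)
print(inner)
```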

answered Oct 28 '22 02:10

Andy Hayden

Tags: python, dictionary, pandas, dataframe, numpy
