Efficient way to create a dict of dicts from a pandas DataFrame

Question

I have a pandas dataframe of the following structure:

import pandas as pd

d = {'I': ['A', 'B', 'C', 'D'], 'X': [1, 0, 3, 1], 'Y': [0, 1, 2, 1], 'Z': [1, 0, 0, 0], 'W': [3, 2, 0, 0]}
df = pd.DataFrame(data=d, columns=['I', 'X', 'Y', 'Z', 'W'])
df.set_index('I', inplace=True, drop=True)

I need to create a dict of dicts that holds every existing edge between nodes (an edge is indicated by a nonzero value):

{'A': {'X': {1}, 'Z': {1}, 'W': {3}}, 'B': {'Y': {1}, 'W': {2}}, 'C': {'X': {3}, 'Y': {2}}, 'D': {'Y': {1}, 'X': {1}}}

I need it to create a network graph with the NetworkX library and run some calculations on it. Obviously I could loop over every cell in the data frame, but my data is quite large and that would be inefficient. I'm looking for a better way, possibly using vectorization and/or a list comprehension. I've tried a list comprehension but I'm stuck and cannot make it work. Can anyone suggest a more efficient way to do this, please?

Viktor Sbruev · Accepted Answer

You can do this by combining df.iterrows() with a dictionary comprehension. iterrows() is not vectorized (it builds a Series for every row), but it is often acceptable for this kind of task and is cleaner than manual nested loops. For example, you could write:

edge_dictionary = {
    node: {attribute: {weight} for attribute, weight in attributes.items() if weight != 0}
    for node, attributes in df.iterrows()
}
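One caveat with iterrows() is worth illustrating: because each row comes back as a single Series, a frame that mixes integer and float columns has its integer values upcast to float. The sketch below uses a small hypothetical frame (`mixed`, not the data from the question) to show the effect on the weights.

```python
import pandas as pd

# Hypothetical frame (not from the question): one integer and one float column.
mixed = pd.DataFrame({'X': [1, 0], 'Y': [0.5, 2.0]}, index=['A', 'B'])

edge_dictionary = {
    node: {attribute: {weight} for attribute, weight in attributes.items() if weight != 0}
    for node, attributes in mixed.iterrows()
}

# Column X is stored as integers, but each row Series is upcast to float,
# so the weight taken from X arrives as a float.
x_weight = next(iter(edge_dictionary['A']['X']))
print(mixed['X'].dtype, type(x_weight).__name__)
print(edge_dictionary)
```

The upcast values still compare equal to the original integers, so for many uses the difference is only cosmetic. If per-column types matter, the pandas documentation suggests itertuples() as a row iterator that preserves them.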

If your DataFrame is very large and you’re concerned about performance, another approach is to first convert it into a plain dictionary of dictionaries using df.to_dict(orient='index') and then filter out the zeros. That would look like this:

data_dictionary = df.to_dict(orient='index')
edge_dictionary = {
    node: {attribute: {weight} for attribute, weight in connections.items() if weight != 0}
    for node, connections in data_dictionary.items()
}
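Both snippets above still visit every cell in Python. As a further alternative that is not taken from either answer, the nonzero cells can be located with a single NumPy call, so that the Python loop only touches actual edges. This is a minimal sketch, assuming all columns are numeric; note that a node whose row contains only zeros is omitted entirely instead of mapping to an empty dict.

```python
import numpy as np
import pandas as pd

d = {'I': ['A', 'B', 'C', 'D'], 'X': [1, 0, 3, 1], 'Y': [0, 1, 2, 1],
     'Z': [1, 0, 0, 0], 'W': [3, 2, 0, 0]}
df = pd.DataFrame(d).set_index('I')

values = df.to_numpy()
# Positions of all nonzero cells, found in one vectorized pass (row-major order).
row_pos, col_pos = np.nonzero(values)

edge_dictionary = {}
nodes = df.index[row_pos]
attributes = df.columns[col_pos]
weights = values[row_pos, col_pos].tolist()  # .tolist() yields plain Python numbers
for node, attribute, weight in zip(nodes, attributes, weights):
    edge_dictionary.setdefault(node, {})[attribute] = {weight}

print(edge_dictionary)
```

Whether this is faster in practice depends on how sparse the frame is, so it is worth timing against the to_dict version on real data.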

furas · Answer

My version turned out to be similar to Viktor Sbruev's, but his idea of converting everything to a dictionary first seems better.

I was thinking about using .apply(function, axis=1) to run code on every row and create a column holding the inner dictionaries:

def convert(row):
    data = row.to_dict()

    # skip `0` and convert value to `set()`
    data = {key: {val} for key, val in data.items() if val != 0}

    return data

df['networkx'] = df.apply(convert, axis=1)

to get

A    {'X': {1}, 'Z': {1}, 'W': {3}}
B              {'Y': {1}, 'W': {2}}
C              {'X': {3}, 'Y': {2}}
D              {'X': {1}, 'Y': {1}}
Name: networkx, dtype: object

Then convert this column to a dictionary:

result = df['networkx'].to_dict()

which gives the expected result:

{'A': {'X': {1}, 'Z': {1}, 'W': {3}}, 'B': {'Y': {1}, 'W': {2}}, 'C': {'X': {3}, 'Y': {2}}, 'D': {'Y': {1}, 'X': {1}}}

Full working code that I used to test the different versions:

import pandas as pd

d = {'I': ['A', 'B', 'C', 'D'], 'X': [ 1, 0, 3, 1], 'Y': [0, 1, 2, 1], 'Z': [1, 0, 0, 0], 'W': [3, 2, 0, 0]}
df = pd.DataFrame(data=d, columns=['I','X', 'Y', 'Z', 'W'])
df.set_index('I', inplace=True, drop=True)

# for test
expected = {'A': {'X': {1}, 'Z': {1}, 'W': {3}}, 'B': {'Y': {1}, 'W': {2}}, 'C': {'X': {3}, 'Y': {2}}, 'D': {'Y': {1}, 'X': {1}}}

print(df)

def convert(row):
    #print(row)
    data = row.to_dict()
    #data = {row.name: {key:{val} for key, val in data.items() if val != 0}} # version 1
    data = {key:{val} for key, val in data.items() if val != 0}  # version 2
    return data

df['networkx'] = df.apply(convert, axis=1)
print(df['networkx'])

#print(list(df['networkx'].items()))

#result = {name:item[name] for name,item in df['networkx'].items()}  # for version 1
#result = {name:item for name,item in df['networkx'].items()}         # for version 2
result = df['networkx'].to_dict()                                    # for version 2

print('result  :', result)
print('expected:', expected)
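Since the stated goal is a NetworkX graph, here is a minimal sketch of how a dict shaped like `result` might be loaded into one. Storing the value under an edge attribute named 'weight', unpacking each one-element set into a plain number, and using an undirected nx.Graph are all assumptions made for illustration; the question does not specify them.

```python
import networkx as nx

# Same structure as `result` above, written out so the sketch runs on its own.
result = {'A': {'X': {1}, 'Z': {1}, 'W': {3}}, 'B': {'Y': {1}, 'W': {2}},
          'C': {'X': {3}, 'Y': {2}}, 'D': {'Y': {1}, 'X': {1}}}

graph = nx.Graph()  # assumption: undirected; nx.DiGraph() if direction matters
for node, neighbours in result.items():
    for neighbour, weights in neighbours.items():
        # Each inner value is a one-element set; unpack it into a plain number.
        (weight,) = weights
        graph.add_edge(node, neighbour, weight=weight)

print(graph.number_of_nodes(), graph.number_of_edges())
print(graph['A']['W']['weight'])
```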

Tags:

python

dictionary

pandas

dataframe

networkx

carpediem
