
Networkx Multigraph from_pandas_dataframe

Update:
The question, as written, is relevant to Networkx version < 2.0. The from_pandas_dataframe method has been dropped.
To accomplish the same task in Networkx >= 2.0, see the update to the accepted answer.

I'm trying to create a MultiGraph() instance from a pandas DataFrame using networkx's from_pandas_dataframe. What am I doing wrong in the example below?

In [1]: import pandas as pd
        import networkx as nx

        df = pd.DataFrame([['geneA', 'geneB', 0.05, 'method1'],
                           ['geneA', 'geneC', 0.45, 'method1'],
                           ['geneA', 'geneD', 0.35, 'method1'],
                           ['geneA', 'geneB', 0.45, 'method2']], 
                           columns = ['gene1','gene2','conf','type'])

First try with the default nx.Graph():

In [2]: G= nx.from_pandas_dataframe(df, 'gene1', 'gene2', edge_attr=['conf','type'], 
                                    create_using=nx.Graph())

Because this is not a MultiGraph(), one of the duplicate edges is missing:

In [3]: G.edges(data=True)
Out[3]: [('geneA', 'geneB', {'conf': 0.45, 'type': 'method2'}),
         ('geneA', 'geneC', {'conf': 0.45, 'type': 'method1'}),
         ('geneA', 'geneD', {'conf': 0.35, 'type': 'method1'})]

With MultiGraph():

In [4]: MG= nx.from_pandas_dataframe(df, 'gene1', 'gene2', edge_attr=['conf','type'], 
                             create_using=nx.MultiGraph())

This raises a TypeError:

TypeError                                 Traceback (most recent call last)
<ipython-input-49-d2c7b8312ea7> in <module>()
----> 1 MG= nx.from_pandas_dataframe(df, 'gene1', 'gene2', ['conf','type'], create_using=nx.MultiGraph())

/usr/lib/python2.7/site-packages/networkx-1.10-py2.7.egg/networkx/convert_matrix.pyc in from_pandas_dataframe(df, source, target, edge_attr, create_using)
    209         # Iteration on values returns the rows as Numpy arrays
    210         for row in df.values:
--> 211             g.add_edge(row[src_i], row[tar_i], {i:row[j] for i, j in edge_i})
    212 
    213     # If no column names are given, then just return the edges.

/usr/lib/python2.7/site-packages/networkx-1.10-py2.7.egg/networkx/classes/multigraph.pyc in add_edge(self, u, v, key, attr_dict, **attr)
    340             datadict.update(attr_dict)
    341             keydict = self.edge_key_dict_factory()
--> 342             keydict[key] = datadict
    343             self.adj[u][v] = keydict
    344             self.adj[v][u] = keydict

TypeError: unhashable type: 'dict'

Question: How do I instantiate a MultiGraph() from a pandas DataFrame?

Kevin asked Feb 04 '16 20:02


2 Answers

Networkx < 2.0:
It was a bug. I opened an issue on GitHub and then made the suggested edit, which changed line 211 of convert_matrix.py to read:

g.add_edge(row[src_i], row[tar_i], attr_dict={i:row[j] for i, j in edge_i})

Results from that change (which has since been incorporated):

MG= nx.from_pandas_dataframe(df, 'gene1', 'gene2', edge_attr=['conf','type'], 
                                 create_using=nx.MultiGraph())

MG.edges(data=True)
[('geneA', 'geneB', {'conf': 0.05, 'type': 'method1'}),
         ('geneA', 'geneB', {'conf': 0.45, 'type': 'method2'}),
         ('geneA', 'geneC', {'conf': 0.45, 'type': 'method1'}),
         ('geneA', 'geneD', {'conf': 0.35, 'type': 'method1'})]
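
Why the keyword matters: the signature in the traceback shows that MultiGraph.add_edge in 1.10 takes key as its third positional parameter, so a dict passed positionally is used as the edge key rather than as the attributes. Here is a minimal sketch of that mix-up. TinyMultiGraph is a hypothetical stand-in that only mimics the parameter order; it is not networkx code.

```python
# Hypothetical stand-in: mimics only the parameter order of
# MultiGraph.add_edge in Networkx 1.10 (u, v, key, attr_dict).
class TinyMultiGraph:
    def __init__(self):
        self.adj = {}

    def add_edge(self, u, v, key=None, attr_dict=None):
        keydict = self.adj.setdefault(u, {}).setdefault(v, {})
        if key is None:
            key = len(keydict)  # next free integer key
        keydict[key] = dict(attr_dict or {})  # fails if key is unhashable
        self.adj.setdefault(v, {})[u] = keydict


g = TinyMultiGraph()
attrs = {'conf': 0.05, 'type': 'method1'}

try:
    g.add_edge('geneA', 'geneB', attrs)  # dict lands in `key`
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

g.add_edge('geneA', 'geneB', attr_dict=attrs)  # dict lands in `attr_dict`
```

The positional call fails because a dict cannot be used as a dictionary key, while the keyword call stores the attributes under an integer key.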

Networkx >= 2.0:
For DataFrames in this format (an edge list), use from_pandas_edgelist:

MG= nx.from_pandas_edgelist(df, 'gene1', 'gene2', edge_attr=['conf','type'], 
                             create_using=nx.MultiGraph())

MG.edges(data=True)
MultiEdgeDataView([('geneA', 'geneB', {'conf': 0.05, 'type': 'method1'}),
                   ('geneA', 'geneB', {'conf': 0.45, 'type': 'method2'}),
                   ('geneA', 'geneC', {'conf': 0.45, 'type': 'method1'}), 
                   ('geneA', 'geneD', {'conf': 0.35, 'type': 'method1'})])
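
To check that both geneA-geneB edges really survive, you can look at the parallel edges directly. This is a small sketch assuming Networkx >= 2.0 and the same df as in the question; the variable names n_parallel and parallel are only for illustration.

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame([['geneA', 'geneB', 0.05, 'method1'],
                   ['geneA', 'geneC', 0.45, 'method1'],
                   ['geneA', 'geneD', 0.35, 'method1'],
                   ['geneA', 'geneB', 0.45, 'method2']],
                  columns=['gene1', 'gene2', 'conf', 'type'])

MG = nx.from_pandas_edgelist(df, 'gene1', 'gene2', edge_attr=['conf', 'type'],
                             create_using=nx.MultiGraph())

# Count the edges between one pair of nodes
n_parallel = MG.number_of_edges('geneA', 'geneB')

# Each parallel edge sits under its own key in MG[u][v]
parallel = {key: data['type'] for key, data in MG['geneA']['geneB'].items()}
```

Since no edge_key column is given, the keys are assigned automatically, so the two methods end up under separate integer keys.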
Kevin answered Sep 30 '22 12:09


That's a nice question. I tried to reproduce your problem by building the MultiGraph() in a different way, using only three of the four columns:

MG = nx.MultiGraph()

MG.add_weighted_edges_from([tuple(d) for d in df[['gene1','gene2','conf']].values])

MG.edges(data=True) then correctly returns:

[('geneA', 'geneB', {'weight': 0.05}), ('geneA', 'geneB', {'weight': 0.45}), ('geneA', 'geneC', {'weight': 0.45}), ('geneA', 'geneD', {'weight': 0.35})]

I also tried your from_pandas_dataframe method using only three columns, but it doesn't work:

MG = nx.from_pandas_dataframe(df, 'gene1', 'gene2', edge_attr='conf', create_using=nx.MultiGraph())

This returns the same error you encountered. I don't know whether it is a bug or whether that method simply doesn't support more than one weight type for MultiGraph(). In the meantime, you can use the workaround above to build your MultiGraph, at least with a single weight type. Hope that helps.
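
If you need both attribute columns, the same manual approach can probably be extended with add_edges_from, which accepts (u, v, data_dict) tuples. A sketch under that assumption, using the df from the question; attr_cols is just an illustrative name, and I have not checked this against 1.10 specifically.

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame([['geneA', 'geneB', 0.05, 'method1'],
                   ['geneA', 'geneC', 0.45, 'method1'],
                   ['geneA', 'geneD', 0.35, 'method1'],
                   ['geneA', 'geneB', 0.45, 'method2']],
                  columns=['gene1', 'gene2', 'conf', 'type'])

attr_cols = ['conf', 'type']  # columns to keep as edge attributes

MG = nx.MultiGraph()
# One (source, target, attribute dict) tuple per DataFrame row
MG.add_edges_from(zip(df['gene1'], df['gene2'],
                      df[attr_cols].to_dict('records')))

edges = list(MG.edges(data=True))
```

This keeps the original column names as attribute names instead of collapsing everything into a single weight.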

Fabio Lamanna answered Sep 30 '22 10:09