Load nodes with attributes and edges from DataFrame to NetworkX

Tags: python, pandas, graph, networkx

I am new to using Python for working with graphs (NetworkX). Until now I have used Gephi, where the standard steps (though not the only possible ones) are:

1. Load the node information from a table/spreadsheet; one of the columns should be the ID and the rest are metadata about the nodes (the nodes are people, so gender, groups, and so on, normally used for coloring). For example:
```
id;NormalizedName;Gender
per1;Jesús;male
per2;Abraham;male
per3;Isaac;male
per4;Jacob;male
per5;Judá;male
per6;Tamar;female
...
```
2. Then load the edges, also from a table/spreadsheet, using the same node names as in the ID column of the nodes spreadsheet, normally with four columns (Target, Source, Weight and Type):
```
Target;Source;Weight;Type
per1;per2;3;Undirected
per3;per4;2;Undirected
...
```

These are the two DataFrames that I have and want to load in Python. From what I have read about NetworkX, it does not seem possible to load two tables (one for nodes, one for edges) into the same graph directly, and I am not sure what the best way would be:

1. Should I create a graph with only the node information from the DataFrame, and then add (append) the edges from the other DataFrame? If so, since nx.from_pandas_dataframe() expects information about the edges, I guess I shouldn't use it to create the nodes... Should I just pass the information as lists?
2. Should I create a graph with only the edge information from the DataFrame, and then add the information from the other DataFrame to each node as attributes? Is there a better way of doing that than iterating over the DataFrame and the nodes?

asked Mar 02 '17 14:03

José

2 Answers

Create the weighted graph from the edge table using nx.from_pandas_dataframe:

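```python
import networkx as nx
import pandas as pd

# Edge table: one row per edge
edges = pd.DataFrame({'source' : [0, 1],
                      'target' : [1, 2],
                      'weight' : [100, 50]})

# Node table: the 'node' column holds the ids used in the edge table
nodes = pd.DataFrame({'node' : [0, 1, 2],
                      'name' : ['Foo', 'Bar', 'Baz'],
                      'gender' : ['M', 'F', 'M']})

# Later NetworkX releases call this function nx.from_pandas_edgelist
G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')
```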

Then add the node attributes from dictionaries using set_node_attributes:

```python
# NetworkX 1.x argument order: (G, name, values)
nx.set_node_attributes(G, 'name', pd.Series(nodes.name, index=nodes.node).to_dict())
nx.set_node_attributes(G, 'gender', pd.Series(nodes.gender, index=nodes.node).to_dict())
```
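One caveat about the pd.Series(nodes.name, index=nodes.node) idiom: when pandas is given an existing Series together with a new index, it looks the new labels up in the old Series instead of relabelling it. It works above only because the node ids 0, 1, 2 happen to equal the DataFrame's default row labels. The sketch below uses hypothetical ids 10, 20, 30 (not from the answer) to show the difference and a mapping that does not depend on that coincidence:

```python
import pandas as pd

# Hypothetical node table whose ids differ from the default row labels 0, 1, 2
nodes = pd.DataFrame({'node': [10, 20, 30],
                      'name': ['Foo', 'Bar', 'Baz'],
                      'gender': ['M', 'F', 'M']})

# Reindexing: labels 10, 20, 30 do not exist in nodes.name, so every value is NaN
reindexed = pd.Series(nodes.name, index=nodes.node).to_dict()

# Relabelling: make the id column the index first, then pick the attribute column
name_by_node = nodes.set_index('node')['name'].to_dict()
gender_by_node = nodes.set_index('node')['gender'].to_dict()

print(reindexed)
print(name_by_node)
```

Either dictionary from the second form can be passed to nx.set_node_attributes in place of the pd.Series(...) expression.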

Or iterate over the graph to add the node attributes:

```python
# G.node is the older accessor; see the update below for G.nodes
for i in sorted(G.nodes()):
    G.node[i]['name'] = nodes.name[i]
    G.node[i]['gender'] = nodes.gender[i]
```

Update:

As of nx 2.0 the argument order of nx.set_node_attributes has changed: (G, values, name=None)

Using the example from above:

```python
nx.set_node_attributes(G, pd.Series(nodes.gender, index=nodes.node).to_dict(), 'gender')
```

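And as of nx 2.4, G.node[] is replaced by G.nodes[]. Current NetworkX releases also no longer provide nx.from_pandas_dataframe; nx.from_pandas_edgelist takes the same arguments as the call above.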
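Putting the updates together, a minimal sketch of the same example for a current NetworkX release might look like this. It assumes nx.from_pandas_edgelist, the (G, values, name) argument order, and the G.nodes accessor; the loop over column names is just one way to avoid repeating the call:

```python
import networkx as nx
import pandas as pd

edges = pd.DataFrame({'source': [0, 1],
                      'target': [1, 2],
                      'weight': [100, 50]})

nodes = pd.DataFrame({'node': [0, 1, 2],
                      'name': ['Foo', 'Bar', 'Baz'],
                      'gender': ['M', 'F', 'M']})

# Build an undirected graph; the 'weight' column becomes an edge attribute
G = nx.from_pandas_edgelist(edges, 'source', 'target', 'weight')

# NetworkX 2.x argument order: (G, values, name)
for column in ['name', 'gender']:
    values = nodes.set_index('node')[column].to_dict()
    nx.set_node_attributes(G, values, column)

# G.nodes[...] replaces the removed G.node[...]
print(G.nodes[0])
print(G.edges[0, 1])
```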

answered Sep 17 '22 04:09

harryscholes

Here's essentially the same answer, updated with some details filled in. The setup is much the same, but the nodes are identified by name rather than by numeric index, to address @LancelotHolmes's comment and make the example more general:

```python
import networkx as nx
import pandas as pd

linkData = pd.DataFrame({'source' : ['Amy', 'Bob'],
                         'target' : ['Bob', 'Cindy'],
                         'weight' : [100, 50]})

nodeData = pd.DataFrame({'name' : ['Amy', 'Bob', 'Cindy'],
                         'type' : ['Foo', 'Bar', 'Baz'],
                         'gender' : ['M', 'F', 'M']})

# edge_attr=True keeps every remaining column as an edge attribute
G = nx.from_pandas_edgelist(linkData, 'source', 'target',
                            edge_attr=True, create_using=nx.DiGraph())
```

Here the True parameter (the edge_attr argument) tells NetworkX to keep all the remaining columns of linkData as edge attributes. I've made the graph a DiGraph here, but if you don't need a directed graph, pass a different graph type such as nx.Graph() instead.
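As an illustration of choosing another graph type, here is a sketch that reads an edge table in the layout from the question (semicolon-separated, with Target, Source, Weight and Type columns) and builds an undirected graph. The inline CSV text and the variable names edge_csv and edge_table are made up for the example; only the column names and the two rows come from the question:

```python
import io

import networkx as nx
import pandas as pd

# Inline stand-in for the question's edge spreadsheet
edge_csv = """Target;Source;Weight;Type
per1;per2;3;Undirected
per3;per4;2;Undirected
"""
edge_table = pd.read_csv(io.StringIO(edge_csv), sep=';')

# Name the endpoint columns explicitly and keep only the listed edge attributes
G = nx.from_pandas_edgelist(edge_table, source='Source', target='Target',
                            edge_attr=['Weight', 'Type'],
                            create_using=nx.Graph())

print(G.edges(data=True))
```

Passing a list to edge_attr keeps just those columns, whereas True keeps every column other than the two endpoint columns.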

Now, since the rows of nodeData have to be matched by name to the nodes that were created from linkData, set the index of the nodeData DataFrame to its name column before converting it to a dictionary, so that NetworkX 2.x can load it as node attributes.

```python
nx.set_node_attributes(G, nodeData.set_index('name').to_dict('index'))
```

This turns the whole nodeData DataFrame into a dictionary keyed by name, in which each value is itself a dictionary of that node's other properties as key:value pairs (i.e., ordinary node attributes where the node identifier is its name).
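One thing to watch for, as far as I can tell from how nx.set_node_attributes behaves: names in the dictionary that are not already nodes of G are skipped rather than added, so a person who appears in the node table but has no edges would be left out of the graph. The sketch below adds a hypothetical unconnected person, 'Dave' (not part of the answer's data), and uses G.add_nodes_from, which accepts (node, attribute dict) pairs and creates any missing nodes:

```python
import networkx as nx
import pandas as pd

linkData = pd.DataFrame({'source': ['Amy', 'Bob'],
                         'target': ['Bob', 'Cindy'],
                         'weight': [100, 50]})

# 'Dave' is a hypothetical person who appears in no edge
nodeData = pd.DataFrame({'name': ['Amy', 'Bob', 'Cindy', 'Dave'],
                         'type': ['Foo', 'Bar', 'Baz', 'Qux'],
                         'gender': ['M', 'F', 'M', 'M']})

G = nx.from_pandas_edgelist(linkData, 'source', 'target',
                            edge_attr=True, create_using=nx.DiGraph())

# Nested dict: {name: {'type': ..., 'gender': ...}}
attrs = nodeData.set_index('name').to_dict('index')

# Only nodes already in G receive attributes here
nx.set_node_attributes(G, attrs)
dave_added_by_set = 'Dave' in G

# add_nodes_from updates existing nodes and creates the missing ones
G.add_nodes_from(attrs.items())

print(dave_added_by_set, 'Dave' in G)
print(G.nodes['Amy'])
```

This also covers the first option from the question: the nodes can be added with their attributes either before or after the edges, since adding a node that already exists only updates its attributes.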

answered Sep 18 '22 04:09

Aaron Bramson
