I am new using Python for working with graphs: NetworkX. Until now I have used Gephi. There the standard steps (but not the only possible) are:
Load the nodes informations from a table/spreadsheet; one of the columns should be ID and the rest are metadata about the nodes (nodes are people, so gender, groups... normally to be used for coloring). Like:
id;NormalizedName;Gender
per1;Jesús;male
per2;Abraham;male
per3;Isaac;male
per4;Jacob;male
per5;Judá;male
per6;Tamar;female
...
Then load the edges also from a table/spreadsheet, using the same names for the nodes as it was in the column ID of the nodes spreadsheet with normally four columns (Target, Source, Weight and Type):
Target;Source;Weight;Type
per1;per2;3;Undirected
per3;per4;2;Undirected
...
This are the two dataframes that I have and that I want to load in Python. Reading about NetworkX, it seems that it's not quite possible to load two tables (one for nodes, one for edges) into the same graph and I am not sure what would be the best way:
Should I create a graph only with the nodes informations from the DataFrame, and then add (append) the edges from the other DataFrame? If so and since nx.from_pandas_dataframe() expects information about the edges, I guess I shouldn't use it to create the nodes... Should I just pass the information as lists?
Should I create a graph only with the edges information from the DataFrame and then add to each node the information from the other DataFrame as attributes? Is there a better way for doing that than iterating over the DataFrame and the nodes?
nbunch. An nbunch is a single node, container of nodes or None (representing all nodes). It can be a list, set, graph, etc.. To filter an nbunch so that only nodes actually in G appear, use G.
Add an edge between u and v. The nodes u and v will be automatically added if they are not already in the graph. Edge attributes can be specified with keywords or by directly accessing the edge's attribute dictionary.
In NetworkX, nodes can be any hashable object e.g., a text string, an image, an XML object, another Graph, a customized node object, etc. Python's None object is not allowed to be used as a node.
For NetworkX, a graph with more than 100K nodes may be too large. I'll demonstrate that it can handle a network with 187K nodes in this post, but the centrality calculations were prolonged. Luckily, there are some other packages available to help us with even larger graphs.
Create the weighted graph from the edge table using nx.from_pandas_dataframe
:
import networkx as nx
import pandas as pd
edges = pd.DataFrame({'source' : [0, 1],
'target' : [1, 2],
'weight' : [100, 50]})
nodes = pd.DataFrame({'node' : [0, 1, 2],
'name' : ['Foo', 'Bar', 'Baz'],
'gender' : ['M', 'F', 'M']})
G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')
Then add the node attributes from dictionaries using set_node_attributes
:
nx.set_node_attributes(G, 'name', pd.Series(nodes.name, index=nodes.node).to_dict())
nx.set_node_attributes(G, 'gender', pd.Series(nodes.gender, index=nodes.node).to_dict())
Or iterate over the graph to add the node attributes:
for i in sorted(G.nodes()):
G.node[i]['name'] = nodes.name[i]
G.node[i]['gender'] = nodes.gender[i]
As of nx 2.0
the argument order of nx.set_node_attributes
has changed: (G, values, name=None)
Using the example from above:
nx.set_node_attributes(G, pd.Series(nodes.gender, index=nodes.node).to_dict(), 'gender')
And as of nx 2.4
, G.node[]
is replaced by G.nodes[]
.
Here's basically the same answer, but updated with some details filled in. We'll start with basically the same setup, but here there won't be indices for the nodes, just names to address @LancelotHolmes comment and make it more general:
import networkx as nx
import pandas as pd
linkData = pd.DataFrame({'source' : ['Amy', 'Bob'],
'target' : ['Bob', 'Cindy'],
'weight' : [100, 50]})
nodeData = pd.DataFrame({'name' : ['Amy', 'Bob', 'Cindy'],
'type' : ['Foo', 'Bar', 'Baz'],
'gender' : ['M', 'F', 'M']})
G = nx.from_pandas_edgelist(linkData, 'source', 'target', True, nx.DiGraph())
Here the True
parameter tells NetworkX to keep all the properties in the linkData as link properties. In this case I've made it a DiGraph
type, but if you don't need that, then you can make it another type in the obvious way.
Now, since you need to match the nodeData by the name of the nodes generated from the linkData, you need to set the index of the nodeData dataframe to be the name
property, before making it a dictionary so that NetworkX 2.x can load it as the node attributes.
nx.set_node_attributes(G, nodeData.set_index('name').to_dict('index'))
This loads the whole nodeData dataframe into a dictionary in which the key is the name, and the other properties are key:value pairs within that key (i.e., normal node properties where the node index is its name).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With