I'd like to create some NetworkX graphs from a simple Pandas DataFrame:
      Loc 1  Loc 2  Loc 3  Loc 4  Loc 5  Loc 6  Loc 7
Foo       0      0      1      1      0      0      0
Bar       0      0      1      1      0      1      1
Baz       0      0      1      0      0      0      0
Bat       0      0      1      0      0      1      0
Quux      1      0      0      0      0      0      0
Here Foo… is the index, and Loc 1 to Loc 7 are the columns. But converting to NumPy matrices or recarrays doesn't seem to work for generating input for nx.Graph(). Is there a standard strategy for achieving this? I'm not averse to reformatting the data in Pandas --> dumping to CSV --> importing to NetworkX, but it seems as if I should be able to generate the edges from the index and the nodes from the values.
NetworkX expects a square matrix (of nodes and edges). Perhaps* you want to pass it something like this:
In [11]: df2 = pd.concat([df, df.T]).fillna(0)
Note: It's important that the index and columns are in the same order!
In [12]: df2 = df2.reindex(df2.columns)
In [13]: df2
Out[13]:
         Bar    Bat    Baz    Foo  Loc 1  Loc 2  Loc 3  Loc 4  Loc 5  Loc 6  Loc 7   Quux
Bar        0      0      0      0      0      0      1      1      0      1      1      0
Bat        0      0      0      0      0      0      1      0      0      1      0      0
Baz        0      0      0      0      0      0      1      0      0      0      0      0
Foo        0      0      0      0      0      0      1      1      0      0      0      0
Loc 1      0      0      0      0      0      0      0      0      0      0      0      1
Loc 2      0      0      0      0      0      0      0      0      0      0      0      0
Loc 3      1      1      1      1      0      0      0      0      0      0      0      0
Loc 4      1      0      0      1      0      0      0      0      0      0      0      0
Loc 5      0      0      0      0      0      0      0      0      0      0      0      0
Loc 6      1      1      0      0      0      0      0      0      0      0      0      0
Loc 7      1      0      0      0      0      0      0      0      0      0      0      0
Quux       0      0      0      0      1      0      0      0      0      0      0      0
In [14]: graph = nx.from_numpy_array(df2.values)  # originally nx.from_numpy_matrix, which was removed in NetworkX 3.0
This doesn't pass the column/index names to the graph. If you want that, you could use relabel_nodes (be wary of duplicate labels, which pandas DataFrames allow):
In [15]: graph = nx.relabel_nodes(graph, dict(enumerate(df2.columns))) # is there a nicer way than dict(enumerate(...))?
*It's unclear exactly what the columns and index represent for the desired graph.
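For readers on a recent NetworkX, the same square-matrix idea can be sketched with nx.from_pandas_adjacency, which keeps the row/column labels as node names. This is only an illustration under the assumption that each row label should be linked to every column where it has a 1; the variable names nodes and square are made up for this sketch.

```python
import networkx as nx
import pandas as pd

# The membership table from the question.
df = pd.DataFrame(
    [
        [0, 0, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 0, 1, 1],
        [0, 0, 1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 1, 0],
        [1, 0, 0, 0, 0, 0, 0],
    ],
    index=["Foo", "Bar", "Baz", "Bat", "Quux"],
    columns=[f"Loc {i}" for i in range(1, 8)],
)

# One shared label order, used for both rows and columns.
nodes = list(df.index) + list(df.columns)

# Stack the table on top of its transpose, fill the gaps with 0,
# and reindex so that rows and columns appear in the same order.
square = (
    pd.concat([df, df.T])
    .fillna(0)
    .reindex(index=nodes, columns=nodes)
    .astype(int)
)

# Node names are taken from the DataFrame labels, so no relabelling is needed.
graph = nx.from_pandas_adjacency(square)
```

Because the matrix is symmetric, the result is an undirected graph with one node per row or column label; labels with no 1 anywhere (Loc 2 and Loc 5 here) end up as isolated nodes.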
A slightly late answer, but NetworkX can now read data from pandas DataFrames. In that case, the ideal format for a simple directed graph is an edge list like the following:
+----------+---------+---------+
| Source | Target | Weight |
+==========+=========+=========+
| Node_1 | Node_2 | 0.2 |
+----------+---------+---------+
| Node_2 | Node_1 | 0.6 |
+----------+---------+---------+
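As a minimal sketch of how a table in that shape could be loaded, the snippet below uses nx.from_pandas_edgelist. The column names Source, Target and Weight are simply copied from the table above; passing them explicitly is needed because the function's defaults are lowercase "source" and "target".

```python
import networkx as nx
import pandas as pd

# Edge list in the Source / Target / Weight layout shown above.
edges = pd.DataFrame(
    {
        "Source": ["Node_1", "Node_2"],
        "Target": ["Node_2", "Node_1"],
        "Weight": [0.2, 0.6],
    }
)

# create_using=nx.DiGraph() keeps the two directions as separate edges,
# each carrying its own "Weight" attribute.
graph = nx.from_pandas_edgelist(
    edges,
    source="Source",
    target="Target",
    edge_attr="Weight",
    create_using=nx.DiGraph(),
)

for u, v, data in graph.edges(data=True):
    print(u, "->", v, data["Weight"])
```

Leaving out create_using would give an undirected graph, in which the two rows collapse into a single edge and only one of the weights survives.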
If you are using adjacency matrices, then Andy Hayden is right: you need to take care that the format is correct. Since you used 0 and 1 in your question, I guess you would like an undirected graph. That may seem counterintuitive at first, since you said the index represents e.g. a person and the columns represent groups to which that person belongs, but the relation also holds in the other direction: a group (membership) belongs to a person. Following this logic, you should also put the groups in the index and the persons in the columns.
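If building the full square matrix feels heavy, one alternative is to read the undirected edges straight off the 0/1 table, pairing each index label with every column where it has a 1. The sketch below assumes that reading of the data; the bipartite node attribute is an optional NetworkX convention for marking the two node sets, not something the question asked for.

```python
import networkx as nx
import pandas as pd

# The membership table from the question.
df = pd.DataFrame(
    [
        [0, 0, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 0, 1, 1],
        [0, 0, 1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 1, 0],
        [1, 0, 0, 0, 0, 0, 0],
    ],
    index=["Foo", "Bar", "Baz", "Bat", "Quux"],
    columns=[f"Loc {i}" for i in range(1, 8)],
)

# One (row label, column label) pair for every cell that holds a 1.
edges = [
    (row_label, col_label)
    for row_label, row in df.iterrows()
    for col_label, flag in row.items()
    if flag == 1
]

graph = nx.Graph()
# Add both node sets explicitly so labels without any 1 are still present.
graph.add_nodes_from(df.index, bipartite=0)
graph.add_nodes_from(df.columns, bipartite=1)
graph.add_edges_from(edges)
```

This gives the same undirected graph as the symmetric adjacency matrix would, without having to keep the index and columns in matching order.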
Just a side note: you can also frame this problem as a directed graph, for example if you would like to visualize an association network of hierarchical categories. There, the association from e.g. Samwise Gamgee to Hobbits is usually stronger than in the other direction (since Frodo Baggins is more likely the Hobbit prototype).