I'm new to NetworkX and am using it for a social network analysis query. By query, I mean selecting or creating subgraphs by attributes of both edges and nodes, where the edges form a path and the nodes carry attributes. The graph is a MultiDiGraph of the form
    import networkx as nx

    G2 = nx.MultiDiGraph()

    # Keyword attributes work in both old and current NetworkX releases
    G2.add_node("UserA", type="Cat")
    G2.add_node("UserB", type="Dog")
    G2.add_node("UserC", type="Mouse")
    G2.add_node("Likes", type="Feeling")
    G2.add_node("Hates", type="Feeling")

    G2.add_edge("UserA", "Hates", statementid="1")
    G2.add_edge("Hates", "UserB", statementid="1")
    G2.add_edge("UserC", "Hates", statementid="2")
    G2.add_edge("Hates", "UserA", statementid="2")
    G2.add_edge("UserB", "Hates", statementid="3")
    G2.add_edge("Hates", "UserA", statementid="3")
    G2.add_edge("UserC", "Likes", statementid="3")
    G2.add_edge("Likes", "UserB", statementid="3")
Queried with
    for node, data in G2.nodes(data=True):
        if data["type"] == "Cat":
            # get all edges out from these nodes,
            # then recursively follow using a filter for a specific statementid,
            # or get all edges with a specific statementid
            # and look for ones with a node attribute of "Cat"
            pass
Is there a better way to query? Or is it best practice to create custom iterations to create subgraphs?
Alternatively (and as a separate question), the graph could be simplified, but I'm not using the graph below because the "hates" type objects will have predecessors. Would this make querying simpler? It seems easier to iterate over nodes.
    G3 = nx.MultiDiGraph()
    G3.add_node("UserA", type="Cat")
    G3.add_node("UserB", type="Dog")
    G3.add_edge("UserA", "UserB", statementid="1", label="hates")
    G3.add_edge("UserA", "UserB", statementid="2", label="hates")
Other notes:
Does add_path add an identifier to the path created? For comparison, igraph has g.vs.select().
NX is certainly capable of handling graphs that large; however, performance will largely be a function of your hardware setup. Aric will likely give a better answer, but NX loads the whole graph into memory at once, so in the ranges you are describing you will need a substantial amount of free memory for it to work.
In NetworkX, nodes can be any hashable object e.g., a text string, an image, an XML object, another Graph, a customized node object, etc. Python's None object is not allowed to be used as a node.
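As a rough illustration of the hashability rule, the sketch below adds a string, a tuple, and a frozenset as nodes, then shows that an unhashable list is rejected. The node values are made up for the example.

```python
import networkx as nx

G = nx.Graph()

# Any hashable object can serve as a node.
G.add_node("alice")                # text string
G.add_node((3, 4))                 # tuple
G.add_node(frozenset({"a", "b"}))  # frozenset

# A list is mutable and unhashable, so it cannot be used as a node.
try:
    G.add_node(["not", "hashable"])
    list_rejected = False
except TypeError:
    list_rejected = True

print(G.number_of_nodes(), list_rejected)
```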
To check whether a graph is directed, use nx.is_directed(G). To check whether an edge carries a given attribute, test for the key in the edge's data dictionary:

    'weight' in G[1][2]  # True if the edge connecting nodes 1 and 2 has an attribute called weight
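A minimal sketch of both checks on a small directed graph. The graph and its weight value are invented for illustration. Note that the G[1][2] lookup as written applies to Graph and DiGraph; on a multigraph, G[1][2] is keyed by edge key rather than by attribute name.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edge(1, 2, weight=0.5)  # edge with a weight attribute
G.add_edge(2, 3)              # edge without one

directed = nx.is_directed(G)       # True for DiGraph and MultiDiGraph
has_weight_12 = "weight" in G[1][2]
has_weight_23 = "weight" in G[2][3]

print(directed, has_weight_12, has_weight_23)
```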
It's pretty straightforward to write a one-liner to make a list or generator of nodes with a specific property (generators shown here):

    import networkx as nx

    G = nx.Graph()
    G.add_node(1, label='one')
    G.add_node(2, label='fish')
    G.add_node(3, label='two')
    G.add_node(4, label='fish')

    # method 1: look up each node's attribute dict
    # (G.nodes[n] on current NetworkX; this was G.node[n] before 2.0)
    fish = (n for n in G if G.nodes[n]['label'] == 'fish')
    # method 2: iterate over (node, attribute dict) pairs
    fish2 = (n for n, d in G.nodes(data=True) if d['label'] == 'fish')
    print(list(fish))
    print(list(fish2))

    G.add_edge(1, 2, color='red')
    G.add_edge(2, 3, color='blue')
    # the same pattern works for edges
    red = ((u, v) for u, v, d in G.edges(data=True) if d['color'] == 'red')
    print(list(red))
If your graph is large and fixed and you want to do fast lookups, you could make a "reverse dictionary" of the attributes like this:

    labels = {}
    for n, d in G.nodes(data=True):
        # group node ids under their label value
        labels.setdefault(d['label'], []).append(n)
    print(labels)
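The same filtering pattern can be applied to the MultiDiGraph from the question. The sketch below is one possible approach, not the only one. The helper names nodes_of_type and statement_subgraph are hypothetical, and it assumes a NetworkX release that provides edge_subgraph (2.0 or later). Edge keys are included in the selection because the graph has parallel edges between "Hates" and "UserA".

```python
import networkx as nx

# Rebuild the question's graph: each statement is source -> feeling -> target.
G2 = nx.MultiDiGraph()
for name, kind in [("UserA", "Cat"), ("UserB", "Dog"), ("UserC", "Mouse"),
                   ("Likes", "Feeling"), ("Hates", "Feeling")]:
    G2.add_node(name, type=kind)

statements = [
    ("UserA", "Hates", "UserB", "1"),
    ("UserC", "Hates", "UserA", "2"),
    ("UserB", "Hates", "UserA", "3"),
    ("UserC", "Likes", "UserB", "3"),
]
for src, feeling, dst, sid in statements:
    G2.add_edge(src, feeling, statementid=sid)
    G2.add_edge(feeling, dst, statementid=sid)


def nodes_of_type(G, node_type):
    """Hypothetical helper: nodes whose 'type' attribute equals node_type."""
    return [n for n, d in G.nodes(data=True) if d.get("type") == node_type]


def statement_subgraph(G, statement_id):
    """Hypothetical helper: view of G restricted to edges of one statement."""
    edges = [
        (u, v, k)
        for u, v, k, d in G.edges(keys=True, data=True)
        if d.get("statementid") == statement_id
    ]
    return G.edge_subgraph(edges)


cats = nodes_of_type(G2, "Cat")

# Statement ids of the edges that leave a "Cat" node.
cat_statement_ids = {d["statementid"] for _, _, d in G2.out_edges(cats, data=True)}

# One subgraph per statement that starts at a "Cat" node.
cat_subgraphs = {sid: statement_subgraph(G2, sid) for sid in cat_statement_ids}

for sid, sub in cat_subgraphs.items():
    print(sid, sorted(sub.edges()))
```

edge_subgraph returns a read-only view that shares attributes with the original graph. If an independent graph is needed, wrap the result in G2.__class__(sub) or call sub.copy().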