
NetworkX Key Error when writing GML file

I get the following error message when attempting to write a GML file after merging two graphs with compose():

NetworkXError: 'user_id' is not a valid key

The background is that I import two GML files using:

g = nx.read_gml(file_path + "test_graph_1.gml")
h = nx.read_gml(file_path + "test_graph_2.gml")

Each node (in both GML files) is structured like this:

node [
id 9
user_id "1663413990"
file "wingsscotland.dat"
label "brian_bilston"
image "/Users/ian/development/gtf/gtf/img/1663413990.jpg"
type "friends"
statuses 21085
friends 737
followers 53425
listed 550
ffr 72.4898
lfr 0.1029
shape "triangle-up"
]

After importing each file, I can check all the node attributes and see that the nodes are unique within each graph.

I also see that NetworkX by default discards the 'id' field and uses the 'label' as the identifier of the node. It retains the user_id attribute (which happens to be a Twitter user_id and suits my purposes well).

Running

list(f.nodes(data=True))

I can see that the data for the node above is:

('brian_bilston',
{'ffr': 72.4898,
'file': 'wingsscotland.dat',
'followers': 53425,
'friends': 737,
'image': '/Users/ian/development/gtf/gtf/img/1663413990.jpg',
'lfr': 0.1029,
'listed': 550,
'shape': 'triangle-up',
'statuses': 21085,
'type': 'friends',
'user_id': '1663413990'})

In this test case, Graph g and Graph h share exactly one node, the one shown above. All other nodes are unique by user_id and label.

I then merge the two graphs using:

f = nx.compose(g,h)

This works ok.

I then go to write out a new GML from the graph, f, using:

nx.write_gml(f, file_path + "one_plus_two.gml")

At this point I get the error shown above:

  NetworkXError: 'user_id' is not a valid key

I have checked the uniqueness of all user_id's (in case I had duplicated one):

uid = nx.get_node_attributes(f,'user_id')
print(uid)

Which outputs:

{'brian_bilston': '1663413990', 
'ICMResearch': '100', 
'justcswilliams': '200', 
'MissBabington': '300', 
'ProBirdRights': '400', 
'FredSmith': '247775851', 
'JasWatt': '160952087', 
'Angela_Lewis': '2316946782', 
'Fuzzpig54': '130136162', 
'SonnyRussel': '828881340', 
'JohnBird': '448476934', 
'AngusMcAngus': '19785044'}

(formatted for readability).

So, all user_id's are unique, as far as I can tell.
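
The uniqueness check above can also be done programmatically rather than by eye. This is a minimal sketch, assuming a dict of the same shape as the one returned by nx.get_node_attributes(f, 'user_id'); the values are a hand-copied subset of the output above, and the variable names are illustrative only:

```python
from collections import Counter

# Same shape as the dict returned by nx.get_node_attributes(f, 'user_id');
# only a subset of the values printed above is copied here.
uid = {
    "brian_bilston": "1663413990",
    "ICMResearch": "100",
    "justcswilliams": "200",
    "FredSmith": "247775851",
}

# Count how often each user_id value occurs and keep those seen more than once.
counts = Counter(uid.values())
duplicates = {value: n for value, n in counts.items() if n > 1}

# An empty dict means every user_id is unique.
print(duplicates)
```

An empty result rules out duplicate values, which suggests the error concerns the attribute name rather than its contents.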

So, if it is not a question of uniqueness of keys, what is the error telling me?

I've exhausted my thinking on this!

Any pointers, please, would be very much appreciated!

Watty62 asked Jul 31 '18 11:07


1 Answer

I posted this as an issue on the NetworkX GitHub repo, where it was answered by an admin.

See: https://github.com/networkx/networkx/issues/3100

I have posted his answer below:

Yes -- this is a known issue: see #2131

The GML spec doesn't allow underscores in attribute names. We allow reading .gml files that don't correspond to the official GML spec. But we write only items that follow the spec. You should convert your attribute names to not include the underscore.

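# G.node is not available in recent NetworkX releases; G.nodes is the current accessor
for n in G:
    # skip nodes that do not carry the attribute, to avoid a KeyError
    if 'user_id' in G.nodes[n]:
        G.nodes[n]['userid'] = G.nodes[n].pop('user_id')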

We should also add to the documentation a note about this.
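
The same idea can be generalised to any attribute name, not just user_id. What follows is a minimal sketch, not NetworkX's own code: the regular expression is an assumption based on the rule described in the answer (a letter followed by letters and digits only), the exact pattern NetworkX enforces may differ between releases, and sanitize_key and rename_invalid_keys are hypothetical helper names. It works on a plain attribute dict, so it runs without NetworkX installed:

```python
import re

# Assumption: a valid key starts with a letter and contains only letters and
# digits, as described in the answer. Illustrative, not NetworkX's exact check.
VALID_GML_KEY = re.compile(r"^[A-Za-z][0-9A-Za-z]*$")


def sanitize_key(key):
    """Hypothetical helper: strip characters outside [0-9A-Za-z] from a key."""
    cleaned = re.sub(r"[^0-9A-Za-z]", "", str(key))
    # Keys must start with a letter, so prefix anything that does not.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "k" + cleaned
    return cleaned


def rename_invalid_keys(attrs):
    """Rename offending keys of an attribute dict in place.

    Returns a mapping of old key -> new key for every key that was renamed.
    """
    renamed = {}
    for key in list(attrs):  # copy the keys, since the dict is mutated below
        if VALID_GML_KEY.match(str(key)):
            continue
        new_key = sanitize_key(key)
        if new_key in attrs:
            raise ValueError(f"cannot rename {key!r}: {new_key!r} already exists")
        attrs[new_key] = attrs.pop(key)
        renamed[key] = new_key
    return renamed


# A few attributes copied from the node shown in the question.
node_data = {"user_id": "1663413990", "file": "wingsscotland.dat", "ffr": 72.4898}
changes = rename_invalid_keys(node_data)
print(changes)
print(sorted(node_data))
```

To apply this to a graph, rename_invalid_keys could be called on G.nodes[n] for each node, on the data dict of each edge from G.edges(data=True), and on G.graph before calling nx.write_gml. Raising on a name collision is a deliberate choice here, so that an existing attribute is never silently overwritten.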

Watty62 answered Sep 20 '22 03:09