
Python igraph: delete vertices from a graph

I am working with the Enron email dataset and trying to remove email addresses that don't contain "@enron.com" (i.e. I want to keep Enron addresses only). When I tried to delete the addresses without @enron.com, some of them were skipped for some reason. A small graph is shown below, where the vertices are email addresses. It is in GML format:

Creator "igraph version 0.7 Sun Mar 29 20:15:45 2015"
Version 1
graph
[
  directed 1
  node
  [
    id 0
    label "[email protected]"
  ]
  node
  [
    id 1
    label "[email protected]"
  ]
  node
  [
    id 2
    label "[email protected]"
  ]
  node
  [
    id 3
    label "[email protected]"
  ]
  node
  [
    id 4
    label "[email protected]"
  ]
  node
  [
    id 5
    label "[email protected]"
  ]
  node
  [
    id 6
    label "[email protected]"
  ]
  node
  [
    id 7
    label "[email protected]"
  ]
  node
  [
    id 8
    label "[email protected]"
  ]
  node
  [
    id 9
    label "[email protected]"
  ]
  edge
  [
    source 5
    target 5
    weight 1
  ]
]

My code is:

import igraph as ig

G = ig.read("enron_email_filtered.gml")
for v in G.vs:
    print(v['label'])
    if '@enron.com' not in v['label']:
        G.delete_vertices(v.index)
        print('Deleted')

In this dataset, 7 addresses should be deleted. However, with the code above, only 5 are removed.

asked Mar 29 '15 17:03 by user1894963



1 Answer

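Deleting a vertex inside the loop re-indexes the remaining vertices, so the loop can skip the vertex that slides into the freed index. As the python-igraph tutorial shows, you can instead collect all the vertices with a specific property first and then delete them in bulk: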

to_delete_ids = [v.index for v in G.vs if '@enron.com' not in v['label']]
G.delete_vertices(to_delete_ids)
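The index shift behind the skipped vertices can be reproduced without igraph. The sketch below is only an illustration: it uses a plain list of flags in the same keep/delete order as the sample graph, and it assumes that iterating over G.vs behaves like index-based iteration over a shrinking sequence, which is consistent with the 5 deletions reported in the question. The function names are made up for this example.

```python
# True = address contains "@enron.com" (keep), False = anything else (delete).
# Same keep/delete order as the 10 vertices in the sample graph.
is_enron = [True, False, True, False, False, False, True, False, False, False]


def delete_while_iterating(flags):
    """Mimic deleting inside the loop: indices shift after every deletion."""
    flags = list(flags)
    deleted = 0
    i = 0
    while i < len(flags):
        if not flags[i]:
            del flags[i]   # everything after position i moves down by one
            deleted += 1
        i += 1             # the loop still advances, so the shifted item is skipped
    return deleted, flags


def delete_in_bulk(flags):
    """Collect the indices first, then remove them all in one step."""
    to_delete = {i for i, keep in enumerate(flags) if not keep}
    kept = [f for i, f in enumerate(flags) if i not in to_delete]
    return len(to_delete), kept


print(delete_while_iterating(is_enron))  # 5 deletions, two non-Enron entries survive
print(delete_in_bulk(is_enron))          # 7 deletions, only Enron entries remain
```

Each run of consecutive non-Enron addresses loses every second element to the skip, which is why two of them survive the original loop.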

Running the two-line bulk deletion on the sample graph gave this output:

to delete ids: [1, 3, 4, 5, 7, 8, 9]
Before deletion: IGRAPH D-W- 10 1 --
+ attr: id (v), label (v), weight (e)
+ edges:
5->5
After deletion: IGRAPH D-W- 3 0 --
+ attr: id (v), label (v), weight (e)
label: [email protected]
label: [email protected]
label: [email protected]
answered Sep 28 '22 12:09 by Jey