
Pseudo-code for Network-only-bayes-classifier

I am trying to implement a classification toolkit for univariate network data using igraph and Python.

However, my question is really more of an algorithms question in the relational classification area than a programming one.

I am following the paper Classification in Networked Data.

I am having difficulty understanding what this paper calls the "Network-Only Bayes Classifier" (NBC), which is one of the relational classifiers explained in the paper.

I implemented a Naive Bayes classifier for text data earlier, using a bag-of-words feature representation, and the idea of Naive Bayes on text data is clear in my mind.

I think this method (NBC) is a simple translation of the same idea to the relational classification area. However, I am confused by the notation used in the equations, so I couldn't figure out what is going on. I also have a question on the notation used in the paper here.

NBC is explained on page 14 of the paper:

[Image: the two NBC equations from page 14 — Bayes' rule, P(x_i = c | N_i) = P(N_i | c) · P(c) / P(N_i), and the likelihood, P(N_i | c) = (1/Z) · Π_{v_j ∈ N_i} P(x_j = x̃_j | x_i = c)^{w_{i,j}}]

Summary:

I need the pseudo-code for the "Network-Only Bayes Classifier" (NBC) explained on page 14 of the paper.

Pseudo-code notation:

  1. Let's call vs the list of vertices in the graph. len(vs) is its length and vs[i] is the i-th vertex.
  2. Let's assume we have a univariate and binary scenario, i.e., vs[i].class is either 0 or 1 and there are no other given features of a node.
  3. Let's assume we run a local classifier beforehand so that every node has an initial label calculated by the local classifier. I am only interested in the relational classifier part.
  4. Let's call v the vertex we are trying to predict, and v.neighbors() the list of vertices that are neighbors of v.
  5. Let's assume all the edge weights are 1.

Now, I need the pseudo-code for:

def NBC(vs, v):
    # v.class is 0 or 1
    # v.neighbors() is the list of neighbor vertices
    # vs is the list of all vertices

    # This function returns 0 or 1

Edit:

To make your job easier, I did this example. I need the answer for the last 2 equations.

asked Jul 02 '15 by Sait


1 Answer

In words...

The probability that node x_i belongs to the class c is equal to:

  • The probability of the neighbourhood of x_i (called N_i) if x_i did indeed belong to the class c; multiplied by ...
  • The probability of the class c itself; divided by ...
  • The probability of the neighbourhood N_i (of node x_i) itself.

As far as the probability of the neighbourhood N_i (of x_i), given that x_i belongs to the class c, is concerned, it is equal to:

  • A product of certain probabilities; (which probabilities?)
  • The probability that some node (v_j) of the neighbourhood (N_i) carries the class it is observed with, given that x_i did indeed belong to the class c
    • (raised to the weight of the edge connecting the node being examined and the node being classified... but you are not interested in this... yet). (The notation is a bit off here, I think: why do they define v_j and then never use it? ... Whatever).
  • Finally, multiply the product by some 1/Z. Why? Because all the ps are probabilities and therefore lie within the range 0 to 1, but the weights w could be anything, meaning that in the end the calculated probability could be out of range. (A sketch of this computation is given right below.)
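To make those two equations concrete, here is a minimal sketch of the scoring step in Python, under your stated assumptions (all edge weights are 1, so the w_{i,j} exponent disappears, and every node already carries an initial label from the local classifier, per your assumption 3). The names nbc, prior, and cond are mine, not the paper's: prior[c] stands for P(c) and cond[(cj, c)] for the conditional P(a neighbour has class cj | the centre node has class c), both estimated beforehand (see further below). Node classes are read igraph-style via u["class"]:

    def nbc(v, prior, cond, classes=(0, 1)):
        # Score each class c with prior[c] * likelihood(N_i | c) and return the argmax.
        # P(N_i) is the same for every c, so dividing by it cannot change the argmax
        # and is skipped; with all weights equal to 1, the 1/Z factor drops out too.
        scores = {}
        for c in classes:
            likelihood = 1.0
            for u in v.neighbors():
                # Every neighbour already has a label (local classifier assumption).
                likelihood *= cond[(u["class"], c)]
            scores[c] = prior[c] * likelihood
        return max(scores, key=scores.get)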

  • The probability that some x_i belongs to a class c GIVEN THE EVIDENCE FROM ITS NEIGHBOURHOOD, is a posterior probability. (AFTER something...What is this something? ... Please see below)

  • The probability of appearance of neighbourhood N_i if x_i belonged to the class c is the likelihood.

  • The probability of the class c itself is the prior probability, BEFORE something... What is this something? The evidence. The prior tells you the probability of the class without any evidence presented, but the posterior tells you the probability of a specific event (that x_i belongs to c) GIVEN THE EVIDENCE FROM ITS NEIGHBOURHOOD.

The prior can be subjective. That is, it can be derived from limited observations or be an informed opinion. In other words, it doesn't have to be a population distribution; it only has to be accurate enough, not absolutely known.
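For instance, under your binary setup, a simple empirical prior could be counted from whatever labelled nodes are available. A minimal sketch (estimate_prior and labelled are my names, not the paper's):

    def estimate_prior(labelled, classes=(0, 1)):
        # Empirical class prior from the list of nodes whose class is already known.
        n = len(labelled)
        return {c: sum(1 for u in labelled if u["class"] == c) / n for c in classes}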

The likelihood is a bit more challenging. Although we have a formula for it here, the likelihood must be estimated from a large enough population, or from as much "physical" knowledge about the phenomenon being observed as possible.

Within the product (the capital letter Pi in the second equation, which expresses the likelihood) you have a conditional. The conditional is the probability that a neighbourhood node belongs to some class, given that x_i belongs to the class c.

In the typical application of the Naive Bayes classifier, which is document classification (e.g. spam mail), the conditional that an email is spam GIVEN THE APPEARANCE OF SPECIFIC WORDS IN ITS BODY is derived from a huge database of observations, that is, a huge database of emails whose classes we really, absolutely know. In other words, I must have an idea of what a spam email looks like, and eventually the majority of spam emails converge to having some common theme ("I am some bank official and I have a money opportunity for you, give me your bank details to wire money to you and make you rich...").

Without this knowledge, we can't use Bayes' rule.

So, to get back to your specific problem: in your PDF, you have a question mark in the derivation of the product.

Exactly.

So the real question here is: What is the likelihood from your Graph / data?

(...or: where are you going to derive it from? Obviously, either from a large number of known observations or from some knowledge about the phenomenon. For example, what is the likelihood that a node is infected, given that a proportion of its neighbourhood is infected too?)
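In the graph setting, the most direct source of such observations is the labelled part of the network itself: over all edges whose endpoints are both labelled, count how often a node of class c has a neighbour of class cj. A sketch under that assumption (again, estimate_conditionals and labelled are my names; the counts start at 1 as Laplace smoothing, so that unseen label pairs don't get zero probability):

    def estimate_conditionals(labelled, classes=(0, 1)):
        # cond[(cj, c)] ~= P(neighbour class is cj | node class is c),
        # counted over edges between labelled nodes.
        labelled_ids = {v.index for v in labelled}   # igraph vertex indices
        counts = {(cj, c): 1 for cj in classes for c in classes}
        for v in labelled:
            for u in v.neighbors():
                if u.index in labelled_ids:          # both endpoints labelled
                    counts[(u["class"], v["class"])] += 1
        cond = {}
        for c in classes:
            total = sum(counts[(cj, c)] for cj in classes)
            for cj in classes:
                cond[(cj, c)] = counts[(cj, c)] / total
        return cond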

I hope this helps.

answered Oct 01 '22 by A_A