 

Calculating a user's importance or 'Betweenness Centrality' from a user's followers?

I want to know how I can find interesting relationships between user accounts, such as the most connected or most valuable users, based on their connections to others.

Below I have the two tables I use. One has all the users, the other has the keys of the users they follow.

User
{
    id,
    name
}

Follows {
    user_id -> user.id,
    following_id -> user.id
}

What type of algorithms am I looking for?

Assuming unimportant people have few or no followers, how can I find the people in the center of the graph? I would assume they are important because they have important people following them.

Update

As David and Steve point out, how close given nodes are, what nodes form sub communities, and which users are the most connected are all examples of useful data that can be pulled from this schema.

Since this "follower" design is used by many sites now, I've started a bounty in the hopes of getting some solid SQL or programming language implementations that might be useful to a wide variety of people.

It's worth noting that while the results of some algorithms are fascinating, others (such as finding related nodes) would have worth to the users of our sites as we can recommend things to them.

Xeoncross asked Jan 14 '12 02:01


1 Answer

If you only concentrate on the links, try these popular centrality measures (assume G is the graph):

  1. Degree: The degree centrality of node i is ki/(N-1), where ki is the number of links to node i and N is the total number of nodes. A higher degree means the node is more important.
  2. Closeness: The closeness of node i is (N-1)/(Σ_(j∈G, j≠i) dij), where dij is the shortest-path distance between node i and node j. It emphasizes how near a node is to all other nodes in the social network.
  3. Betweenness: The betweenness of node i is (Σ_(j<k∈G) njk(i) / njk) / ((N-1)(N-2)), where j and k both differ from i, njk is the number of shortest paths between nodes j and k, and njk(i) is the number of those paths that run through node i. A higher betweenness means node i is a good center: many of the shortest connections between other pairs of nodes have to pass through it. Because j<k counts each unordered pair once, this value ranges from 0 to 1/2 in an undirected graph; divide by (N-1)(N-2)/2 instead if you want it scaled to [0, 1].

These measures can be calculated from the link information alone, and you can use one of them or combine several to find the important node(s) in the social network. Depending on how you define "important", you may need other measures.
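The three measures above can be sketched in plain Python as follows. This is a minimal sketch, not a tuned implementation, and it rests on a few assumptions: the rows of the Follows table have already been read into a list of (user_id, following_id) pairs, each follow is treated as an undirected link (which is what the j<k sum in the betweenness formula implies), and the graph is connected. The function names build_graph, bfs and centralities are made up for illustration. The betweenness part uses a Brandes-style accumulation over breadth-first searches rather than enumerating every shortest path; that is a substitution for efficiency, not something the formulas require.

```python
from collections import deque


def build_graph(follows):
    """Build an undirected adjacency map from (user_id, following_id) pairs."""
    graph = {}
    for user_id, following_id in follows:
        if user_id == following_id:
            continue  # ignore self-follows; they carry no link information
        graph.setdefault(user_id, set()).add(following_id)
        graph.setdefault(following_id, set()).add(user_id)
    return graph


def bfs(graph, source):
    """Return distances, shortest-path counts, predecessors and visit order."""
    dist = {source: 0}
    sigma = {node: 0 for node in graph}  # number of shortest paths from source
    sigma[source] = 1
    preds = {node: [] for node in graph}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def centralities(follows):
    """Return (degree, closeness, betweenness) dicts keyed by user id."""
    graph = build_graph(follows)
    n = len(graph)
    degree = {v: len(graph[v]) / (n - 1) for v in graph}
    closeness = {}
    between = {v: 0.0 for v in graph}
    for s in graph:
        dist, sigma, preds, order = bfs(graph, s)
        total = sum(dist.values())
        # Assumes a connected graph; unreachable nodes are simply not counted.
        closeness[s] = (n - 1) / total if total else 0.0
        # Brandes-style back-propagation of pair dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                between[w] += delta[w]
    # Each unordered pair was seen from both endpoints, so halve the sum,
    # then apply the (N-1)(N-2) normalization from the formula above.
    norm = (n - 1) * (n - 2)
    between = {v: (b / 2) / norm if norm else 0.0 for v, b in between.items()}
    return degree, closeness, between
```

For a three-user chain where user 1 follows user 2 and user 2 follows user 3, calling centralities([(1, 2), (2, 3)]) gives user 2 a degree of 1.0 and a betweenness of 0.5, while users 1 and 3 get a betweenness of 0.0. For a real follower graph you may prefer a directed variant (summing over ordered pairs) or an existing graph library.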

Ddavid answered Sep 21 '22 00:09
