Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recommendations for using graphs theory in machine learning? [closed]

I have been learning alot about using graphs for machine learning by watching Christopher Bishops videos( http://videolectures.net/mlss04_bishop_gmvm/ ). I find it very interesting and watched a few others in the same categories(machine learning/graph) but was wondering if anyone had any recommendations for ways of learning more?

My problem is, although the videos gave a great high level understanding, I don't have much practical skills in it yet. I've read Bishops book on machine learning/patterns as well as Norvig's AI book but both don't seem to touch upon specific using graphs much. With the emergence of search engines and social networking, I would think machine learning on graphs would be popular.

If possible, can anyone suggestion an a resource to learn from? (I'm new to this field and development is a hobby for me, so I'm sorry in advance if there's a super obvious resource to learn from..I tried google and university sites).

Thanks in advance!

like image 969
Lostsoul Avatar asked Dec 12 '11 15:12

Lostsoul


People also ask

Do you need graph theory for machine learning?

The two prerequisites needed to understand Graph Learning is in the name itself; Graph Theory and Deep Learning. This is all you need to know to understand the nature of, and build a high-level intuition for these two ideas.

Why do CNNS fail to work with graphs?

It's very difficult to perform CNN on graphs because of the arbitrary size of the graph, and the complex topology, which means there is no spatial locality. There's also unfixed node ordering.


2 Answers

First, i would strongly recommend the book Social Network Analysis for Startups by Maksim Tsvetovat and Alexander Kouznetsov. A book like this is a godsend for programmers who need to quickly acquire a basic fluency in a specific discipline (in this case, graph theory) so that they can begin writing code to solve problems in this domain. Both authors are academically trained graph theoreticians but the intended audience of their book is programmers. Nearly all of the numerous examples presented in the book are in python using the networkx library.

Second, for the projects you have in mind, two kinds of libraries are very helpful if not indispensible:

  • graph analysis: e.g., the excellent networkx (python), or igraph (python, R, et. al.) are two that i can recommend highly; and

  • graph rendering: the excellent graphViz, which can be used stand-alone from the command line but more likely you will want to use it as a library; there are graphViz bindings in all major languages (e.g., for python there are at least three i know of, though pygraphviz is my preference; for R there is rgraphviz which is part of the bioconductor package suite). Rgraphviz has excellent documentation (see in particular the Vignette included with the Package).

It is very easy to install and begin experimenting with these libraries and in particular using them

  • to learn the essential graph theoretic lexicon and units of analysis (e.g., degree sequence distribution, nodes traversal, graph operators);

  • to distinguish critical nodes in a graph (e.g., degree centrality, eigenvector centrality, assortivity); and

  • to identify prototype graph substructures (e.g., bipartite structure, triangles, cycles, cliques, clusters, communities, and cores).

The value of using a graph-analysis library to quickly understand these essential elements of graph theory is that for the most part there is a 1:1 mapping between the concepts i just mentioned and functions in the (networkx or igraph) library.

So e.g., you can quickly generate two random graphs of equal size (node number), render and then view them, then easily calculate for instance the average degree sequence or betweenness centrality for both and observer first-hand how changes in the value of those parameters affects the structure of a graph.

W/r/t the combination of ML and Graph Theoretic techniques, here's my limited personal experience. I use ML in my day-to-day work and graph theory less often, but rarely together. This is just an empirical observation limited to my personal experience, so the fact that i haven't found a problem in which it has seemed natural to combine techniques in these two domains. Most often graph theoretic analysis is useful in ML's blind spot, which is the availability of a substantial amount of labeled training data--supervised ML techniques depend heavily on this.

One class of problems to illustrate this point is online fraud detection/prediction. It's almost never possible to gather data (e.g., sets of online transactions attributed to a particular user) that you can with reasonable certainty separate and label as "fraudulent account." If they were particularly clever and effective then you will mislabel as "legitimate" and for those accounts for which fraud was suspected, quite often the first-level diagnostics (e.g., additional id verification or an increased waiting period to cash-out) are often enough to cause them to cease further activity (which would allow for a definite classification). Finally, even if you somehow manage to gather a reasonably noise-free data set for training your ML algorithm, it will certainly be seriously unbalanced (i.e., much more "legitimate" than "fraud" data points); this problem can be managed with statistics pre-processing (resampling) and by algorithm tuning (weighting) but it's still a problem that will likely degrade the quality of your results.

So while i have never been able to successfully use ML techniques for these types of problems, in at least two instances, i have used graph theory with some success--in the most recent instance, by applying a model adapted from the project by a group at Carnegie Mellon initially directed to detection of online auction fraud on ebay.

like image 186
doug Avatar answered Nov 15 '22 22:11

doug


MacArthur Genius Grant recipient and Stanford Professor Daphne Koller co-authored a definitive textbook on Bayesian networks entitled Probabalistic Graphical Models, which contains a rigorous introduction to graph theory as applied to AI. It may not exactly match what you're looking for, but in its field it is very highly regarded.

like image 28
templatetypedef Avatar answered Nov 15 '22 22:11

templatetypedef