
Topological data analysis - where to begin

I've recently come across topological data analysis (TDA) as a distinctive way of visualizing large datasets. Here is a Stanford paper with example output towards the end: https://research.math.osu.edu/tgda/mapperPBG.pdf

I'd like to produce similar results but am having difficulty finding runnable code online, of the kind where you install a package, load sample data, then execute a few lines (like the examples at http://scikit-learn.org/). My preferred language is Python, but I could use R as well.

Has anybody been able to get traction with TDA and, if so, do you have any advice on how to get code up and running?

asked Aug 06 '14 13:08 by Ben




3 Answers

There is a new R package out:

TDA: Statistical Tools for Topological Data Analysis
This package provides tools for the statistical analysis of persistent homology and for density clustering.

The very well-written vignette can be found here: Introduction to the R package TDA

Abstract

We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as the distance function, the distance to a measure, the kNN density estimator, the kernel density estimator, and the kernel distance. The salient topological features of the sublevel sets (or superlevel sets) of these functions can be quantified with persistent homology. We provide an R interface for the efficient algorithms of the C++ libraries GUDHI, Dionysus and PHAT, including a function for the persistent homology of the Rips filtration, and one for the persistent homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated over a grid of points. The significance of the features in the resulting persistence diagrams can be analyzed with functions that implement the methods discussed in Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh (2014), Chazal, Fasy, Lecci, Rinaldo, and Wasserman (2014c) and Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman (2014a). The R package TDA also includes the implementation of an algorithm for density clustering, which allows us to identify the spatial organization of the probability mass associated to a density function and visualize it by means of a dendrogram, the cluster tree.
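
To make two of the functions named in the abstract concrete, here is a minimal plain-Python sketch of the distance function and an empirical distance to a measure, evaluated over a small grid. It is not the R package's code: the function names, the parameter `m0`, and the exact formula (root mean squared distance to the k = ceil(m0 * n) nearest sample points) are assumptions made for illustration, and the sample data is made up.

```python
import math

def distance_function(points, y):
    """Distance from y to the nearest sample point."""
    return min(math.dist(p, y) for p in points)

def distance_to_measure(points, y, m0):
    """Empirical distance to a measure (assumed form, exponent 2):
    root mean squared distance from y to its k nearest sample points,
    where k = ceil(m0 * n)."""
    k = max(1, math.ceil(m0 * len(points)))
    nearest = sorted(math.dist(p, y) ** 2 for p in points)[:k]
    return math.sqrt(sum(nearest) / k)

# Made-up sample: four corners of the unit square plus one far outlier.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
# A small 3 x 3 grid covering [0, 1] x [0, 1].
grid = [(i / 2, j / 2) for i in range(3) for j in range(3)]

for y in grid:
    print(y,
          round(distance_function(sample, y), 3),
          round(distance_to_measure(sample, y, m0=0.4), 3))
```

At the outlier itself the plain distance function is zero, while the distance to a measure stays large because it averages over several neighbours. That difference is the usual reason for preferring the latter on noisy data.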

answered Oct 16 '22 16:10 by vonjd


For visualization, Cytoscape has desktop and browser versions.

It suggests two libraries for producing the graphs: Bioconductor (an R project rather than a Python library) and igraph (which has both R and Python interfaces).

answered Oct 16 '22 17:10 by SerkanSerttop


Dionysus is a C++ library for computing persistent homology. It has a nice PyBind wrapper, which makes it pretty easy to experiment with in Python.

Dionysus version 2 has recently appeared with plotting capabilities, which should make it easier to dive into. Have a look here:

http://www.mrzv.org/software/dionysus2/tutorial/plotting.html

For a generic dataset sitting in a Euclidean space (for instance, arrays of 2D or 3D points), building a Rips complex is probably a good entry point. This is explained here:

http://www.mrzv.org/software/dionysus2/tutorial/rips.html
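
To get a feel for what a Rips filtration computes before installing anything, here is a minimal standard-library sketch that tracks only connected components (dimension-0 persistent homology) with a union-find structure. It does not use Dionysus and is not its API: the function name `rips_h0_persistence` and the parameter `max_distance` are hypothetical, and it assumes an edge enters the filtration at the distance between its two endpoints.

```python
import math
from itertools import combinations

def rips_h0_persistence(points, max_distance):
    """Return (birth, death) pairs for the connected components (H0)
    of a Rips filtration. Every point is born at 0; a component dies
    at the length of the edge that merges it into another component."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        if length > max_distance:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j       # two components merge
            pairs.append((0.0, length))   # one of them dies here
    # Components still separate at max_distance get an infinite death.
    survivors = len({find(i) for i in range(n)})
    pairs.extend((0.0, math.inf) for _ in range(survivors))
    return pairs

# Made-up data: two well-separated clusters.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
diagram = rips_h0_persistence(points, max_distance=5.0)
print(diagram)
```

With the threshold at 5.0 the two clusters never merge, so two components have infinite persistence. Raising the threshold past the gap between the clusters leaves a single infinite bar plus one long finite bar, which is the kind of feature a persistence diagram is meant to highlight. Higher-dimensional features such as loops need a full library like the ones mentioned in these answers.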

answered Oct 16 '22 15:10 by Tarje Bargheer