I've recently come across 'topological data analysis' (TDA) as a unique way of visualizing large datasets. Here is a Stanford paper with example output towards the end: https://research.math.osu.edu/tgda/mapperPBG.pdf.
I'd like to produce similar results, but I'm having difficulty finding runnable code online where you install a package, load sample data, and then execute a few lines (like the http://scikit-learn.org/ examples). My language preference is Python, but I could use R as well.
Has anybody been able to get traction with TDA, and if so, do you have any advice on how to get code up and running?
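As a starting point while looking for a package, the Mapper construction behind the output in the linked paper fits in a few dozen lines of plain Python. This is a minimal sketch under stated assumptions, not the paper's code: it uses a user-supplied lens (filter) function, covers the lens range with equal-width overlapping intervals, and clusters each preimage by single linkage with a fixed distance threshold `eps`. The names `mapper`, `single_linkage_clusters`, `n_intervals`, `overlap` and `eps` are hypothetical.

```python
import math
from itertools import combinations


def single_linkage_clusters(indices, points, eps):
    """Split point indices into connected components of the graph joining points within eps."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in combinations(indices, 2):
        if math.dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)

    groups = {}
    for i in indices:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper(points, lens, n_intervals=4, overlap=0.25, eps=0.5):
    """Return (nodes, edges): nodes are sets of point indices, edges join nodes sharing a point."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals  # assumes the lens is not constant
    nodes = []
    for k in range(n_intervals):
        # Stretch each interval by overlap * width on both sides so neighbours share points.
        start = lo + k * width - overlap * width
        end = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if start <= v <= end]
        if members:
            nodes.extend(single_linkage_clusters(members, points, eps))
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]]
    return nodes, edges


# Usage: 24 points on the unit circle, with the x-coordinate as the lens.
circle = [(math.cos(2 * math.pi * j / 24), math.sin(2 * math.pi * j / 24)) for j in range(24)]
nodes, edges = mapper(circle, lens=lambda p: p[0])
print(len(nodes), len(edges))  # 6 6: the circle comes back as a loop of clusters
```

The overlap between intervals is what creates edges: two clusters are linked only if some point falls in both, so with `overlap=0` the graph would have no edges at all.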
Topological data analysis (TDA) is a field of mathematics that uses qualitative geometric features to analyze datasets. Simply put, TDA is a collection of powerful tools that can quantify shape and structure in data to answer questions from the data's domain.
A Role for Topology in Data Science: The mathematical discipline of topology offers a new approach to data analysis that is especially important in today's world of complex, high-dimensional, noisy data.
Topological data analysis (TDA) refers to statistical methods that find structure in data. As the name suggests, these methods make use of topological ideas. Often, the term TDA is used narrowly to describe a particular method called persistent homology.
Modern data science uses so-called topological methods to find the structural features of data sets before further supervised or unsupervised analysis. Geometry and topology are very natural tools for analysing massive amounts of data since geometry can be regarded as the study of distance functions.
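Persistent homology is easiest to see in dimension 0, where it tracks connected components. A minimal sketch, assuming points in Euclidean space and a Vietoris-Rips filtration in which an edge appears at a scale equal to its length: every point is born at 0, and each time an edge joins two components one of them dies. That is exactly Kruskal's minimum-spanning-tree algorithm run with a union-find. Loops and voids (higher dimensions) need a real library; the name `h0_persistence` is hypothetical.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) of the Vietoris-Rips filtration.

    Every point is born at scale 0; when an edge of length r first joins two
    components, one of them dies at r. One component survives forever.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Edges enter the Rips filtration in order of their length.
    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in combinations(range(len(points)), 2)
    )
    pairs = []
    for r, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # this edge merges two components, so one of them dies at r
            parent[ra] = rb
            pairs.append((0.0, r))
    pairs.append((0.0, math.inf))  # the component that never dies
    return pairs


# Two tight pairs of points, far apart from each other.
pairs = h0_persistence([(0, 0), (0, 1), (10, 0), (10, 1)])
print(sorted(d for _, d in pairs))  # [1.0, 1.0, 10.0, inf]
```

The bar that lives until 10 is the signal: the data has two well-separated clusters, while the short bars of length 1 are local structure.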
There is a new R package out:
TDA: Statistical Tools for Topological Data Analysis
This package provides tools for the statistical analysis of persistent homology and for density clustering.
The very well-written vignette can be found here: Introduction to the R package TDA
Abstract
We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as the distance function, the distance to a measure, the kNN density estimator, the kernel density estimator, and the kernel distance. The salient topological features of the sublevel sets (or superlevel sets) of these functions can be quantified with persistent homology. We provide an R interface for the efficient algorithms of the C++ libraries GUDHI, Dionysus and PHAT, including a function for the persistent homology of the Rips filtration, and one for the persistent homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated over a grid of points. The significance of the features in the resulting persistence diagrams can be analyzed with functions that implement the methods discussed in Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh (2014), Chazal, Fasy, Lecci, Rinaldo, and Wasserman (2014c) and Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman (2014a). The R package TDA also includes the implementation of an algorithm for density clustering, which allows us to identify the spatial organization of the probability mass associated to a density function and visualize it by means of a dendrogram, the cluster tree.
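To make the abstract's pipeline concrete (evaluate a function such as the distance to a measure over a grid, then summarize its sublevel sets with persistent homology), here is a one-dimensional sketch in Python. It is not the package's API: the empirical distance to measure is taken as the root mean squared distance to the k = ceil(m0 * n) nearest sample points, only 0-dimensional features are tracked, and the names `dtm` and `sublevel_persistence_1d` and the default `m0` are assumptions for illustration.

```python
import math


def dtm(x, sample, m0=0.2):
    """Empirical distance to measure at a 1-D location x: RMS distance to the k nearest samples."""
    k = math.ceil(m0 * len(sample))
    nearest = sorted(abs(x - s) for s in sample)[:k]
    return math.sqrt(sum(d * d for d in nearest) / k)


def sublevel_persistence_1d(values):
    """0-dimensional persistence of sublevel sets of a function sampled on a 1-D grid.

    Grid points are added in increasing order of value; grid neighbours are
    adjacent. When two components meet, the one born later dies (elder rule).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    parent, birth, pairs = {}, {}, []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:
        parent[i], birth[i] = i, values[i]
        for j in (i - 1, i + 1):
            if j not in parent:  # neighbour off the grid or not yet in the sublevel set
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # Elder rule: the younger root (higher birth value) dies at the current value.
            young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
            if values[i] > birth[young]:  # skip zero-length pairs
                pairs.append((birth[young], values[i]))
            parent[young] = old
    pairs.append((values[order[0]], math.inf))  # the global minimum never dies
    return pairs


sample = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # two clumps of points on a line
grid = [0.5 * i for i in range(11)]  # grid 0.0, 0.5, ..., 5.0
values = [dtm(x, sample, m0=0.5) for x in grid]
print(sublevel_persistence_1d(values))  # one finite pair plus one infinite pair
```

The finite pair runs from the bottom of one valley of the DTM to the ridge between the clumps; its long lifetime is what marks the second clump as a real feature rather than noise.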
For visualization, Cytoscape has desktop and browser versions.
It suggests two libraries for producing the network input: Bioconductor (an R project, not a Python one) and igraph (which has both R and Python interfaces).
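Whichever library builds the graph, Cytoscape can also import a plain-text Simple Interaction Format (SIF) file, which is an easy way to get a graph such as a Mapper output out of Python. A minimal sketch, assuming nodes are indexed by position and edges are index pairs; the function name `to_sif`, the node names `n0`, `n1` and so on, and the `mapper` relation label are arbitrary choices.

```python
def to_sif(nodes, edges, relation="mapper"):
    """Serialise a graph in Cytoscape's Simple Interaction Format (SIF).

    Each edge becomes one tab-separated line "source<TAB>relation<TAB>target";
    nodes without edges get a line of their own so they still appear.
    """
    lines = [f"n{a}\t{relation}\tn{b}" for a, b in edges]
    connected = {v for edge in edges for v in edge}
    lines += [f"n{v}" for v in range(len(nodes)) if v not in connected]
    return "\n".join(lines) + "\n"


# Three clusters, two of them linked; save the text with a .sif extension for Cytoscape.
print(to_sif([{0, 1}, {1, 2}, {5}], [(0, 1)]))
```

SIF carries only topology; attributes such as cluster size would go in a separate node table imported alongside it.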
Dionysus is a C++ implementation for computing persistent homology. It has a nice PyBind wrapper, which makes it pretty easy to experiment with in Python.
Dionysus version 2 has recently appeared, and it has plotting capabilities that should make it easier to dive into. Have a look here:
http://www.mrzv.org/software/dionysus2/tutorial/plotting.html
For a generic dataset sitting in a Euclidean space (for instance, 2D or 3D arrays), building a Rips complex is probably a good entry point; this is explained here:
http://www.mrzv.org/software/dionysus2/tutorial/rips.html
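To see what such a Rips complex contains, here is a brute-force sketch in plain Python (not Dionysus's API): every set of at most `max_dim + 1` points becomes a simplex whose filtration value is its largest pairwise distance, optionally truncated at `max_radius`. Its cost grows like n^(max_dim + 1), so it is only for small examples; Dionysus does the same job far more efficiently. The names `rips_complex`, `max_dim` and `max_radius` are hypothetical.

```python
import math
from itertools import combinations


def rips_complex(points, max_dim=2, max_radius=math.inf):
    """Filtered Vietoris-Rips complex as a list of (filtration value, simplex).

    A simplex enters at the largest pairwise distance among its vertices, so
    every face appears no later than the simplices containing it.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    simplices = [(0.0, (i,)) for i in range(n)]
    for k in range(2, max_dim + 2):  # k vertices form a (k - 1)-dimensional simplex
        for verts in combinations(range(n), k):
            value = max(dist[a][b] for a, b in combinations(verts, 2))
            if value <= max_radius:
                simplices.append((value, verts))
    # Sort by value, then by dimension, so faces precede cofaces on ties.
    simplices.sort(key=lambda s: (s[0], len(s[1])))
    return simplices


for value, simplex in rips_complex([(0, 0), (1, 0), (0, 1)]):
    print(round(value, 3), simplex)
```

The (value, dimension) ordering matters: a persistence algorithm walks the simplices in this order and expects every face to be present before any simplex that contains it.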