
Visualising 10 dimensional data with matplotlib

I have data like this:

ID    x1   x2   x3    x4    x5    x6    x7   x8   x9   x10
1   -0.18   5 -0.40 -0.26  0.53 -0.66  0.10   2 -0.20    1
2   -0.58   5 -0.52 -1.66  0.65 -0.15  0.08   3  3.03   -2
3   -0.62   5 -0.09 -0.38  0.65  0.22  0.44   4  1.49    1
4   -0.22  -3  1.64 -1.38  0.08  0.42  1.24   5 -0.34    0
5    0.00   5  1.76 -1.16  0.78  0.46  0.32   5 -0.51   -2

What's the best method for visualizing this data? I'm using matplotlib to visualize it, and I read it from a CSV file using pandas.

Thanks.

dkiswanto asked Oct 29 '16 10:10




1 Answer

Visualising data in a high-dimensional space is always a difficult problem. One commonly used solution (now available in pandas) is to inspect all of the 1D and 2D projections of the data. This doesn't give you all of the information about the data, but the full picture is impossible to visualise unless you can see in 10D! Here's an example of how to do this with pandas (scatter_matrix has been available since version 0.7.3; the pandas.plotting import path used below needs pandas 0.20 or later, older versions had it under pandas.tools.plotting):

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# First make some fake data with the same layout as yours
columns = ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10']
data = pd.DataFrame(np.random.randn(100, 10), columns=columns)

# Now plot using pandas (diagonal='kde' requires SciPy)
scatter_matrix(data, alpha=0.2, figsize=(6, 6), diagonal='kde')
plt.show()

This generates a grid of plots with all of the 2D projections as scatter plots, and kernel density estimates (KDEs) of the 1D projections on the diagonal:

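Since the question reads its data from a CSV with pandas, here is a rough sketch of the same call applied to the five rows from the question. The comma-separated layout, the use of ID as the index, and the output file name are assumptions for illustration; with a real file you would pass its path to pd.read_csv. It uses diagonal="hist" rather than "kde" because five rows are too few for a meaningful density estimate.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import pandas as pd
from pandas.plotting import scatter_matrix

# The five rows from the question, embedded so the sketch is self-contained.
# Assumption: the real file is comma-separated with this header row.
raw = """ID,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10
1,-0.18,5,-0.40,-0.26,0.53,-0.66,0.10,2,-0.20,1
2,-0.58,5,-0.52,-1.66,0.65,-0.15,0.08,3,3.03,-2
3,-0.62,5,-0.09,-0.38,0.65,0.22,0.44,4,1.49,1
4,-0.22,-3,1.64,-1.38,0.08,0.42,1.24,5,-0.34,0
5,0.00,5,1.76,-1.16,0.78,0.46,0.32,5,-0.51,-2
"""

# Use ID as the index so it is not plotted as an eleventh dimension
df = pd.read_csv(io.StringIO(raw), index_col="ID")

# scatter_matrix returns a 2D array of matplotlib Axes, one per panel
axes = scatter_matrix(df, alpha=0.8, figsize=(10, 10), diagonal="hist")
axes[0, 0].figure.savefig("scatter_matrix.png")  # hypothetical output name
```

A larger figsize than in the fake-data example helps here, since a 10 by 10 grid gets cramped quickly.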

I also have a pure matplotlib approach to this on my GitHub page, which produces a very similar type of plot (it is designed for MCMC output, but is also appropriate here). Here's how you'd use it with the data above:

import corner_plot as cp

# data.as_matrix() was removed in pandas 1.0; data.values gives the same NumPy array
cp.corner_plot(data.values, axis_labels=data.columns, nbins=10,
               figsize=(7, 7), scatter=True, fontsize=10, tickfontsize=7)

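If you'd rather not install an extra module, the same idea can be sketched in plain matplotlib. This is not the corner_plot code mentioned above, just a minimal illustration: pairwise_grid is a hypothetical helper name, and the bin count, marker size, and output file name are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np


def pairwise_grid(values, labels, bins=10, figsize=(10, 10)):
    """Hypothetical helper: histograms on the diagonal, scatter plots below it."""
    n = values.shape[1]
    fig, axes = plt.subplots(n, n, figsize=figsize, squeeze=False)
    for row in range(n):
        for col in range(n):
            ax = axes[row, col]
            if col > row:
                # The upper triangle would only mirror the lower one, so hide it
                ax.set_visible(False)
                continue
            if row == col:
                ax.hist(values[:, col], bins=bins)  # 1D projection
            else:
                ax.scatter(values[:, col], values[:, row], s=4, alpha=0.4)  # 2D projection
            ax.set_xticks([])
            ax.set_yticks([])
            if row == n - 1:
                ax.set_xlabel(labels[col])
            if col == 0:
                ax.set_ylabel(labels[row])
    return fig, axes


# Fake data with the same layout as in the question
rng = np.random.default_rng(0)
fake = rng.standard_normal((100, 10))
labels = ["x%d" % i for i in range(1, 11)]

fig, axes = pairwise_grid(fake, labels)
fig.savefig("pairwise_grid.png")  # hypothetical output name
```

Ticks are removed to keep the 55 visible panels readable; for real analysis you would probably want them back on the outer panels at least.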

Angus Williams answered Oct 21 '22 04:10