
Plot contours for the densest region of a scatter plot

I am generating a scatter plot of ~300k data points, and it is so overcrowded in some places that no structure is visible. So I had a thought!

I want to have the plot generate a contour plot for the densest parts and leave the less-dense areas with the scatter() data points.

So I was trying to compute a nearest-neighbour distance for each data point individually. Where this distance falls below a specific value (dense), draw a contour and fill it; where it reaches a much larger value (less dense), just do the scatter.

I have been trying and failing for a few days now, and I am not sure that the conventional contour plot will work in this case.

I would supply code, but it is so messy that it would probably just confuse the issue. It is also so computationally intensive that it would probably crash my PC if it did work!

Thank you all in advance!

P.S. I have been searching and searching for an answer! From all the results that turned up, I am convinced it is not even possible!

Edit: The idea is to see where some particular points lie within the structure of the 300k sample. Here is an example plot (link: "My scatter version of the data"); my points are scattered in three different colours.

I will attempt to randomly sample 1000 data points from my data and upload them as a text file. Cheers Stackers. :)

Edit: Here is some sample data (link: "the data"): 1000 lines, just two space-delimited columns [X, Y] (or [g-i, i] from the plot above). Thank you all!

asked Oct 11 '13 06:10 by FriskyGrub




2 Answers

Four years later and I can finally answer this! It can be done using contains_points from matplotlib.path: draw the contours, then scatter only the points that fall outside the lowest contour level.

I've used Gaussian smoothing from astropy, which can be omitted or substituted as needed.

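import numpy as np
from matplotlib import pyplot as plt

# astropy is optional; without it the contours are drawn unsmoothed
try:
    from astropy.convolution import Gaussian2DKernel, convolve
    astro_smooth = True
except ImportError:
    astro_smooth = False

# Synthetic demo data
np.random.seed(123)
t = np.linspace(-1, 1.2, 2000)
x = t**2 + 0.3 * np.random.randn(2000)
y = t**5 + 0.5 * np.random.randn(2000)

# Bin the points; H has shape (nx, ny)
H, xedges, yedges = np.histogram2d(x, y, bins=(50, 40))
# Use bin centres so the contours are not shifted by half a bin
xcentres = 0.5 * (xedges[:-1] + xedges[1:])
ycentres = 0.5 * (yedges[:-1] + yedges[1:])
xmesh, ymesh = np.meshgrid(xcentres, ycentres)

# Smooth the contours (if astropy is installed)
if astro_smooth:
    kernel = Gaussian2DKernel(1.0)  # standard deviation in bins
    H = convolve(H, kernel)

fig, ax = plt.subplots(1, figsize=(7, 6))
clevels = ax.contour(xmesh, ymesh, H.T, linewidths=0.9, cmap='winter')

# Paths of the lowest (outermost) contour level
if hasattr(clevels, 'get_paths'):   # newer Matplotlib (3.8+)
    paths = clevels.get_paths()[:1]
else:                               # older Matplotlib
    paths = clevels.collections[0].get_paths()

# Identify points within that contour level
points = np.column_stack((x, y))    # contains_points needs an (N, 2) array
inside = np.zeros(x.shape, dtype=bool)
for contour_path in paths:
    inside |= contour_path.contains_points(points)

# Scatter only the points outside the contoured region
ax.plot(x[~inside], y[~inside], 'kx')
plt.show()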
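
If astropy is not available, the smoothing step can probably be swapped for scipy.ndimage.gaussian_filter. This is only a minimal sketch, assuming SciPy is installed and reusing the same synthetic data; the name H_smooth is mine, and the zero-padded boundary is chosen on the assumption that it matches astropy's default fill behaviour, so the results should be close but not necessarily identical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Same synthetic demo data as in the answer
np.random.seed(123)
t = np.linspace(-1, 1.2, 2000)
x = t**2 + 0.3 * np.random.randn(2000)
y = t**5 + 0.5 * np.random.randn(2000)

H, xedges, yedges = np.histogram2d(x, y, bins=(50, 40))

# sigma is given in bins; mode="constant" pads the edges with zeros
# (assumption: comparable to the astropy convolve defaults)
H_smooth = gaussian_filter(H, sigma=1.0, mode="constant", cval=0.0)
```

H_smooth can then be passed to ax.contour in place of H; a larger sigma gives smoother contours at the cost of blurring small dense clumps.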


answered Oct 12 '22 15:10 by FriskyGrub


You can achieve this with a variety of NumPy/SciPy/Matplotlib tools:

  1. Create a scipy.spatial.KDTree of the original points for fast lookup.
  2. Use np.meshgrid to create a grid of points at the resolution you want for the contour.
  3. Use KDTree.query to create a mask of all grid locations that are within the target density.
  4. Bin the data, either with rectangular bins or with plt.hexbin.
  5. Plot the contour from the binned data, but use the mask from step 3 to filter out the lower-density regions.
  6. Use the inverse of the mask to plt.scatter the remaining points.
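
The steps above could be sketched roughly as follows. This is one possible reading, not a definitive implementation: the data is a synthetic stand-in, K and MAX_DIST are hypothetical thresholds I picked for it, "dense" is taken to mean that the K-th nearest data point lies within MAX_DIST of a grid cell centre, and cKDTree is used in place of KDTree. The histogram is computed before the grid so that the grid cells line up with the bins.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import cKDTree

# Synthetic stand-in data; replace with your own x, y arrays
rng = np.random.default_rng(123)
t = np.linspace(-1, 1.2, 2000)
x = t**2 + 0.3 * rng.standard_normal(t.size)
y = t**5 + 0.5 * rng.standard_normal(t.size)

K = 10          # hypothetical: look at the distance to the 10th nearest point
MAX_DIST = 0.1  # hypothetical: a cell is "dense" if that distance is <= this

# Step 1: KD-tree of the original points
tree = cKDTree(np.column_stack((x, y)))

# Step 4 (done early so the grid matches the bins): rectangular binning
H, xedges, yedges = np.histogram2d(x, y, bins=(50, 40))
xc = 0.5 * (xedges[:-1] + xedges[1:])
yc = 0.5 * (yedges[:-1] + yedges[1:])

# Step 2: grid of bin centres, shaped like H, i.e. (nx, ny)
xg, yg = np.meshgrid(xc, yc, indexing="ij")

# Step 3: mask of grid cells whose K-th nearest data point is close enough
dist, _ = tree.query(np.column_stack((xg.ravel(), yg.ravel())), k=K)
kth_dist = dist.reshape(xg.size, -1)[:, -1]   # also works when K == 1
dense = (kth_dist <= MAX_DIST).reshape(xg.shape)

# Step 5: filled contours of the binned counts, dense cells only
fig, ax = plt.subplots(figsize=(7, 6))
ax.contourf(xc, yc, np.ma.masked_where(~dense, H).T, cmap="winter")

# Step 6: scatter the points that fall in non-dense cells
ix = np.clip(np.searchsorted(xedges, x, side="right") - 1, 0, len(xc) - 1)
iy = np.clip(np.searchsorted(yedges, y, side="right") - 1, 0, len(yc) - 1)
in_dense = dense[ix, iy]
ax.scatter(x[~in_dense], y[~in_dense], s=4, c="k")

# plt.show()  # or fig.savefig(...) to view the result
```

The distances are Euclidean in data units, so if x and y have very different scales it may be worth rescaling them before building the tree. For ~300k points the tree query runs over the grid cells only (here 50 x 40), which keeps it cheap compared with a per-point nearest-neighbour search.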
answered Oct 12 '22 15:10 by Hooked