Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate python 2D scatter points occupied area

I have used matplotlib to plot those two series of 2000 points. From the pictures, it seems the first 2000 points occupied area is smaller than second 2000 points. But if I want to quantitatively calculated how much area first and second serials of 2000 points occupied, how can I do?

enter image description here enter image description here

I really appreciate any help, suggestions or comments.

Thanks very much.

like image 884
misakayu Avatar asked Feb 07 '23 14:02

misakayu


1 Answers

This problem is not related to matplotlib, and also requires a definition of the "occupied area", which may vary depending on the kind of data you have. If you want a kind of non-strict approximation, here's one way to do it:

First, some test data:

import matplotlib
import matplotlib.pyplot as plt

import numpy


x = numpy.random.normal(size=10000)
y = numpy.random.normal(size=10000)

fig = plt.figure()
s = fig.add_subplot(1, 1, 1, aspect=1)
s.set_xlim(-4, 4)
s.set_ylim(-4, 4)
s.scatter(x, y)
fig.savefig('t1.png')

enter image description here

Calculate a 2D histogram to estimate the density of points. Note: the number of bins and the range is something you will have to adjust for your data.

hist, xedges, yedges = numpy.histogram2d(x, y, bins=20, range=[[-4, 4], [-4, 4]])

fig = plt.figure()
s = fig.add_subplot(1, 1, 1)
s.set_xlim(-4, 4)
s.set_ylim(-4, 4)
s.imshow(
    hist, interpolation='nearest',
    extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]],
    cmap=matplotlib.cm.viridis)
fig.savefig('t2.png')

enter image description here

Finally, find the places where the number of counts is greater than some predefined value. Note: you will have to adjust this threshold too, to get the desired distinction between "occupied" and "non-occupied" areas:

over_threshold = hist > 10

fig = plt.figure()
s = fig.add_subplot(1, 1, 1)
s.set_xlim(-4, 4)
s.set_ylim(-4, 4)
s.imshow(
    over_threshold, interpolation='nearest',
    extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]],
    cmap=matplotlib.cm.viridis)
fig.savefig('t3.png')

area = over_threshold.sum() * (xedges[1] - xedges[0]) * (yedges[1] - yedges[0])
print(area)

enter image description here

All the plotting, of course, is purely illustrative, and is not essential to the algorithm.

like image 60
fjarri Avatar answered Feb 13 '23 02:02

fjarri