I am trying to compute the average cell size for the set of points shown in the picture. The picture was generated using gnuplot:
gnuplot> plot "debug.dat" using 1:2
The points are almost aligned on a rectangular grid, but not quite. There seems to be a bias (jitter?) of roughly 10-15% along either X or Y. How can one efficiently compute a partition into tiles so that there is virtually only one point per tile? The tile size would be expressed as (tilex, tiley). I say "virtually" because the 10-15% bias may have moved a point into an adjacent tile.
Just for reference, I have manually sorted (hopefully correctly) and extracted the first 10 points:
-133920,33480
-132480,33476
-131044,33472
-129602,33467
-128162,33463
-139679,34576
-138239,34572
-136799,34568
-135359,34564
-133925,34562
Just for clarification, a valid tile as per the above description would be (1435,1060), but I am really looking for a quick automated way.
Let's do this for the X coordinate only:
1) Sort the X coordinates.
2) Look at the deltas between consecutive X coordinates. These deltas fall into two categories: either they correspond to the space between two columns, or to the space between crosses within the same column. Your goal is to find a threshold that separates the long spaces from the short ones. One way (I think) is to pick the threshold that splits the deltas into two groups whose means are the furthest apart.
3) Once you have the threshold, separate the points into columns. A new column starts wherever the delta to the previous point exceeds the threshold.
4) Calculate the average position of each detected column.
5) Take the deltas between consecutive column positions. The problem is that a stray point may break your columns, so take the median of these deltas to keep the strays out.
6) That median should be a robust estimate of your gridX.
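The six steps above can be sketched in Python roughly as follows. This is only a minimal sketch, not a tuned implementation: the function names `find_threshold` and `estimate_pitch` are made up for illustration, and the threshold is placed halfway between the two delta groups, which is an assumption. It also assumes that at least one column contains more than one point; if every delta is a column-to-column space, the split into "short" and "long" deltas is meaningless and columns will be merged wrongly.

```python
from statistics import mean, median


def find_threshold(deltas):
    """Step 2: split the sorted deltas into a short and a long group so that
    the two group means are as far apart as possible, then return a value
    halfway between the groups (the halfway placement is an assumption)."""
    sd = sorted(deltas)
    best_k = max(range(1, len(sd)),
                 key=lambda k: mean(sd[k:]) - mean(sd[:k]))
    return (sd[best_k - 1] + sd[best_k]) / 2


def estimate_pitch(coords):
    """Estimate the grid spacing along one axis from jittered coordinates.
    Needs at least three points and at least two detected columns."""
    xs = sorted(coords)                                   # step 1
    deltas = [b - a for a, b in zip(xs, xs[1:])]          # step 2
    threshold = find_threshold(deltas)

    # Step 3: a delta above the threshold starts a new column.
    columns = [[xs[0]]]
    for x, d in zip(xs[1:], deltas):
        if d > threshold:
            columns.append([x])
        else:
            columns[-1].append(x)

    centers = [mean(col) for col in columns]              # step 4
    spacings = [b - a for a, b in zip(centers, centers[1:])]
    return median(spacings)                               # steps 5 and 6
```

Called on the ten sample X values from the question, `estimate_pitch` returns 1440.0. The same function can be applied to the Y values to get the other tile dimension.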
Example, using your data, looking at axis X:
-133920 -132480 -131044 -129602 -128162 -139679 -138239 -136799 -135359 -133925
Deltas between the sorted coordinates, listed in ascending order:
5 1434 1436 1440 1440 1440 1440 1440 1442
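Here you can see that there is a very obvious threshold between the small delta (5) and the large ones (1434 and up). Any delta of 1434 or more counts as a space between columns here.
Split the points into columns (| marks a column boundary; the second line gives the delta between neighbouring points):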
-139679|-138239|-136799|-135359|-133925 -133920|-132480|-131044|-129602|-128162
1440 1440 1440 1434 5 1440 1436 1442 1440
Almost every point is alone in its column, except the pair -133925 and -133920.
The average grid line positions are:
-139679 -138239 -136799 -135359 -133922.5 -132480 -131044 -129602 -128162
Sorted deltas between adjacent grid lines:
1436.0 1436.5 1440.0 1440.0 1440.0 1440.0 1442.0 1442.5
Median:
1440
Which is the correct answer for your SMALL data set, IMHO.