How it goes: Based on the set of tuple (id, x, y), find the min max for x and y , then two dots (red points) created. Each element in tuple are grouped to two groups based on the distance towards the red dots.

Each group cant exceed 5 dots. If exceed, new group should be computed. I've managed to do recursion for the first phase. But I have no idea how to do it for second phase. The second phase should look like this:

Based on these two groups, again it need to find the min max for x and y (for each group), then four dots (red points) created. Each element in tuple are grouped to two groups based on the distance towards the red dots.
getDistance :: (Int, Double, Double) -> (Int, Double, Double) -> Double
getDistance (_,x1,y1) (_,x2,y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2
getTheClusterID :: (Int, Double, Double) -> Int
getTheClusterID (id, _, _) = id
idxy = [(id, x, y)]
createCluster id cs = [(id, minX, minY),(id+1, maxX, minY), (id+2, minX, maxY), (id+3, maxX, maxY)]
where minX = minimum $ map (\(_,x,_,_) -> x) cs
maxX = maximum $ map (\(_,x,_,_) -> x) cs
minY = minimum $ map (\(_,_,y,_) -> y) cs
maxY = maximum $ map (\(_,_,y,_) -> y) cs
idCluster = [1]
cluster = createCluster (last idCluster) idxy
clusterThis (id,a,b) = case (a,b) of
j | getDistance (a,b) (cluster!!0) < getDistance (a,b) (cluster!!1) &&
-> (getTheClusterID (cluster!!0), a, b)
j | getDistance (a,b) (cluster!!1) < getDistance (a,b) (cluster!!0) &&
-> (getTheClusterID (cluster!!1), a, b)
_ -> (getTheClusterID (cluster!!0), a, b)
groupAll = map clusterThis idxy
I am moving from imperative to functional. Sorry if my way of thinking is still in imperative way. Still learning.
EDIT: To clarify, this is the original data looks like.

The basic principle to follow in writing such an algorithm is to write small, compositional programs; each program is then easy to reason about and test in isolation, and the final program can be written in terms of the smaller ones.
The algorithm can be summarized as follows:
The presence of a 'repeat the process' step indicates this to be a divide and conquer problem.
I see no need for an ID for each point, so I've dispensed with this.
To begin, define datatypes for each type of data you will be working with:
import Data.List (partition)
data Point = Point { ptX :: Double, ptY :: Double }
data Cluster = Cluster { clusterPts :: [Point] }
This may seem silly for such simple data, but it can potentially save you quite a bit of confusion during debugging. Also note the import of a function we will be using later.
The 1st step:
minMaxPoints :: [Point] -> (Point, Point)
minMaxPoints ps =
(Point minX minY
,Point maxX maxY)
where minX = minimum $ map ptX ps
maxX = maximum $ map ptX ps
minY = minimum $ map ptY ps
maxY = maximum $ map ptY ps
This is essentially the same as your createCluster function.
The 2nd step:
pointDistance :: Point -> Point -> Double
pointDistance (Point x1 y1) (Point x2 y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2
cluster1 :: [Point] -> [Cluster]
cluster1 ps =
let (mn, mx) = minMaxPoints ps
(psmn, psmx) = partition (\p -> pointDistance mn p < pointDistance mx p) ps
in [ Cluster psmn, Cluster psmx ]
This function should clear - it is a direct translation of the above statement of this step into code. The partition function takes a predicate and a list and produces two lists, the first containing all elements for which the predicate is true, and the second all elements for which it is false. pointDistance is essentially the same as your getDistance function.
The 3rd step:
cluster :: [Point] -> [Cluster]
cluster ps =
cluster1 ps >>= \cl@(Cluster c) ->
if length c > 5
then cluster c
else [cl]
This also implements the statement above very directly. Perhaps the only confusing part is the use of >>=, which (here) has type [a] -> (a -> [b]) -> [b]; it simply applies the given function to each element of the given list, and concatenates the result (equivalently, it is written flip concatMap).
Finally your test case (which I hope I've translated correctly from pictures to Haskell data):
testPts :: [Point]
testPts = map (uncurry Point)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
main = mapM_ (print . map (\p -> (ptX p, ptY p)) . clusterPts) $ cluster testPts
Running this program produces
[(0.0,0.0),(0.0,2.0),(2.0,1.0),(1.0,0.0)]
[(4.0,4.0),(5.0,2.0),(5.0,4.0),(4.0,3.0)]
[(10.0,2.0),(9.0,3.0),(8.0,2.0)]
[(13.0,3.0),(12.0,3.0),(11.0,4.0),(13.0,5.0)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With