Given a list of points in a two dimensional space, you want to perform a function in Haskell to find the distance between the two closest points. example: Input: project [(1,5), (3,4), (2,8), (-1,2), (-8.6), (7.0), (1.5), (5.5), (4.8), (7.4)] Output: 2.0
Assume that the distance between the two farthest points in the list is at most 10000.
Here´s my code:
import Data.List
import System.Random
sort_ :: Ord a => [a] -> [a]
sort_ [] = []
sort_ [x] = [x]
sort_ xs = merge (sort_ left) (sort_ right)
where
(left, right) = splitAt (length xs `div` 2) xs
merge [] xs = xs
merge xs [] = xs
merge (x:xs) (y:ys)=
if x <= y then
x : merge xs (y:ys)
else y : merge (x:xs) ys
project :: [(Float,Float)] -> Float
project [] = 0
project (x:xs)=
if null (xs) then
error "The list have only 1 point"
else head(sort_(dstList(x:xs)))
distance :: (Float,Float)->(Float,Float) -> Float
distance (x1,y1) (x2,y2) = sqrt((x1 - x2)^2 + (y1 - y2)^2)
dstList :: [(Float,Float)] -> [Float]
dstList (x:xs)=
if length xs == 1 then
(dstBetween x xs):[]
else (dstBetween x xs):(dstList xs)
dstBetween :: (Float,Float) -> [(Float,Float)] -> Float
dstBetween pnt (x:xs)=
if null (xs) then
distance pnt x
else minimum ((distance pnt ):((dstBetween pnt xs)):[])
{-
Calling generator to create a file created at random points
-}
generator = do
putStrLn "Enter File Name"
file <- getLine
g <- newStdGen
let pts = take 1000 . unfoldr (Just . (\([a,b],c)->((a,b),c)) . splitAt 2)
$ randomRs(-1,1) g :: [(Float,Float)]
writeFile file . show $ pts
{-
Call the main to read a file and pass it to the function of project
The function of the project should keep the name 'project' as described
in the statement
-}
main= do
putStrLn "Enter filename to read"
name <- getLine
file <- readFile name
putStrLn . show . project $ readA file
readA::String->[(Float,Float)]
readA = read
I can perform a run of the program as in the example or using the generator as follows:
in haskell interpreter must type "generator", the program will ask for a file name containing a thousand points here. and after the file is generated in the Haskell interpreter must write main, and request a file name, which is the name of the file you create with "generator".
The problem is that for 1000 points randomly generated my program takes a long time, about 3 minutes on a computer with dual core processor. What am I doing wrong? How I can optimize my code to work faster?
You are using a quadratic algorithm:
project [] = error "Empty list of points"
project [_] = error "Single point is given"
project ps = go 10000 ps
where
go a [_] = a
go a (p:ps) = let a2 = min a $ minimum [distance p q | q<-ps]
in a2 `seq` go a2 ps
You should use a better algorithm. Search computational-geometry tag on SO for a better algorithm.
See also http://en.wikipedia.org/wiki/Closest_pair_of_points_problem .
@maxtaldykin proposes a nice, simple and effective change to the algorithm, which should make a real difference for random data -- pre-sort the points by X coordinate, and never try points more than d
units away from the current point, in X coordinate (where d
is the currently known minimal distance):
import Data.Ord (comparing)
import Data.List (sortBy)
project2 ps@(_:_:_) = go 10000 p1 t
where
(p1:t) = sortBy (comparing fst) ps
go d _ [] = d
go d p1@(x1,_) t = g2 d t
where
g2 d [] = go d (head t) (tail t)
g2 d (p2@(x2,_):r)
| x2-x1 >= d = go d (head t) (tail t)
| d2 >= d = g2 d r
| otherwise = g2 d2 r -- change it "mid-flight"
where
d2 = distance p1 p2
On random data, g2
will work in O(1)
time so that go
will take O(n)
and the whole thing will be bounded by a sort, ~ n log n
.
Empirical orders of growth show ~ n^2.1
for the first code (on 1k/2k range) and ~n^1.1
for the second, on 10k/20k range, testing it quick'n'dirty compiled-loaded into GHCi (with second code running 50 times faster than first for 2,000 points, and 80 times faster for 3,000 points).
It's possible to slightly modify your bruteforce search to get better performance on random data.
Main idea is to sort points by x coordinate and, while comparing distances in loop, consider only points that have horizontal distance not grater than current minimum distance.
This could be order of magnitude faster but in the worst case it is still O(n^2).
Actually, on 2000 points it is 50 times faster on my machine.
project points = loop1 10000 byX
where
-- sort points by x coordinate
-- (you need import Data.Ord to use `comparing`)
byX = sortBy (comparing fst) points
-- loop through all points from left to right
-- threading `d` through iterations as a minimum distance so far
loop1 d = foldl' loop2 d . tails
-- `tail` drops leftmost points one by one so `x` is moving from left to right
-- and `xs` contains all points to the right of `x`
loop2 d [] = d
loop2 d (x:xs) = let
-- we take only those points of `xs` whose horizontal distance
-- is not greater than current minimum distance
xs' = takeWhile ((<=d) . distanceX x) xs
distanceX (a,_) (b,_) = b - a
-- then just get minimum distance from `x` to those `xs'`
in minimum $ d : map (distance x) xs'
Btw, please don't use so many parentheses. Haskell does not require to enclose function arguments.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With