Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Project Euler 23: insight on this stackoverflow-ing program needed

Hi haskell fellows. I'm currently working on the 23rd problem of Project Euler. Where I'm at atm is that my code seems right to me - not in the "good algorithm" meaning, but in the "should work" meaning - but produces a Stack memory overflow.

I do know that my algorithm isn't perfect (in particular I could certainly avoid computing such a big intermediate result at each recursion step in my worker function).

Though, being in the process of learning Haskell, I'd like to understand why this code fails so miserably, in order to avoid this kind of mistakes next time.

Any insight on why this program is wrong will be appreciated.

import qualified Data.List as Set ((\\))

main = print $ sum $ worker abundants [1..28123]

-- Limited list of abundant numbers
abundants :: [Int]
abundants = filter (\x -> (sum (divisors x)) - x > x) [1..28123]

-- Given a positive number, returns its divisors unordered.
divisors :: Int -> [Int]
divisors x  | x > 0     =   [1..squareRoot x] >>=
                            (\y ->  if      mod x y == 0
                                    then    let d = div x y in
                                            if y == d
                                            then [y]
                                            else [y, d]
                                    else    [])
            | otherwise = []


worker :: [Int] -> [Int] -> [Int]
worker (a:[]) prev = prev Set.\\ [a + a]
worker (a:as) prev = worker as (prev Set.\\ (map ((+) a) (a:as)))


-- http://www.haskell.org/haskellwiki/Generic_number_type#squareRoot
(^!) :: Num a => a -> Int -> a
(^!) x n = x^n

squareRoot :: Int -> Int
squareRoot 0 = 0
squareRoot 1 = 1
squareRoot n =
   let twopows = iterate (^!2) 2
       (lowerRoot, lowerN) =
          last $ takeWhile ((n>=) . snd) $ zip (1:twopows) twopows
       newtonStep x = div (x + div n x) 2
       iters = iterate newtonStep (squareRoot (div n lowerN) * lowerRoot)
       isRoot r  =  r^!2 <= n && n < (r+1)^!2
   in  head $ dropWhile (not . isRoot) iters

Edit: the exact error is Stack space overflow: current size 8388608 bytes.. Increasing the stack memory limit through +RTS -K... doesn't solve the problem.

Edit2: about the sqrt thing, I just copy pasted it from the link in comments. To avoid having to cast Integer to Doubles and face the rounding problems etc...

like image 422
m09 Avatar asked Apr 01 '12 17:04

m09


3 Answers

In the future, it's polite to attempt a bit of minimalization on your own. For example, with a bit of playing, I was able to discover that the following program also stack-overflows (with an 8M stack):

main = print (worker [1..1000] [1..1000])

...which really nails down just what function is screwing you over. Let's take a look at worker:

worker (a:[]) prev = prev Set.\\ [a + a]
worker (a:as) prev = worker as (prev Set.\\ (map ((+) a) (a:as)))

Even on my first read, this function was red-flagged in my mind, because it's tail-recursive. Tail recursion in Haskell is generally not such a great idea as it is in other languages; guarded recursion (where you produce at least one constructor before recursing, or recurse some small number of times before producing a constructor) is generally better for lazy evaluation. And in fact, here, what's happening is that each recursive call to worker is building a deeper- and deeper-ly nested thunk in the prev argument. When the time comes to finally return prev, we have to go very deeply into a long chain of Set.\\ calls to work out just what it was we finally have.

This problem is obfuscated slightly by the fact that the obvious strictness annotation doesn't help. Let's massage worker until it works. The first observation is that the first clause is completely subsumed by the second one. This is stylistic; it shouldn't affect the behavior (except on empty lists).

worker []     prev = prev
worker (a:as) prev = worker as (prev Set.\\ map (a+) (a:as))

Now, the obvious strictness annotation:

worker []     prev = prev
worker (a:as) prev = prev `seq` worker as (prev Set.\\ map (a+) (a:as))

I was surprised to discover that this still stack overflows! The sneaky thing is that seq on lists only evaluates far enough to learn whether the list matches either [] or _:_. The following does not stack overflow:

import Control.DeepSeq

worker []     prev = prev
worker (a:as) prev = prev `deepseq` worker as (prev Set.\\ map (a+) (a:as))

I didn't plug this final version back into the original code, but it at least works with the minimized main above. By the way, you might like the following implementation idea, which also stack overflows:

import Control.Monad

worker as bs = bs Set.\\ liftM2 (+) as as

but which can be fixed by using Data.Set instead of Data.List, and no strictness annotations:

import Control.Monad
import Data.Set as Set

worker as bs = toList (fromList bs Set.\\ fromList (liftM2 (+) as as))
like image 102
Daniel Wagner Avatar answered Oct 08 '22 19:10

Daniel Wagner


Ok, I loaded it up and gave it a shot. Daniel Wagner's advice is pretty good, probably better than mine. The problem is indeed with the worker function, but I was going to suggest using Data.MemoCombinators to memoize your function instead.

Also, your divisors algorithm is kind of silly. There's a much better way to do that. It's kind of mathy and would require a lot of TeX, so here's a link to a math.stackexchange page about how to do that. The one I was talking about, was the accepted answer, though someone else gives a recursive solution that I think would run faster. (It doesn't require prime factorization.)

https://math.stackexchange.com/questions/22721/is-there-a-formula-to-calculate-the-sum-of-all-proper-divisors-of-a-number

like image 39
afkbowflexin Avatar answered Oct 08 '22 19:10

afkbowflexin


As Daniel Wagner correctly said, the problem is that

worker (a:as) prev = worker as (prev Set.\\ (map ((+) a) (a:as)))

builds a badly nested thunk. You can avoid that and get somewhat better performance than with deepseq by exploiting the fact that both arguments to worker are sorted in this application. Thus you can get incremental output by noting that at any step everything in prev smaller than 2*a cannot be the sum of two abundant numbers, so

worker (a:as) prev = small ++ worker as (large Set.\\ map (+ a) (a:as))
  where
    (small,large) = span (< a+a) prev

does better. However, it's still bad because (\\) cannot use the sortedness of the two lists. If you replace it with

minus xxs@(x:xs) yys@(y:ys)
    = case compare x y of
        LT -> x : minus xs yys
        EQ -> minus xs ys
        GT -> minus xxs ys
minus xs _ = xs             -- originally forgot the case for one empty list

(or use the data-ordlist package's version), calculating the set-difference is O(length) instead of O(length^2).

like image 42
Daniel Fischer Avatar answered Oct 08 '22 21:10

Daniel Fischer