Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extent of GHC's optimization

I am not very familiar with the degree that Haskell/GHC can optimize code. Below I have a pretty "brute-force" (in the declarative sense) implementation of the n queens problem. I know it can be written more efficiently, but thats not my question. Its that this got me thinking about the GHC optimizations capabilities and limits.

I have expressed it in what I consider a pretty straightforward declarative sense. Filter permutations of [1..n] that fulfill the predicate For all indices i,j s.t j<i, abs(vi - vj) != j-i I would hope this is the kind of thing that can be optimized, but it also kind of feels like asking a lot of compiler.

validQueens x = and [abs (x!!i - x!!j) /= j-i | i<-[0..length x - 2], j<-[i+1..length x - 1]] 

queens n = filter validQueens (permutations [1..n])

oneThru x = [1..x]    
pointlessQueens = filter validQueens . permutations . oneThru

main = do
          n <- getLine 
          print $ pointlessQueens $ (read :: String -> Int) n

This runs fairly slow and grows quickly. n=10 takes about a second and n=12 takes forever. Without optimization I can tell the growth is factorial (# of permutations) multiplied by quadratic (# of differences in the predicate to check). Is there any way this code can perform better thru intelligent compilation? I tried the basic ghc options such has -O2 and didn't notice a significant difference, but I don't know the finer points (just added the flagS)

My impression is that the function i call queens can not be optimized and must generate all permutations before filter. Does the point-free version have a better chance? On the one hand I feel like a smart function comprehension between a filter and a predicate might be able to knock off some obviously undesired elements before they are even fully generated, but on the other hand it kind of feels like a lot to ask.

Sorry if this seems rambling, i guess my question is

  1. Is the pointfree version of above function more capable of being optimized?
  2. What steps could I take at make/compile/link time to encourage optimization?
  3. Can you briefly describe some possible (and contrast with the impossible!) means of optimization for the above code? At what point in the process do these occur?
  4. Is there any particular part of ghc --make queensN -O2 -v output I should be paying attention to? Nothing stands out to me. Don't even see much difference in output due to optimization flags

I am not overly concerned with this code example, but I thought writing it got me thinking and it seems to me like a decent vehicle for discussing optimization.

PS - permutations is from Data.List and looks like this:

permutations            :: [a] -> [[a]]
permutations xs0        =  xs0 : perms xs0 []
  where
    perms []     _  = []
    perms (t:ts) is = foldr interleave (perms ts (t:is)) (permutations is)
      where interleave    xs     r = let (_,zs) = interleave' id xs r in zs
            interleave' _ []     r = (ts, r)
            interleave' f (y:ys) r = let (us,zs) = interleave' (f . (y:)) ys r
                                     in  (y:us, f (t:y:us) : zs)
like image 829
jon_darkstar Avatar asked Jun 24 '11 15:06

jon_darkstar


3 Answers

At a more general level regarding "what kind of optimizations can GHC do", it may help to break the idea of an "optimization" apart a little bit. There are conceptual distinctions that can be drawn between aspects of a program that can be optimized. For instance, consider:

  • The intrinsic logical structure of the algorithm: You can safely assume in almost every case that this will never be optimized. Outside of experimental research, you're not likely to find a compiler that will replace a bubble sort with a merge sort, or even an insertion sort, and extremely unlikely to find one that would replace a bogosort with something sensible.

  • Non-essential logical structure of the algorithm: For instance, in the expression g (f x) (f x), how many times will f x be computed? What about an expression like g (f x 2) (f x 5)? These aren't intrinsic to the algorithm, and different variations can be interchanged without impacting anything other than performance. The difficulties in performing optimization here are essentially recognizing when a substitution can in fact be done without changing the meaning, and predicting which version will have the best results. A lot of manual optimizations fall into this category, along with a great deal of GHC's cleverness.

    This is also the part that trips a lot of people up, because they see how clever GHC is and expect it to do even more. And because of the reasonable expectation that GHC should never make things worse, it's not uncommon to have potential optimizations that seem obvious (and are, to the programmer) that GHC can't apply because it's nontrivial to distinguish cases where the same transformation would significantly degrade performance. This is, for instance, why memoization and common subexpression elimination aren't always automatic.

    This is also the part where GHC has a huge advantage, because laziness and purity make a lot of things much easier, and is I suspect what leads to people making tongue-in-cheek remarks like "Optimizing compilers are a myth (except perhaps in Haskell).", but also to unrealistic optimism about what even GHC can do.

  • Low-level details: Things like memory layout and other aspects of the final code. These tend to be somewhat arcane and highly dependent on implementation details of the runtime, the OS, and the processor. Optimizations of this sort are essentially why we have compilers, and usually not something you need to worry about unless you're writing code that is very computationally demanding (or are writing a compiler yourself).

As far as your specific example here goes: GHC isn't going to significantly alter the intrinsic time complexity of your algorithm. It might be able to remove some constant factors. What it can't do is apply constant-factor improvements that it can't be sure are correct, particularly ones that technically change the meaning of the program in ways that you don't care about. Case in point here is @sclv's answer, which explains how your use of print creates unnecessary overhead; there's nothing GHC could do about that, and in fact the current form would possibly inhibit other optimizations.

like image 88
C. A. McCann Avatar answered Nov 18 '22 15:11

C. A. McCann


There's a conceptual problem here. Permutations is generating streaming permutations, and filter is streaming too. What's forcing everything prematurely is the "show" implicit in "print". Change your last line to:

mapM print $ pointlessQueens $ (read :: String -> Int) n

and you'll see that results are generated in a streaming fashion much more rapidly. That fixes, for large result sets, a potential space leak, and other than that just lets things be printed as computed rather than all at once at the end.

However, you shouldn't expect any order of magnitude improvements from ghc optimizations (there are a few, obvious ones that you do get, mostly having to do with strictness and folds, but its irritating to rely on them). What you'll get are constant factors, generally.

Edit: As luqui points out below, show is also streaming (or at least show of [Int] is), but the line buffering nonetheless makes it harder to see the genuine speed of computation...

like image 27
sclv Avatar answered Nov 18 '22 16:11

sclv


It should be noted, although you do express that it is not part of your question, that the big problem with your code is that you do not do any pruning.

In the case of your question, it feels foolish to talk about possible/impossible optimization, compiler flags, and how to best formulate it etc. when an improvement of the algorithm is staring us so blatantly in the face.

One of the first things that will be tried is the permutations starting with the first queen in position 1 and the second queen in position 2 ([1,2...]). This is of course not a solution and we will have to move one of the queens. However, in your implementation, all permutations involving this combination of the two first queens will be tested! The search should stop there and instantly move to the permutations involving [1,3,...].

Here is a version that does this sort of pruning:

import Data.List
import Control.Monad

main = getLine >>= mapM print . queens . read

queens :: Int -> [[Int]]
queens n = queens' [] n

queens' xs n 
 | length xs == n = return xs 
 | otherwise = do 
  x <- [1..n] \\ xs
  guard (validQueens (x:xs))
  queens' (x:xs) n

validQueens x = 
  and [abs (x!!i - x!!j) /= j-i | i<-[0..length x - 2], j<-[i+1..length x - 1]]
like image 31
HaskellElephant Avatar answered Nov 18 '22 16:11

HaskellElephant