My two submissions for a programming problem differ in just one expression (where anchors
is a nonempty list and (getIntegrals n)
is a state monad):
Submission 1. replicateM (length anchors - 1) (getIntegrals n)
Submission 2. sequenceA $ const (getIntegrals n) <$> tail anchors
The two expressions' equivalence should be easy to see at compile time itself, I guess. And yet, comparatively the sequenceA
one is slower, and more importantly, takes up >10x memory:
Code | Time | Memory |
---|---|---|
replicateM one | 732 ms | 22200 KB |
sequenceA one | 1435 ms | 262100 KB |
(with "Memory limit exceeded on test 4" error for the second entry, so it might be even worse).
Why is it so?
It is becoming quite hard to predict which optimizations are automatic and which are not!
EDIT: As suggested, pasting Submission 1 code below. In this interactive problem, the 'server' has a hidden tree of size n
. Our code's job is to find out that tree, with minimal number of queries of the form ? k
. Loosely speaking, the server's response to ? k
is the row corresponding to node k
in the adjacency distance matrix of the tree. Our choices of k
are: initially 1
, and then a bunch of nodes obtained from getAnchors
.
{-# LANGUAGE Safe #-}
{-# OPTIONS_GHC -O2 #-}
import Data.Maybe
import qualified Data.ByteString.Lazy.Char8 as B8
import qualified Data.ByteString.Builder as Bu
import Data.Functor.Identity
import Control.Monad.Trans.State
import Control.Monad
import Control.Applicative
import Data.ByteString.Builder.Extra (flush)
import System.IO
type St = StateT [B8.ByteString] Identity
solve :: St Bu.Builder
solve = do
n <- getIntegral
ds <- getIntegrals n -- get the first row of adjacency matrix
let
anchors = getAnchors ds
readFirst = if head anchors==1 then return ds else getIntegrals n
readRest = replicateM (length anchors - 1) (getIntegrals n) -- get some other rows too
adjss <- liftA2 (:) readFirst readRest
let
adj1ss = [map snd $ filter ((1==).fst) (zip adjs [1..]) | adjs <- adjss]
s0 = Bu.string7
snl = Bu.string7 "\n" <> flush
i0 = Bu.intDec
printEdge src dst = i0 src <> s0 " " <> i0 dst <> snl
printAdj (src,dsts) = mconcat [printEdge src dst | dst<-dsts]
printAdjs = mconcat $ printAdj <$> zip anchors adj1ss
ask k = s0 "? " <> i0 k <> snl
askRest = mconcat $ ask <$> (dropWhile (==1) anchors)
return $ ask 1 <> askRest <> s0 "!" <> snl <> printAdjs
getAnchors :: [Int]->[Int]
getAnchors xs = reverse $ go (zip xs [1..]) [] [] where
go [] odds evens = if length odds < length evens then odds else evens
go ((k,i):rest) odds evens
| even k = go rest odds (i: evens)
| odd k = go rest (i: odds) evens
getByteString :: St B8.ByteString
getByteString = state getNext where
getNext [] = (B8.take 0 (B8.pack "."),[])
getNext (w:ws) = (w,ws)
getIntegral :: Num t => St t
getIntegral = convertToNum <$> getByteString where
convertToNum x = fromIntegral $ fromMaybe 0 $ liftA fst $ B8.readInteger x
getIntegrals :: Num t => Int -> St [t]
getIntegrals n = replicateM n getIntegral
main :: IO ()
main = do
hSetBuffering stdout NoBuffering
bytestrings <- B8.words <$> B8.getContents
B8.putStr $ Bu.toLazyByteString $ evalState solve bytestrings
We observe that that autocorrelation of m-sequence carries some similarities with that of a random sequence. If the length of the m-sequence is increased, the out-of-peak correlation reduces further and thereby the peaks become more distinct.
If the length of the m-sequence is increased, the out-of-peak correlation reduces further and thereby the peaks become more distinct. This property makes the m-sequences suitable for synchronization and in the detection of information in single-user Direct Sequence Spread Spectrum systems.
Maximum length sequences are generated using linear feedback shift registers (LFSR) structures that implement linear recursion. There are two types of LFSR structures available for implementation – 1) Galois LFSR and 2) Fibonacci LFSR.
Maximum-length sequences (also called as m-sequences or pseudo random (PN) sequences) are constructed based on Galois field theory which is an extensive topic in itself. A detailed treatment on the subject of Galois field theory can be found in references and. This article is part of the book
The problem here is related to inlining. I do not understand it completly, but here is what I understand.
First we find that copy&pasting the definition of replicateM
into the Submission 1 yields the same bad performance as Submission 2 (submission). However if we replace the INLINABLE
pragma of replicateM
with a NOINLINE
pragma things work again (submission).
The INLINABLE
pragma on replicateM
is different from an INLINE
pragma, the latter leading to more inlining than the former. Specifically here if we define replicateM
in the same file Haskells heuristic for inlining decides to inline, but with replicateM
from base it decides against inlining in this case even in the presence of the INLINABLE
pragma.
sequenceA
and traverse
on the other hand both have INLINE
pragmas leading to inlining. Taking a hint from the above experiment we can define a non-inlinable sequenceA
and indead this makes Solution 2 work (submission).
{-# NOINLINE sequenceA' #-}
sequenceA' :: [St x] -> St [x]
sequenceA' = sequenceA
The following is some pretty severe speculation on my part.
But how does inlining cause problems? Well let's look at the difference between the following two core dumps
With inlining:
Without inlining:
Here we're looking both times at what corresponds to, in the first instance the inlined part and the second instance the actual call to replicateM.
readRest = replicateM (length anchors - 1) (getIntegrals n)
Now the interesting bit is that in the inlined code the yellow highlighted lines are run in every loop of replicateM
, while in the non-inlined part they are calculated once, outside the lambda abstraction which is passed to replicateM
.
But what are they doing? There are multiple variables called ds
in the core, but this one refers to this:
which in turn corresponds to
solve = do
n <- getIntegral
So what I think is happening is that instead of running getIntegral
once and saving the result, it's starting state is saved and it is rerun with this state for every pass of the loop. Indeed changing this line to the following (requires BangPatterns language extension) fixes all versions (submission).
solve = do
!n <- getIntegral
I'm still not really sure, but this is my best guess.
Here are the two core dumps for reference: Inline, Noinline
Well yes, but I feel that the underlying problem here is your use of lazy IO and lazy State. Using the strict State transformer Haskell probably would have figured out to not keep old state around (I have no idea, just a guess), however we can not use strict State here, because of your reliance on lazy IO, i.e. getting all the input at the beginning using getContents
and lazyly forcing it while making sure to provide output before forcing too much. Instead it would be a lot safer to explicitely read the input line by line. I.e. replace the StateT [ByteString]
with IO
or something more fancy like a Conduit
or Pipe
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With