Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is replicateM (length xs) m way more efficient than sequenceA (fmap (const m) xs)?

My two submissions for a programming problem differ in just one expression (where anchors is a nonempty list and (getIntegrals n) is a state monad):

Submission 1. replicateM (length anchors - 1) (getIntegrals n)

Submission 2. sequenceA $ const (getIntegrals n) <$> tail anchors

The two expressions' equivalence should be easy to see at compile time itself, I guess. And yet, comparatively the sequenceA one is slower, and more importantly, takes up >10x memory:

Code Time Memory
replicateM one 732 ms 22200 KB
sequenceA one 1435 ms 262100 KB

(with "Memory limit exceeded on test 4" error for the second entry, so it might be even worse).

Why is it so?

It is becoming quite hard to predict which optimizations are automatic and which are not!

EDIT: As suggested, pasting Submission 1 code below. In this interactive problem, the 'server' has a hidden tree of size n. Our code's job is to find out that tree, with minimal number of queries of the form ? k. Loosely speaking, the server's response to ? k is the row corresponding to node k in the adjacency distance matrix of the tree. Our choices of k are: initially 1, and then a bunch of nodes obtained from getAnchors.

{-# LANGUAGE Safe #-}
{-# OPTIONS_GHC -O2 #-}

import Data.Maybe
import qualified Data.ByteString.Lazy.Char8 as B8
import qualified Data.ByteString.Builder as Bu
import Data.Functor.Identity
import Control.Monad.Trans.State
import Control.Monad
import Control.Applicative
import Data.ByteString.Builder.Extra (flush) 
import System.IO

type St = StateT [B8.ByteString] Identity

solve :: St Bu.Builder
solve = do
  n <- getIntegral
  ds <- getIntegrals n  -- get the first row of adjacency matrix
  let
    anchors = getAnchors ds
    readFirst = if head anchors==1 then return ds else getIntegrals n
    readRest = replicateM (length anchors - 1) (getIntegrals n) -- get some other rows too
  adjss <- liftA2 (:) readFirst readRest  
  let
    adj1ss = [map snd $ filter ((1==).fst) (zip adjs [1..]) | adjs <- adjss]
    s0 = Bu.string7
    snl = Bu.string7 "\n" <> flush
    i0 = Bu.intDec
    printEdge src dst = i0 src <> s0 " " <> i0 dst <> snl
    printAdj (src,dsts) = mconcat [printEdge src dst | dst<-dsts]
    printAdjs = mconcat $ printAdj <$> zip anchors adj1ss
    ask k = s0 "? " <> i0 k <> snl
    askRest = mconcat $ ask <$> (dropWhile (==1) anchors)
  return $ ask 1 <> askRest <> s0 "!" <> snl <> printAdjs

getAnchors :: [Int]->[Int]
getAnchors xs = reverse $ go (zip xs [1..]) [] [] where
  go [] odds evens = if length odds < length evens then odds else evens
  go ((k,i):rest) odds evens
    | even k = go rest odds (i: evens)
    | odd k = go rest (i: odds) evens
 
getByteString :: St B8.ByteString
getByteString = state getNext where
  getNext [] =  (B8.take 0 (B8.pack "."),[])
  getNext (w:ws) =  (w,ws)
 
getIntegral :: Num t => St t
getIntegral  = convertToNum <$> getByteString where
  convertToNum x =  fromIntegral $ fromMaybe 0 $ liftA fst $ B8.readInteger x
 
getIntegrals :: Num t => Int -> St [t]
getIntegrals n = replicateM n getIntegral

main :: IO ()
main = do
  hSetBuffering stdout NoBuffering
  bytestrings <- B8.words <$> B8.getContents
  B8.putStr $ Bu.toLazyByteString $ evalState solve bytestrings
like image 700
cobra Avatar asked Nov 08 '21 13:11

cobra


People also ask

Is autocorrelation of M-sequence similar to that of random sequence?

We observe that that autocorrelation of m-sequence carries some similarities with that of a random sequence. If the length of the m-sequence is increased, the out-of-peak correlation reduces further and thereby the peaks become more distinct.

What happens when the length of the M-sequence is increased?

If the length of the m-sequence is increased, the out-of-peak correlation reduces further and thereby the peaks become more distinct. This property makes the m-sequences suitable for synchronization and in the detection of information in single-user Direct Sequence Spread Spectrum systems.

How are maximum length sequences generated in linear programming?

Maximum length sequences are generated using linear feedback shift registers (LFSR) structures that implement linear recursion. There are two types of LFSR structures available for implementation – 1) Galois LFSR and 2) Fibonacci LFSR.

What are maximum-length sequences (m-sequences)?

Maximum-length sequences (also called as m-sequences or pseudo random (PN) sequences) are constructed based on Galois field theory which is an extensive topic in itself. A detailed treatment on the subject of Galois field theory can be found in references and. This article is part of the book


1 Answers

The problem here is related to inlining. I do not understand it completly, but here is what I understand.

Inlining

First we find that copy&pasting the definition of replicateM into the Submission 1 yields the same bad performance as Submission 2 (submission). However if we replace the INLINABLE pragma of replicateM with a NOINLINE pragma things work again (submission).

The INLINABLE pragma on replicateM is different from an INLINE pragma, the latter leading to more inlining than the former. Specifically here if we define replicateM in the same file Haskells heuristic for inlining decides to inline, but with replicateM from base it decides against inlining in this case even in the presence of the INLINABLE pragma.

sequenceA and traverse on the other hand both have INLINE pragmas leading to inlining. Taking a hint from the above experiment we can define a non-inlinable sequenceA and indead this makes Solution 2 work (submission).

{-# NOINLINE sequenceA' #-}
sequenceA' :: [St x] -> St [x]
sequenceA' = sequenceA

What is going wrong?

The following is some pretty severe speculation on my part.

But how does inlining cause problems? Well let's look at the difference between the following two core dumps

With inlining:
With inlining

Without inlining:
Without inlining

Here we're looking both times at what corresponds to, in the first instance the inlined part and the second instance the actual call to replicateM.

readRest = replicateM (length anchors - 1) (getIntegrals n)

Now the interesting bit is that in the inlined code the yellow highlighted lines are run in every loop of replicateM, while in the non-inlined part they are calculated once, outside the lambda abstraction which is passed to replicateM.

But what are they doing? There are multiple variables called ds in the core, but this one refers to this:

enter image description here

which in turn corresponds to

solve = do
  n <- getIntegral

So what I think is happening is that instead of running getIntegral once and saving the result, it's starting state is saved and it is rerun with this state for every pass of the loop. Indeed changing this line to the following (requires BangPatterns language extension) fixes all versions (submission).

solve = do
  !n <- getIntegral

I'm still not really sure, but this is my best guess.

Here are the two core dumps for reference: Inline, Noinline

This is crazy

Well yes, but I feel that the underlying problem here is your use of lazy IO and lazy State. Using the strict State transformer Haskell probably would have figured out to not keep old state around (I have no idea, just a guess), however we can not use strict State here, because of your reliance on lazy IO, i.e. getting all the input at the beginning using getContents and lazyly forcing it while making sure to provide output before forcing too much. Instead it would be a lot safer to explicitely read the input line by line. I.e. replace the StateT [ByteString] with IO or something more fancy like a Conduit or Pipe.

like image 128
Julia Path Avatar answered Oct 12 '22 13:10

Julia Path