Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why `Vector.length (Vector.replicate n 0)" is not fused?

The following code unexpectedly (at least for me) produces an intermediate vector:

import qualified Data.Vector as Vector

main :: IO ()
main =
  print (test n)

n :: Int
n = 1000000

test :: Int -> Int
test n = Vector.length (Vector.replicate n (0 :: Int))

The relevant part of Core is here (note the newArray# 1000000 call):

Main.main4
  :: forall s_a38t.
     GHC.Prim.State# s_a38t
     -> (# GHC.Prim.State# s_a38t, Vector.Vector Int #)
[GblId,
 Arity=1,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 399 30}]
Main.main4 =
  \ (@ s_a38t) (s1_a38u [OS=OneShot] :: GHC.Prim.State# s_a38t) ->
    case GHC.Prim.newArray#
           @ Int
           @ (Control.Monad.Primitive.PrimState (GHC.ST.ST s_a38t))
           1000000
           (Data.Vector.Mutable.uninitialised @ Int)
           (s1_a38u
            `cast` ((GHC.Prim.State#
                       (Sym (Control.Monad.Primitive.TFCo:R:PrimStateST[0] <s_a38t>_N)))_R
                    :: GHC.Prim.State# s_a38t
                       ~R# GHC.Prim.State#
                             (Control.Monad.Primitive.PrimState (GHC.ST.ST s_a38t))))
    of _ [Occ=Dead] { (# ipv_a5RG, ipv1_a5RH #) ->
    letrec {
      $wa_s609 [InlPrag=[0], Occ=LoopBreaker]
        :: GHC.Types.SPEC
           -> GHC.Prim.Int#
           -> Bool
           -> GHC.Prim.State# s_a38t
           -> (# GHC.Prim.State# s_a38t, Int #)
      [LclId, Arity=4, Str=DmdType <S,1*U><L,U><S,1*U><L,U>]
      $wa_s609 =
...

At the same time if I replace length with sum, fusion occurs correctly:

test n = Vector.sum (Vector.replicate n (0 :: Int))

Core:

Rec {
Main.main_$s$wfoldlM'_loop [Occ=LoopBreaker]
  :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <L,U><L,U>]
Main.main_$s$wfoldlM'_loop =
  \ (sc_s6bx :: GHC.Prim.Int#) (sc1_s6by :: GHC.Prim.Int#) ->
    case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# sc1_s6by 0)
    of _ [Occ=Dead] {
      False ->
        Main.main_$s$wfoldlM'_loop sc_s6bx (GHC.Prim.-# sc1_s6by 1);
      True -> sc_s6bx
    }
end Rec }

Main.main2 :: String
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 100 30}]
Main.main2 =
  case Main.main_$s$wfoldlM'_loop 0 1000000 of ww_s67W { __DEFAULT ->
  case GHC.Show.$wshowSignedInt 0 ww_s67W (GHC.Types.[] @ Char)
  of _ [Occ=Dead] { (# ww5_a5Vq, ww6_a5Vr #) ->
  GHC.Types.: @ Char ww5_a5Vq ww6_a5Vr
  }
  }

Also, if I rewrite the original function in terms of monadic stream combinators, the intermediate vector is not allocated also:

import qualified Data.Vector.Fusion.Stream.Monadic as Stream
import Data.Functor.Identity

test n = runIdentity $ Stream.length (Stream.replicate n (0 :: Int))

Core:

Rec {
Main.main_$s$wfoldlM'_loop [Occ=LoopBreaker]
  :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <L,U><L,U>]
Main.main_$s$wfoldlM'_loop =
  \ (sc_s5lE :: GHC.Prim.Int#) (sc1_s5lF :: GHC.Prim.Int#) ->
    case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# sc1_s5lF 0)
    of _ [Occ=Dead] {
      False ->
        Main.main_$s$wfoldlM'_loop
          (GHC.Prim.+# sc_s5lE 1) (GHC.Prim.-# sc1_s5lF 1);
      True -> sc_s5lE
    }
end Rec }

Main.main2 :: String
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 100 30}]
Main.main2 =
  case Main.main_$s$wfoldlM'_loop 0 1000000 of ww_s5ke { __DEFAULT ->
  case GHC.Show.$wshowSignedInt 0 ww_s5ke (GHC.Types.[] @ Char)
  of _ [Occ=Dead] { (# ww5_a5gi, ww6_a5gj #) ->
  GHC.Types.: @ Char ww5_a5gi ww6_a5gj
  }
  }

Why Vector.length breaks fusion?

I'm using ghc-7.10.3 and vector-0.11.0.0.

ADDED: Here is an issue: https://github.com/haskell/vector/issues/111

like image 384
Yuras Avatar asked Mar 18 '16 23:03

Yuras


People also ask

How are lentiviruses made replication incompetent?

Vectors for Lentiviral Production All wild-type virulence and replication genes are deleted. All Sigma-Aldrich® lentiviral transfer vectors contain a modified, self-inactivating 3' long terminal repeat (SIN/LTR) which renders the resulting lentiviral particles replication incompetent.

What are the three most important features for a cloning vector?

Features of Cloning VectorsIt should have a selectable marker with an antibiotic resistance gene that facilitates screening of the recombinant organism. It should be small in size so that it can easily integrate into the host cell. It should be capable of inserting a large segment of DNA.

How do expression vectors differ from vectors used for replicating DNA?

A cloning vector is used to acquire multiple copies of the foreign DNA fragment (gene of interest). An expression vector is utilised to acquire the gene product, which might be RNA or protein, of the DNA (gene of interest).

Why do shuttle vectors have two origins of replication?

Because shuttle vectors transfer genes to expression hosts, they need to have a promoter to initiate transcription and a ribosome binding site to initiate translation.


2 Answers

I used sum and length from Data.Vector.Generic rather than Data.Vector since the latter are just defined as the former.

Here's the code for length (from Data.Vector.Generic) ...

-- | /O(1)/ Yield the length of the vector.
length :: Vector v a => v a -> Int
{-# INLINE length #-}
length = Bundle.length . stream

Hmm.. so let's look at "sum"

-- | /O(n)/ Compute the sum of the elements
sum :: (Vector v a, Num a) => v a -> a
{-# INLINE sum #-}
sum = Bundle.foldl' (+) 0 . stream

But if I run ghc -ddump-inlinings -ddump-rule-firings -O2 with sum I see

Rule fired: SPEC Data.Vector.$fVectorVectora [GHC.Types.Int]
Inlining done: System.IO.print
Inlining done: System.IO.print1
Inlining done: Data.Vector.Generic.sum
Rule fired: Class op +
Rule fired: Class op fromInteger
Inlining done: GHC.Num.$fNumInt_$cfromInteger
Rule fired: integerToInt
Inlining done: Data.Vector.Fusion.Util.unId
Inlining done: Data.Vector.Fusion.Util.unId1
Inlining done: Data.Vector.replicate
Inlining done: Data.Vector.Generic.replicate

And if I run it with length I see:

Rule fired: SPEC Data.Vector.$fVectorVectora [GHC.Types.Int]
Inlining done: System.IO.print
Inlining done: System.IO.print1
Inlining done: Data.Vector.replicate
Inlining done: Data.Vector.Generic.replicate
Rule fired: SPEC Data.Vector.$fVectorVectora [GHC.Types.Int]

So sum gets inlined and length doesn't, and I don't understand why. And even turning up the unfolding threshold to ridiculous amounts doesn't change that.

That said, if I manually replace Vector.length with Bundle.length . Vector.stream, the stream/unstream rule does fire, as in the sum case, and a very tidy core is generated with no array allocations.

like image 94
sclv Avatar answered Sep 22 '22 00:09

sclv


This is an extension of sclv's answer.

I noticed the behavior in the question occurs with vector-0.11.0.0, but not the other version I happened to have installed, vector-0.10.12.2. Inspecting the Data/Vector/Generic.hi files from those two versions with ghc --show-iface, I found that in version 0.11.0.0 only, length (but not sum) is marked as a "loop-breaker". This means length is part of a mutually recursive group of definitions and GHC has chosen this function to not inline in order to avoid the possibility of an infinite expansion.

I assume what happened is that the changes in 0.11.0.0 made length part of a cycle of definitions, probably unintentionally, where it wasn't before, but I didn't try to verify that since it would require actually reading vector source code.

like image 29
Reid Barton Avatar answered Sep 21 '22 00:09

Reid Barton