Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Special runtime representation of [] type?

Tags:

haskell

ghc

Consider the simple definition of a length-indexed vector:

data Nat = Z | S Nat 

infixr 5 :> 
data Vec (n :: Nat) a where 
  V0 :: Vec 'Z a 
  (:>) :: a -> Vec n a -> Vec ('S n) a 

Naturally I would at some point need the following function:

vec2list :: Vec n a -> [a]  

However, this function is really just a fancy identity. I believe that the runtime representations of these two types are the same, so

vec2list :: Vec n a -> [a]  
vec2list = unsafeCoerce 

should work. Alas, it does not:

>vec2list ('a' :> 'b' :> 'c' :> V0)
""

Every input simply returns the empty list. So I assume my understand is lacking. To test it, I define the following:

data List a = Nil | Cons a (List a) deriving (Show) 

vec2list' :: Vec n a -> List a 
vec2list' = unsafeCoerce 

test1 = vec2list' ('a' :> 'b' :> 'c' :> V0)

data SomeVec a = forall n . SomeVec (Vec n a) 

list'2vec :: List a -> SomeVec a 
list'2vec x = SomeVec (unsafeCoerce x) 

Surprisingly this works! It certainly isn't an issue with the GADT then (my initial thought).

I think that the List type is really identical at runtime to []. I try to test this too:

list2list :: [a] -> List a 
list2list = unsafeCoerce 

test2 = list2list "abc" 

and it works! Based on this fact, I have to conclude that [a] and List a must have the same runtime representation. And yet, the following

list2list' :: List a -> [a] 
list2list' = unsafeCoerce 

test3 = list2list' (Cons 'a' (Cons 'b' (Cons 'c' Nil)))

does not work. list2list' again always returns the empty list. I believe that "having identical runtime representations" must be a symmetric relation, so this doesn't seem to make sense.

I began to think maybe there's something funny with "primitive" types - but I always believed that [] is only special syntactically, not semantically. It seems that's the case:

data Pair a b = Pair a b deriving (Show, Eq, Ord)  

tup2pair :: (a,b) -> Pair a b 
tup2pair = unsafeCoerce 

pair2tup :: Pair a b -> (a,b) 
pair2tup = unsafeCoerce 

The first function works and the second does not - same as the in the case of List and []. Although in this case, pair2tup segfaults as opposed to always returning the empty list.

It seems to be consistently asymmetric with respect to types which use "built-in" syntax. Back to the Vec example, the following

list2vec :: [a] -> SomeVec a 
list2vec x = SomeVec (unsafeCoerce x) 

works just fine as well! The GADT really isn't special.

The question is: how do the runtime representations of types which use "built-in" syntax differ from those that do not?

Alternatively, how does one write a zero-cost coercion from Vec n a to [a]? This doesn't answer the question but solves the problem.

Testing was done with GHC 7.10.3.


As noted by a commenter, this behaviour is only present when interpreting. When compiled, all functions work as expected. The question still applies, just to runtime representation when interpreting.

like image 813
user2407038 Avatar asked Mar 17 '16 07:03

user2407038


1 Answers

Now to answer your main question, this thread appears to have the answer: start ghci with -fobject-code:

$ ghci /tmp/vec.hs
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Main             ( /tmp/vec.hs, interpreted )
Ok, modules loaded: Main.
*Main> print $ vec2list ('a' :> 'b' :> 'c' :> V0)
""

With -fobject-code:

$ ghci -fobject-code /tmp/vec.hs
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Main             ( /tmp/vec.hs, /tmp/vec.o )
Ok, modules loaded: Main.
Prelude Main> print $ vec2list ('a' :> 'b' :> 'c' :> V0)
"abc"

The modules that contain [] and (,) are all compiled, which causes their runtime representation to be different from isomorphic datatypes in interpreted modules. According to Simon Marlow on the thread I linked, interpreted modules add annotations for the debugger. I think this also explains why tup2pair works and pair2tup doesn't: missing annotations isn't a problem for interpreted modules, but the compiled modules choke on the extra annotations.

-fobject-code has some downsides: longer compilation time, only brings exported functions in scope, but it has the additional advantage that running the code is much faster.

like image 50
xnyhps Avatar answered Oct 07 '22 11:10

xnyhps