Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to check for pointer equality in SMLNJ (for debugging)?

Suppose we have the following toy binary tree structure:

datatype Tree = Leaf | Branch of Tree * Tree
fun left(Branch(l,r))= l
fun right(Branch(l,r))= r

And suppose that we have some large and expensive to compute tree

val c: Tree= …
val d: Tree= Branch(c,c)

Can we verify in the SML/NJ interpreter that left(d) and right(d) indeed refer to the same place in memory?

(This question was borne out of working with lazy streams which may possibly contain cycles, and trying to debug whether the memoization is working correctly.)

like image 317
kinokijuf Avatar asked May 14 '20 11:05

kinokijuf


1 Answers

I think we can do this by casting both values to word using Unsafe.cast, which reinterprets the pointers as numbers that can be compared with =. Here is an is function that implements this idea:

infix 4 is (* = > < >= ... *)
fun op is(a: 'a, b: 'a) = (Unsafe.cast a: word) = Unsafe.cast b

Note that:

  • I needed to annotate (a, b) to make sure the type checker restricts the arguments to the same type, because the typesig of is would otherwise be 'a * 'b -> bool
  • I needed to annotate the first Unsafe.cast application to prevent SML/NJ from having to use polyEqual and thus avoid emitting Warning: calling polyEqual
  • Type inference can figure out the rest just fine

Here's an example that illustrates structural sharing in vectors:

local
  fun const a _ = a
  val v = Vector.tabulate(1000, const #[1,2,3])
  val v = Vector.update(v, 230, #[1,2,3]) (* same value, but new allocation *)
in
  val test1 = Vector.sub(v, 0) is Vector.sub(v, 999)
  val test2 = Vector.sub(v, 0) is Vector.sub(v, 230)
end

And it does work as expected. The repl answers with

(* val test1 = true : bool
   val test2 = false : bool *)

Now here's the tree example from your question:

local
  datatype Tree
    = Leaf
    | Branch of Tree * Tree

  fun left (Branch(l,r)) = l
  fun right (Branch(l,r)) = r

  val c = Branch(Leaf,Leaf) (* imagine it being something complex *)
  val d = Branch(c, c)
in
  val test3 = left d is right d
end

When we try it, it answers correctly:

(* val test3 = true : bool *)

I think this answers your question. Below this point I talk about the choice of word and a little about what might be happening internally when we convert to it


As far as I'm aware, SML/NJ does pointer tagging like many lisps, v8, OCaml, etc.. which means we want to cast specifically to a type which isn't heap-allocated.. because we want to be able to read the pointer value, not misinterpret heap objects.

I think word works fine for that purpose; it's immediate like an int, and unsigned unlike it.. so it should correspond to the memory address (don't hold me up on that).

There seems to be a bug* that prevents you from inspecting the word value directly in the repl, it may be the pointer tagging at play.
* at least the compiler reports it as that? as of v110.99

One workaround is to immediately convert the value to a different representation (perhaps being boxed is required?), like a string, or Word64.word

fun addrOf x = Word.toString (Unsafe.cast x)

Indeed, when we try to use our newly defined addrOf function to compare the address versus its stringified value we can observe the effects of pointer tagging

(* We'll need these definitions onwards, might as well have them here: *)

infix 5 >> <<
val op >> = Word.>>
val op << = Word.<<
val unsafeWord = Option.valOf o Word.fromString
local
  val x = SOME 31 (* dummy boxed value *)
  val addr = unsafeWord (addrOf x)
in
  val test4 = Unsafe.cast x = addr
  val test5 = Unsafe.cast x >> 0w1 = addr >> 0w1 (* get rid of lowest bit *)
end

(* val test4 = false : bool
   val test5 = true : bool *)

So then, if it's the case that the tag is just the lowest bit of a machine word in SML/NJ, like it is in many tagged pointer implementations, then the pointer should be, accurately, the casting value shifted right once, then left once again.

fun addrOf x = Unsafe.cast x >> 0w1 << 0w1

The reason we do this seemingly nop conversion (remember, all pointers are even) is because it properly tags the cast word value in the process.
If we shift left first then right, the tag itself would find its way to the value with the first operation as the coerced pointer turns into proper word.. that's why we shift right first instead.. Shifting left from that zero-fills the lower bit, so no info about the address is lost, but an immediate value tag is properly present internally.

local
  fun strAddrOf x = Word.toString (Unsafe.cast x)
  fun isEven x = Word.andb (x, 0w1) = 0w0

  val x = SOME 42
  val ogAddr = unsafeWord (strAddrOf x) (* a known-correct conversion: no shifting takes place *)
  val badAddr = Unsafe.cast x << 0w1 >> 0w1
  val goodAddr = Unsafe.cast x >> 0w1 << 0w1
in
  val test6 = ogAddr = badAddr
  val test7 = ogAddr = goodAddr
  val test8 = isEven ogAddr
end

(* val test6 = false : bool
   val test7 = true : bool
   val test8 = true : bool *)

This shifting in addrOf allows you to get the pointer value directly without intermediate conversions (and boxing) to string or word64. Of course, this solution breaks down with actual unboxed values so it's good to test for whether the object is boxed to begin with (Unsafe.boxed) in your addrOf definition, and return 0wx0 in case you were working with immediates.

Hope this works for your purposes. It certainly did for mine so far!

like image 123
Moth Avatar answered Nov 15 '22 09:11

Moth