Suppose we have the following toy binary tree structure:
datatype Tree = Leaf | Branch of Tree * Tree
fun left(Branch(l,r))= l
fun right(Branch(l,r))= r
And suppose that we have some large and expensive to compute tree
val c: Tree= …
val d: Tree= Branch(c,c)
Can we verify in the SML/NJ interpreter that left(d)
and right(d)
indeed refer to the same place in memory?
(This question was borne out of working with lazy streams which may possibly contain cycles, and trying to debug whether the memoization is working correctly.)
I think we can do this by casting both values to word
using Unsafe.cast
, which reinterprets the pointers as numbers that can be compared with =
. Here is an is
function that implements this idea:
infix 4 is (* = > < >= ... *)
fun op is(a: 'a, b: 'a) = (Unsafe.cast a: word) = Unsafe.cast b
Note that:
(a, b)
to make sure the type checker restricts the arguments to the same type, because the typesig of is
would otherwise be 'a * 'b -> bool
Unsafe.cast
application to prevent SML/NJ from having to use polyEqual
and thus avoid emitting Warning: calling polyEqual
Here's an example that illustrates structural sharing in vectors:
local
fun const a _ = a
val v = Vector.tabulate(1000, const #[1,2,3])
val v = Vector.update(v, 230, #[1,2,3]) (* same value, but new allocation *)
in
val test1 = Vector.sub(v, 0) is Vector.sub(v, 999)
val test2 = Vector.sub(v, 0) is Vector.sub(v, 230)
end
And it does work as expected. The repl answers with
(* val test1 = true : bool
val test2 = false : bool *)
Now here's the tree example from your question:
local
datatype Tree
= Leaf
| Branch of Tree * Tree
fun left (Branch(l,r)) = l
fun right (Branch(l,r)) = r
val c = Branch(Leaf,Leaf) (* imagine it being something complex *)
val d = Branch(c, c)
in
val test3 = left d is right d
end
When we try it, it answers correctly:
(* val test3 = true : bool *)
I think this answers your question. Below this point I talk about the choice of word
and a little about what might be happening internally when we convert to it
As far as I'm aware, SML/NJ does pointer tagging like many lisps, v8, OCaml, etc.. which means we want to cast specifically to a type which isn't heap-allocated.. because we want to be able to read the pointer value, not misinterpret heap objects.
I think word
works fine for that purpose; it's immediate like an int, and unsigned unlike it.. so it should correspond to the memory address (don't hold me up on that).
There seems to be a bug* that prevents you from inspecting the word value directly in the repl, it may be the pointer tagging at play.
* at least the compiler reports it as that? as of v110.99
One workaround is to immediately convert the value to a different representation (perhaps being boxed is required?), like a string, or Word64.word
fun addrOf x = Word.toString (Unsafe.cast x)
Indeed, when we try to use our newly defined addrOf
function to compare the address versus its stringified value we can observe the effects of pointer tagging
(* We'll need these definitions onwards, might as well have them here: *)
infix 5 >> <<
val op >> = Word.>>
val op << = Word.<<
val unsafeWord = Option.valOf o Word.fromString
local
val x = SOME 31 (* dummy boxed value *)
val addr = unsafeWord (addrOf x)
in
val test4 = Unsafe.cast x = addr
val test5 = Unsafe.cast x >> 0w1 = addr >> 0w1 (* get rid of lowest bit *)
end
(* val test4 = false : bool
val test5 = true : bool *)
So then, if it's the case that the tag is just the lowest bit of a machine word in SML/NJ, like it is in many tagged pointer implementations, then the pointer should be, accurately, the casting value shifted right once, then left once again.
fun addrOf x = Unsafe.cast x >> 0w1 << 0w1
The reason we do this seemingly nop
conversion (remember, all pointers are even) is because it properly tags the cast word value in the process.
If we shift left first then right, the tag itself would find its way to the value with the first operation as the coerced pointer turns into proper word.. that's why we shift right first instead.. Shifting left from that zero-fills the lower bit, so no info about the address is lost, but an immediate value tag is properly present internally.
local
fun strAddrOf x = Word.toString (Unsafe.cast x)
fun isEven x = Word.andb (x, 0w1) = 0w0
val x = SOME 42
val ogAddr = unsafeWord (strAddrOf x) (* a known-correct conversion: no shifting takes place *)
val badAddr = Unsafe.cast x << 0w1 >> 0w1
val goodAddr = Unsafe.cast x >> 0w1 << 0w1
in
val test6 = ogAddr = badAddr
val test7 = ogAddr = goodAddr
val test8 = isEven ogAddr
end
(* val test6 = false : bool
val test7 = true : bool
val test8 = true : bool *)
This shifting in addrOf
allows you to get the pointer value directly without intermediate conversions (and boxing) to string
or word64
. Of course, this solution breaks down with actual unboxed values so it's good to test for whether the object is boxed to begin with (Unsafe.boxed
) in your addrOf
definition, and return 0wx0 in case you were working with immediates.
Hope this works for your purposes. It certainly did for mine so far!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With