Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Physical identity based alternative to Hashtbl.hash

I'm trying to derive a Graphviz file describing a structured value. This is for diagnostic purposes so I want my graph to mirror the actual structure in memory as closely as possible. I'm using the below to map values to Graphviz vertices so that I can reuse a vertex when a value has two or more inbound references:

let same = (==)

module StateIdentity : Hashtbl.HashedType = struct
  type t = R.meta_t state
  let hash = Hashtbl.hash
  let equal = same
end

module StateHashtbl = Hashtbl.Make (StateIdentity)

The documentation for Hashtbl.hash suggests that it is suitable for use both when StateIdentity.equal = (=) and when StateIdentity.equal = (==) but I'd like to ensure that hash table access is as close to O(1) as possible so would rather not have Hashtbl.hash walking a (potentially large in this case) object graph on every lookup.

I know Ocaml moves references around, but is there an O(1) proxy for reference identity available in Ocaml?

The answer to Hashtable of mutable variable in Ocaml suggests not.

I'm loathe to attach serial numbers to states, since this is diagnostic code so any errors I make doing that have the potential to mask other bugs.

like image 859
Mike Samuel Avatar asked Oct 24 '12 15:10

Mike Samuel


3 Answers

If you are using the word "object" in the sense of OCaml's < ... > object types, then you can use Oo.id to get a unique integer identity for each instance. Otherwise the answer to "is there a general proxy for value identity" is "no". In this case my advice would be to start with Hashtbl.hash, evaluate whether it fits your need, and otherwise design your own hashing function.

You can also play with Hashtbl.hash_param (see documentation) to turn knob on value traversals during hashing. Note that the Hashtbl code uses linked lists for bucket of same-hash values, so having lots of hash conflicts will trigger linear search behavior. It may be better to move to other implementations using binary search trees for conflict buckets. But then again, you should evaluate your situation before moving to more complex (and with worse performances in the "good case") solutions.

like image 60
gasche Avatar answered Nov 18 '22 23:11

gasche


I've found it very tricky to use physical equality to do hashing. You certainly can't use something like the address of the value as your hash key, because (as you say) things get moved around by GC. Once you have a hash key, it seems like you can use physical equality to do comparisons as long as your values are mutable. If your values aren't mutable, OCaml doesn't guarantee much about the meaning of (==). In practical terms, immutable objects that are equal (=) can theoretically be merged into a single physical object if the OCaml compiler or runtime wishes (or vice versa).

When I work through the various possibilities, I usually end up putting a sequence number into my values when I need a unique id. As gasche says, you can use Oo.id if your values are actual OO-style objects.

like image 5
Jeffrey Scofield Avatar answered Nov 18 '22 23:11

Jeffrey Scofield


Like others, I think unique IDs are the way to go.

Unique IDs are not hard to generate safely. One solution is to use a so-called private record as follows. It prevents users of the module from copying the id field:

module type Intf =
sig
  type t = private {
    id : int;
    foo : string;
  }

  val create_t : foo: string -> t
end

module Impl : Intf =
struct
  type t = {
    id : int;
    foo : string;
  }

  let create_id =
    let n = ref 0 in
    fun () ->
      if !n = -1 then
        failwith "Out of unique IDs"
      else (
        incr n;
        !n
      )

  let create_t ~foo = {
    id = create_id ();
    foo
  }
end
like image 4
Martin Jambon Avatar answered Nov 18 '22 23:11

Martin Jambon