
Improving the lambda-code OCaml generates for assertions

Tags:

ocaml

I would like to improve the lambda-code that OCaml 3.12.1 generates for the "assert" construct. Here is an example:

let f x =
    assert (x = 4);
    assert (2 + x = 6);
    assert (x - x = 0);
    exit x

The file longfilename.ml above is representative of large OCaml modules for which I would like lambda-code generation to be improved. It compiles to:

$ ocamlopt -S longfilename.ml
$ cat longfilename.s
...
    .data
    .quad   3072
_camlLongfilename__2:
    .quad   L100007
    .quad   9
    .quad   9
    .quad   2300
L100007: .L100007:
    .ascii  "longfilename.ml"
    .byte   0
    .data
    .quad   3072
_camlLongfilename__3:
    .quad   L100006
    .quad   7
    .quad   9
    .quad   2300
L100006: .L100006:
    .ascii  "longfilename.ml"
    .byte   0
    .data
    .quad   3072
_camlLongfilename__4:
    .quad   L100005
    .quad   5
    .quad   9
    .quad   2300
L100005: .L100005:
    .ascii  "longfilename.ml"
    .byte   0
...

The above is terribly redundant: the name of the source file that each assertion comes from is emitted once per assertion. The culprit appears to be bytecomp/translcore.ml:

let assert_failed loc =
  (* [Location.get_pos_info] is too expensive *)
  let fname = match loc.Location.loc_start.Lexing.pos_fname with
              | "" -> !Location.input_name
              | x -> x
  in
  let pos = loc.Location.loc_start in
  let line = pos.Lexing.pos_lnum in
  let char = pos.Lexing.pos_cnum - pos.Lexing.pos_bol in
  Lprim(Praise, [Lprim(Pmakeblock(0, Immutable),
          [transl_path Predef.path_assert_failure;
           Lconst(Const_block(0,
              [Const_base(Const_string fname);
               Const_base(Const_int line);
               Const_base(Const_int char)]))])])
;;

On the face of it, it looks like it would be enough to give a name to Const_base(Const_string fname), and to store and reuse it with a compile-time hash-table. For intra-module optimization, the changes just might be manageable (as long as the hash-table is reset at each compilation unit).
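As a rough illustration of the hash-table idea, here is a minimal standalone sketch. It is not compiler code: the names `table`, `counter`, `reset`, and `label_of_string`, and the `str_N` label format, are all hypothetical. It only shows the shape of "one label per distinct string, cleared at each compilation unit".

```ocaml
(* Hypothetical helper, not part of the OCaml compiler: maps each string
   constant to a single label within one compilation unit. *)
let table : (string, string) Hashtbl.t = Hashtbl.create 17
let counter = ref 0

(* To be called at the start of each compilation unit, so that labels
   never leak from one unit into the next. *)
let reset () =
  Hashtbl.clear table;
  counter := 0

(* Return the label associated with [s], allocating a fresh one the
   first time [s] is seen. *)
let label_of_string s =
  try Hashtbl.find table s
  with Not_found ->
    incr counter;
    let label = Printf.sprintf "str_%d" !counter in
    Hashtbl.add table s label;
    label
```

With something like this, every assertion in a file would ask for the label of the same file name and get the same answer, so the string would only need to be emitted once.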

I am a little out of my depth here, especially for the “reset at each compilation unit” part. Any hint?

Pascal Cuoq asked Apr 04 '12 09:04


1 Answer

There already is a mechanism in the OCaml compiler to share some constants: see asmcomp/compilenv.ml and its use, in particular of the structured_constants value, in asmcomp/cmmgen.ml. I am not familiar with this code, so I am not sure why your particular use case is not shared, but the lambda-code seems to distinguish between Const_base (Const_string foo) and Const_immstring foo; the latter are shared, and maybe the former are not.

I don't know what the intended semantics is for immstring. It seems to be used by the compiler internally to compile method labels (bytecomp/translclass.ml), but not exposed to the input language.

(I suspect the distinction exists because strings are mutable, so sharing user-visible strings would be observable and would change program behavior. But string constants are already lambda-lifted, so users can already observe semantically inconsistent sharing. Increasing sharing of user-visible strings would probably still be rejected as a compatibility break.)

Looking at the way those immediate strings are handled by the constant emitting code (asmcomp/cmmgen.ml:emit_constant), they are represented like the usual strings, so maybe you could just patch the compiler to use an immstring in assert_failed and things would work.
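To make the suggested change concrete, here is a small self-contained sketch. The two types are simplified stand-ins for the compiler's own constant types (only the constructors mentioned in this thread are kept), and `assert_location` is a hypothetical name for the part of assert_failed that builds the location block; the real patch would be made in bytecomp/translcore.ml.

```ocaml
(* Simplified stand-ins for the compiler's constant types; these are
   NOT the real definitions, only the constructors discussed here. *)
type constant =
  | Const_int of int
  | Const_string of string

type structured_constant =
  | Const_base of constant
  | Const_immstring of string
  | Const_block of int * structured_constant list

(* Hypothetical helper: the (file, line, char) block carried by
   Assert_failure, built with an immediate string for the file name. *)
let assert_location fname line char =
  Const_block (0,
    [ Const_immstring fname;   (* was: Const_base (Const_string fname) *)
      Const_base (Const_int line);
      Const_base (Const_int char) ])
```

The only difference from the original function is the constructor used for the file name; the line and column constants are unchanged.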

[EDIT BY OP]

Changing Const_base (Const_string fname) into Const_immstring fname, while slightly incompatible, allows OCaml to compile itself and to compile Frama-C, and the resulting Frama-C passes its regression tests. On the original example, the effect is as follows, which is exactly the desired result:

$ cat longfilename.s 
...
    .data
    .quad   3072
_camlLongfilename__2:
    .quad   L100005
    .quad   9
    .quad   9
    .data
    .quad   3072
_camlLongfilename__3:
    .quad   L100005
    .quad   7
    .quad   9
    .data
    .quad   3072
_camlLongfilename__4:
    .quad   L100005
    .quad   5
    .quad   9
    .quad   2300
L100005: .L100005:
    .ascii  "longfilename.ml"
    .byte   0
gasche answered Nov 14 '22 00:11