I would like to improve the lambda-code generated for the "assert" construct in OCaml 3.12.1. Here is an example:
let f x =
    assert (x = 4);
    assert (2 + x = 6);
    assert (x - x = 0);
    exit x
The file longfilename.ml above is representative of large OCaml modules for which I would like lambda-code generation to be improved. It compiles to:
$ ocamlopt -S longfilename.ml
$ cat longfilename.s
...
.data
.quad 3072
_camlLongfilename__2:
.quad L100007
.quad 9
.quad 9
.quad 2300
L100007: .L100007:
.ascii "longfilename.ml"
.byte 0
.data
.quad 3072
_camlLongfilename__3:
.quad L100006
.quad 7
.quad 9
.quad 2300
L100006: .L100006:
.ascii "longfilename.ml"
.byte 0
.data
.quad 3072
_camlLongfilename__4:
.quad L100005
.quad 5
.quad 9
.quad 2300
L100005: .L100005:
.ascii "longfilename.ml"
.byte 0
...
The above is terribly redundant: the name of the source file is emitted again for each assertion. The culprit appears to be bytecomp/translcore.ml:
let assert_failed loc =
  (* [Location.get_pos_info] is too expensive *)
  let fname = match loc.Location.loc_start.Lexing.pos_fname with
              | "" -> !Location.input_name
              | x -> x
  in
  let pos = loc.Location.loc_start in
  let line = pos.Lexing.pos_lnum in
  let char = pos.Lexing.pos_cnum - pos.Lexing.pos_bol in
  Lprim(Praise, [Lprim(Pmakeblock(0, Immutable),
          [transl_path Predef.path_assert_failure;
           Lconst(Const_block(0,
              [Const_base(Const_string fname);
               Const_base(Const_int line);
               Const_base(Const_int char)]))])])
;;
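As an aside for reading the assembly listing: each three-field Const_block built by assert_failed shows up as a header word followed by the file name pointer, the line and the column. The small sketch below decodes the .quad values, relying on the standard OCaml runtime representation (block header = size in words shifted left by 10, with the tag in the low 8 bits; integers stored as 2n + 1). The helper names are mine, chosen only for illustration.

```ocaml
(* Decode an OCaml block header word: size in words and tag. *)
let header_size h = h lsr 10
let header_tag h = h land 0xff

(* OCaml integers are stored tagged, as 2n + 1. *)
let untag v = v asr 1

let () =
  (* 3072: the header of each (fname, line, char) block. *)
  Printf.printf "3072: size=%d tag=%d\n" (header_size 3072) (header_tag 3072);
  (* 2300: the header of the file name string. *)
  Printf.printf "2300: size=%d tag=%d\n" (header_size 2300) (header_tag 2300);
  (* 5, 7, 9: tagged line and column numbers. *)
  List.iter (fun v -> Printf.printf "%d -> %d\n" v (untag v)) [5; 7; 9]
```

So 3072 is a three-field block with tag 0, 2300 is a two-word block with tag 252 (the string tag), and the values 5, 7 and 9 stand for 2, 3 and 4: the three assertions sit on lines 2, 3 and 4, each at column 4.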
On the face of it, it looks like it would be enough to give a name to Const_base(Const_string fname), and to store and reuse it with a compile-time hash-table. For intra-module optimization, the changes just might be manageable (as long as the hash-table is reset at each compilation unit).
I am a little out of my depth here, especially the “reset at each compilation unit” part. Any hint?
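To make the hash-table idea concrete, here is a minimal standalone sketch. Everything in it is hypothetical: the table, the label scheme and the function names are invented for illustration and do not exist in the compiler, and where the reset would have to be called from is exactly the part that remains open.

```ocaml
(* Hypothetical sketch: none of these names exist in the OCaml compiler. *)

(* Maps a string constant to the label under which it was first emitted. *)
let shared_strings : (string, string) Hashtbl.t = Hashtbl.create 17
let next_id = ref 0

(* Return the label for [s], allocating a new one only on first use. *)
let label_for_string s =
  try Hashtbl.find shared_strings s
  with Not_found ->
    incr next_id;
    let lbl = Printf.sprintf "shared_string_%d" !next_id in
    Hashtbl.add shared_strings s lbl;
    lbl

(* Would have to be called at the start of each compilation unit. *)
let reset_shared_strings () =
  Hashtbl.clear shared_strings;
  next_id := 0

let () =
  let a = label_for_string "longfilename.ml" in
  let b = label_for_string "longfilename.ml" in
  Printf.printf "same label reused: %b\n" (a = b)
```

Within one unit, every assertion asking for the same file name gets the same label back, so the string would only need to be emitted once.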
There already is a mechanism in the OCaml compiler to share some constants: see asmcomp/compilenv.ml and its use, in particular of the structured_constants value, in asmcomp/cmmgen.ml. I am not familiar with this code, so I am not sure why your particular use case is not shared, but there seems to be a difference in the lambda-code between Const_base (Const_string foo) and Const_immstring foo; the latter are shared, and maybe the former are not.
I don't know what the intended semantics of immstring is. It seems to be used internally by the compiler to compile method labels (bytecomp/translclass.ml), but it is not exposed to the input language.
(I suspect the distinction is because strings are mutable, so sharing user-visible strings would be observable and would change program behavior. But string constants are already lambda-lifted, so users can already observe semantically inconsistent sharing. Increasing sharing of user-visible strings would probably still be rejected as a compatibility break.)
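To illustrate what "observable" means here, the sketch below models a lifted string literal as a single value allocated once and returned on every call. It is only a model: it uses Bytes because string is immutable in current OCaml versions, whereas in 3.12 the same effect could be had on a plain string literal. The names lifted and f are made up for the example.

```ocaml
(* Stand-in for a lifted string literal: allocated once, returned on every
   call. Bytes is used because string is immutable in current OCaml. *)
let lifted = Bytes.of_string "abc"
let f () = lifted

let () =
  let s = f () in
  Bytes.set s 0 'X';                       (* mutate the "literal" in place *)
  print_endline (Bytes.to_string (f ()))   (* later calls see the mutation *)
```

Any additional sharing between two such constants would let a mutation through one of them show up in the other, which is the kind of behavior change the compiler presumably wants to avoid for user-visible strings.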
Looking at the way those immediate strings are handled by the constant-emitting code (asmcomp/cmmgen.ml:emit_constant), they are represented like the usual strings, so maybe you could just patch the compiler to use an immstring in assert_failed and things would work.
[EDIT BY OP]
Changing Const_base (Const_string fname) into Const_immstring fname, while slightly incompatible, allows OCaml to compile itself and to compile Frama-C, and the newly compiled Frama-C passes its regression tests. On the original example, the effect is as follows, which was exactly the desired result:
$ cat longfilename.s
...
.data
.quad 3072
_camlLongfilename__2:
.quad L100005
.quad 9
.quad 9
.data
.quad 3072
_camlLongfilename__3:
.quad L100005
.quad 7
.quad 9
.data
.quad 3072
_camlLongfilename__4:
.quad L100005
.quad 5
.quad 9
.quad 2300
L100005: .L100005:
.ascii "longfilename.ml"
.byte 0