I've often read that exceptions are somewhat slow and should be avoided if performance is an issue (for instance, in Java, F#, etc). Does that apply to common OCaml functions such as <code>Hashtbl.find</code>, which return exceptions for elements not found? In particular, if I want my application to be efficient, should I always test element membership using, for instance, <code>Hashtable.mem</code> before calling <code>Hashtbl.find</code>? Or would the extra comparison of the <code>mem</code> function negatively impact performance?

To first answer the concrete question for <code>Hashtbl.mem</code> before <code>Hashtbl.find</code>: Don't do it as the check for the existence of the element in the table will have to be dont twice; while the cost will be O(1) for both strategies, calling <code>Hashtbl.mem</code> first will compute the hash-value and the lookup twice -- which can take quite longer than fetching an exception. As a general advice, only create functions that raise exceptions if the probability of an exception is low. The exceptions are not taken into consideration for the type system, so more robust implementations will have an <code>'a option</code> return value instead of <code>'a</code> plus a possible exception. The ocaml core library uses a suffix of <code>_exn</code> to make clear that a function may throw an exception but generally prefers the non-exception throwing ones. So for the hashtable you should (in a perfect world) have two functions: <code>Hashtbl.find_exn : 'a t -> key -> 'a (* throws Not_found *)</code> <code>Hashtbl.find : 'a t -> key -> 'a option</code> If in doubt, use the second one. If it is not for pure speed, you could write a simple wrapper around the original hash-table: <code>find_safe h k = try Some (Hashtbl.find h k) with Not_found -> None</code>

OCaml performance of exceptions

Tags:

performance

f#

ocaml

I've often read that exceptions are somewhat slow and should be avoided if performance is an issue (for instance, in Java, F#, etc). Does that apply to common OCaml functions such as Hashtbl.find, which return exceptions for elements not found?

In particular, if I want my application to be efficient, should I always test element membership using, for instance, Hashtable.mem before calling Hashtbl.find? Or would the extra comparison of the mem function negatively impact performance?

798

asked Aug 28 '12 13:08

anol

2 Answers

OCaml exception handling make raising and catching exceptions extremely fast -- see this SO thread for internal details on how it's implemented. I haven't taken care to benchmark it precisely, but my random guess would be that it is in the ballpark of an indirect function call.

It is known that OCaml exceptions are significantly faster, in proportion to the rest of the language, than exceptions of F# for example -- this gave rise to performance problem for people porting their code from OCaml to F#. In OCaml, exceptions don't cause performance problems.

Calling Hashtbl.mem before Hashtbl.find is likely to be slower than catching the exception. The idiomatic style tends to be try Hashtbl.find .. with Not_found -> ....

That said, there is a sensible movement in the OCaml community to use more explicit error handling style, using option types rather than exceptions. The rationale is not based on performances, but on the fact that the type-checker then stop you from forgetting to handle an error situation. I would prefer that when designing a fresh new API. When using a third-party function that raise exceptions, make sure to catch all possible exceptions immediately; doing otherwise is a design smell in general and should be very heavily justified.

Due to their convenience, OCaml exceptions are also often used as a pure control-flow mechanism (and not to signal a rare failure condition). You will encounter code such as:

try
  for i = 0 to .... do
    if .. then raise Exit
  done; false
with Exit -> true

Finally, I feel that you might be taking a bad approach to implementation choices. Asking general micro-questions about performances is generally not the way to go. Think of correctness and readability first. Performance questions should generally come later, and only in situation where it is measurable/profilable.

200

answered Sep 24 '22 13:09

gasche

To first answer the concrete question for Hashtbl.mem before Hashtbl.find: Don't do it as the check for the existence of the element in the table will have to be dont twice; while the cost will be O(1) for both strategies, calling Hashtbl.mem first will compute the hash-value and the lookup twice -- which can take quite longer than fetching an exception.

As a general advice, only create functions that raise exceptions if the probability of an exception is low. The exceptions are not taken into consideration for the type system, so more robust implementations will have an 'a option return value instead of 'a plus a possible exception. The ocaml core library uses a suffix of _exn to make clear that a function may throw an exception but generally prefers the non-exception throwing ones.

So for the hashtable you should (in a perfect world) have two functions:

Hashtbl.find_exn : 'a t -> key -> 'a (* throws Not_found *)

Hashtbl.find : 'a t -> key -> 'a option

If in doubt, use the second one. If it is not for pure speed, you could write a simple wrapper around the original hash-table:

find_safe h k = try Some (Hashtbl.find h k) with Not_found -> None

answered Sep 25 '22 13:09

lambdapower

Related questions
                            
                                Is using the 'ref' keyword for string parameters in methods good for performance in C#? [duplicate]
                            
                                Union of two or more (hash)maps
                            
                                DataTypeFactory usage in creating XMLGregorianCalendar hits performance badly
                            
                                measure performance of a common-lisp program
                            
                                Java: manually-unrolled loop is still faster than the original loop. Why?
                            
                                Performance of COUNT SQL function
                            
                                Why nosql with cassandra instead of mysql?
                            
                                C++ backend with C# frontend?
                            
                                Improve speed of reading and converting from binary file?
                            
                                Would a better graphics card or more cores make Mathematica faster?
                            
                                Fastest way to import millions of JSON documents to MongoDB
                            
                                Why is numpy's einsum slower than numpy's built-in functions?
                            
                                Java 8 nested loops with streams & performance
                            
                                Why is `find -depth 1` so slow to list directories?
                            
                                How to ignore login and logout requests in JMeter?
                            
                                Why is PostgreSQL so slow on Windows?
                            
                                Is there a fast way to invert a matrix in Matlab?
                            
                                Select random element from a set, faster than linear time (Haskell)
                            
                                Given r^2, is there an efficient way to compute r^3?
                            
                                Parallel construction of a distance matrix

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With