Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OCaml equivalent of Python generators

The french Sécurité Sociale identification numbers end with a check code of two digits. I have verified that every possible common transcription error can be detected, and found some other kinds of errors (e.g., rolling three consecutive digits) that may stay undetected.

def check_code(number):
    return 97 - int(number) % 97

def single_digit_generator(number):
    for i in range(len(number)):
        for wrong_digit in "0123456789":
            yield number[:i] + wrong_digit + number[i+1:]

def roll_generator(number):
    for i in range(len(number) - 2):
        yield number[:i] + number[i+2] + number[i] + number[i+1] + number[i+3:]
        yield number[:i] + number[i+1] + number[i+2] + number[i] + number[i+3:]

def find_error(generator, number):
    control = check_code(number)
    for wrong_number in generator(number):
        if number != wrong_number and check_code(wrong_number) == control:
            return (number, wrong_number)

assert find_error(single_digit_generator, "0149517490979") is None
assert find_error(roll_generator, "0149517490979") == ('0149517490979', '0149517499709')

My Python 2.7 code (working fragment above) makes heavy use of generators. I was wondering how I could adapt them in OCaml. I surely can write a function maintaining some internal state, but I'm looking for a purely functional solution. May I study the library lazy, which I'm not too familiar with? I'm not asking for code, just directions.

like image 996
Aristide Avatar asked Feb 05 '15 20:02

Aristide


People also ask

Are Python generators useful?

Generators are great when you encounter problems that require you to read from a large dataset. Reading from a large dataset indirectly means our computer or server would have to allocate memory for it. The only condition to remember is that a Generator can only be iterated once.

Why generators are better in Python?

Generators allow you to create iterators in a very pythonic manner. Iterators allow lazy evaluation, only generating the next element of an iterable object when requested. This is useful for very large data sets. Iterators and generators can only be iterated over once.

Is a generator more memory efficient?

A generator returns a generator object, also known as an iterator, that generates one value at a time. It does not store any values. This makes a generator memory-efficient. For example, you can use a generator to loop through a group of numbers without storing any of them in memory.

Why do generators use less memory?

Generators are memory-efficient ways of processing huge datasets. They process the data incrementally and do not allocate memory to all the results at the same time. They really come in handy when implementing data science pipelines for huge datasets in a resource-constrained environment (in terms of RAM).


2 Answers

Core library provides generators in a python style, see Sequence module.

Here is an example, taken from one of my project:

open Core_kernel.Std

let intersections tab (x : mem) : _ seq =
  let open Sequence.Generator in
  let init = return () in
  let m = fold_intersections tab x ~init ~f:(fun addr x gen ->
      gen >>= fun () -> yield (addr,x)) in
  run m
like image 159
ivg Avatar answered Nov 09 '22 05:11

ivg


You may simply define a generator as a stream, using the extension of the language :

let range n = Stream.from (fun i -> if i < n then Some i else None);;

The for syntactic construct cannot be used with that, but there's a set of functions provided by the Stream module to check the state of the stream and iterate on its elements.

try
  let r = range 10 in
  while true do
    Printf.printf "next element: %d\n" @@ Stream.next r
  done
with Stream.Failure -> ();;

Or more simply:

Stream.iter (Printf.printf "next element: %d\n") @@ range 10;;

You may also use a special syntax provided by the camlp4 preprocessor:

Stream.iter (Printf.printf "next element: %d\n") [< '11; '3; '19; '52; '42 >];;

Other features include creating streams out of lists, strings, bytes or even channels. The API documentation succinctly describes the different possibilities.

The special syntax let you compose them, so as to put back elements before or after, but it can be somewhat unintuitive at first:

let dump = Stream.iter (Printf.printf "next element: %d\n");;

dump [< range 10; range 20 >];;

will produce the elements of the first stream, and then pick up the second stream at element ranked right after the rank of the last yielded element, thus it will appear in this case as if only the second stream got iterated.

To get all the elements, you could construct a stream of 'a Stream.t and then recursively iterate on each:

Stream.iter dump [< '(range 10); '(range 20) >];;

These would produce the expected output.

I recommend reading the old book on OCaml (available online) for a better introduction on the topic.

like image 32
didierc Avatar answered Nov 09 '22 03:11

didierc