The french Sécurité Sociale identification numbers end with a check code of two digits. I have verified that every possible common transcription error can be detected, and found some other kinds of errors (e.g., rolling three consecutive digits) that may stay undetected.
def check_code(number):
return 97 - int(number) % 97
def single_digit_generator(number):
for i in range(len(number)):
for wrong_digit in "0123456789":
yield number[:i] + wrong_digit + number[i+1:]
def roll_generator(number):
for i in range(len(number) - 2):
yield number[:i] + number[i+2] + number[i] + number[i+1] + number[i+3:]
yield number[:i] + number[i+1] + number[i+2] + number[i] + number[i+3:]
def find_error(generator, number):
control = check_code(number)
for wrong_number in generator(number):
if number != wrong_number and check_code(wrong_number) == control:
return (number, wrong_number)
assert find_error(single_digit_generator, "0149517490979") is None
assert find_error(roll_generator, "0149517490979") == ('0149517490979', '0149517499709')
My Python 2.7 code (working fragment above) makes heavy use of generators. I was wondering how I could adapt them in OCaml. I surely can write a function maintaining some internal state, but I'm looking for a purely functional solution. May I study the library lazy
, which I'm not too familiar with? I'm not asking for code, just directions.
Generators are great when you encounter problems that require you to read from a large dataset. Reading from a large dataset indirectly means our computer or server would have to allocate memory for it. The only condition to remember is that a Generator can only be iterated once.
Generators allow you to create iterators in a very pythonic manner. Iterators allow lazy evaluation, only generating the next element of an iterable object when requested. This is useful for very large data sets. Iterators and generators can only be iterated over once.
A generator returns a generator object, also known as an iterator, that generates one value at a time. It does not store any values. This makes a generator memory-efficient. For example, you can use a generator to loop through a group of numbers without storing any of them in memory.
Generators are memory-efficient ways of processing huge datasets. They process the data incrementally and do not allocate memory to all the results at the same time. They really come in handy when implementing data science pipelines for huge datasets in a resource-constrained environment (in terms of RAM).
Core
library provides generators in a python style, see Sequence
module.
Here is an example, taken from one of my project:
open Core_kernel.Std
let intersections tab (x : mem) : _ seq =
let open Sequence.Generator in
let init = return () in
let m = fold_intersections tab x ~init ~f:(fun addr x gen ->
gen >>= fun () -> yield (addr,x)) in
run m
You may simply define a generator as a stream, using the extension of the language :
let range n = Stream.from (fun i -> if i < n then Some i else None);;
The for
syntactic construct cannot be used with that, but there's a set of functions provided by the Stream
module to check the state of the stream and iterate on its elements.
try
let r = range 10 in
while true do
Printf.printf "next element: %d\n" @@ Stream.next r
done
with Stream.Failure -> ();;
Or more simply:
Stream.iter (Printf.printf "next element: %d\n") @@ range 10;;
You may also use a special syntax provided by the camlp4 preprocessor:
Stream.iter (Printf.printf "next element: %d\n") [< '11; '3; '19; '52; '42 >];;
Other features include creating streams out of lists, strings, bytes or even channels. The API documentation succinctly describes the different possibilities.
The special syntax let you compose them, so as to put back elements before or after, but it can be somewhat unintuitive at first:
let dump = Stream.iter (Printf.printf "next element: %d\n");;
dump [< range 10; range 20 >];;
will produce the elements of the first stream, and then pick up the second stream at element ranked right after the rank of the last yielded element, thus it will appear in this case as if only the second stream got iterated.
To get all the elements, you could construct a stream of 'a Stream.t
and then recursively iterate on each:
Stream.iter dump [< '(range 10); '(range 20) >];;
These would produce the expected output.
I recommend reading the old book on OCaml (available online) for a better introduction on the topic.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With