Getting at C binary data from OCaml

(Ignoring endianness for the sake of argument - this is just a test case/proof of concept - and I would never use strcpy in real code either!)

Consider the following trivial C code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* variables of type message_t will be stored contiguously in memory */
typedef struct {
  int message_id;
  char message_text[80];
} message_t;

int main(int argc, char**argv) {
  message_t* m = (message_t*)malloc(sizeof(message_t));
  m->message_id = 1;
  strcpy(m->message_text,"the rain in spain falls mainly on the plain");

  /* write the memory to disk */
  FILE* fp = fopen("data.dat", "wb");
  fwrite((void*)m, sizeof(int) + strlen(m->message_text) + 1, 1, fp);
  fclose(fp);

  exit(EXIT_SUCCESS);
}

The file it writes can easily be read back in from disk:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  int message_id;
  char message_text[80];
} message_t;

int main(int argc, char**argv) {
  message_t* m = (message_t*)malloc(sizeof(message_t));

  FILE* fp = fopen("data.dat", "rb");
  fread((void*)m, sizeof(message_t), 1, fp);
  fclose(fp);

  /* block of memory has structure "overlaid" onto it */
  printf("message_id=%d, message_text='%s'\n", m->message_id, m->message_text);

  exit(EXIT_SUCCESS);
}

E.g.

$ ./write 
$ ./read 
message_id=1, message_text='the rain in spain falls mainly on the plain'

My question is, in OCaml, if all I have is:

type message_t = {message_id:int; message_text:string}

How would I get at that data? Marshal can't do it, nor can input_binary_int. I could call out to helper functions in C (e.g. ask "what is sizeof(int)?", read n bytes, then call a C function to "convert these bytes into an int"), but in this case I can't add any new C code: the "unpacking" has to be done in OCaml, based on what I know the layout "should" be. Is it just a matter of iterating over the string, either in sizeof-sized blocks or looking for '\0', or is there a clever way? Thanks!

Gaius asked May 16 '11 21:05


1 Answer

For this kind of low-level struct handling, I find OCaml Bitstring very convenient. If you wrote all 80 characters to disk, the equivalent reader for your message_t would be:

bitmatch (Bitstring.bitstring_from_file "data.dat") with
  | { message_id : 32;
      message_text : 8 * 80 : string;
    } -> 
      Printf.printf "message_id=%ld, message_text='%s'\n" 
                    message_id message_text
  | { _ } -> failwith "Not a valid message_t"

As written, message_text comes back as the full 80 bytes, so you'll still have to trim it at the first '\0'. In general, though, bitstring may be what you want for this kind of task.
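If pulling in Bitstring is not an option, the same unpacking can be sketched with the standard library alone. This is a minimal sketch, not a definitive implementation. It assumes sizeof(int) is 4, a little-endian machine, and no padding before message_text. None of those are stated in the question, so check them against your platform. The names int_size, decode_message and read_file are made up for illustration. It also copes with the short file the writer above actually produces, since the text is taken up to the first '\0' or the end of the buffer.

```ocaml
(* Stdlib-only sketch: decode the message_t layout by hand.
   Assumptions (not from the original post): sizeof(int) = 4,
   little-endian byte order, no padding before message_text.
   Needs OCaml 4.08 or later (Bytes.get_int32_le, Fun.protect). *)

type message_t = { message_id : int; message_text : string }

(* Size in bytes of the C int field (assumed). *)
let int_size = 4

(* Decode a buffer holding message_id followed by a NUL-terminated string. *)
let decode_message (buf : bytes) : message_t option =
  let len = Bytes.length buf in
  if len < int_size then None
  else begin
    let id = Int32.to_int (Bytes.get_int32_le buf 0) in
    (* The text runs up to the first '\000', or to the end of the buffer. *)
    let stop =
      match Bytes.index_from_opt buf int_size '\000' with
      | Some i -> i
      | None -> len
    in
    Some { message_id = id;
           message_text = Bytes.sub_string buf int_size (stop - int_size) }
  end

(* Read a whole file into memory as bytes. *)
let read_file (path : string) : bytes =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
    let n = in_channel_length ic in
    let buf = Bytes.create n in
    really_input ic buf 0 n;
    buf)

(* Usage: only runs if data.dat exists in the current directory. *)
let () =
  if Sys.file_exists "data.dat" then
    match decode_message (read_file "data.dat") with
    | Some m ->
      Printf.printf "message_id=%d, message_text='%s'\n"
        m.message_id m.message_text
    | None -> prerr_endline "Not a valid message_t"
```

The same "cut at the first '\000'" step (String.index_opt followed by String.sub) is one way to trim the 80-byte message_text that the bitmatch version returns. If endianness does matter, note that Bitstring reads integer fields as big-endian unless the field is qualified otherwise, whereas this sketch reads little-endian.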

Thelema answered Sep 24 '22 12:09