
How to transfer a float array (without serializing/deserializing) from Scala (JeroMQ) to C (ZMQ)?

Currently, I am using a JSON library to serialize the data at the sender (JeroMQ) and deserialize it at the receiver (C, ZMQ). But while parsing, the JSON library consumes so much memory that the OS kills the process. So I want to send the float array as it is, i.e. without using JSON.

The existing sender code is below (syn0 and syn1 are Double arrays). If syn0 and syn1 are around 100 MB each, the process is killed while parsing the received arrays, i.e. the last line of the snippet below:

import org.zeromq.ZMQ
import com.codahale.jerkson
socket.connect("tcp://localhost:5556")

socket.send(json.JSONObject(Map("syn0"->json.JSONArray(List.fromArray(syn0Global)))).toString())
println("SYN0 Request sent”)
val reply_syn0 = socket.recv(0)
println("Response received after syn0: " + new String(reply_syn0))
logInfo("Sending Syn1 request … , size : " + syn1Global.length )

socket.send(json.JSONObject(Map("syn1"->json.JSONArray(List.fromArray(syn1Global)))).toString())
println("SYN1 Request sent")
val reply_syn1 = socket.recv(0)

socket.send(json.JSONObject(Map("foldComplete"->"Done")).toString())
println("foldComplete sent")
//  Get the reply.
val reply_foldComplete = socket.recv(0)
val processedSynValuesJson = new String(reply_foldComplete)
val processedSynValues_jerkson = jerkson.Json.parse[Map[String, List[Double]]](processedSynValuesJson)

Can these arrays be transferred without using JSON?

Here I am transferring a float array between two C programs:

//client.c
#include <zmq.h>
#include <stdio.h>

int main (void)
{
    printf ("Connecting to hello world server…\n");
    void *context = zmq_ctx_new ();
    void *requester = zmq_socket (context, ZMQ_REQ);
    zmq_connect (requester, "tcp://localhost:5555");

    int request_nbr;
    float send_buffer[10];
    float recv_buffer[10];

    for (int i = 0; i < 10; i++)
        send_buffer[i] = i;

    for (request_nbr = 0; request_nbr != 10; request_nbr++) {
        printf ("Sending Hello %d…\n", request_nbr);
        zmq_send (requester, send_buffer, 10 * sizeof(float), 0);
        zmq_recv (requester, recv_buffer, 10 * sizeof(float), 0);
        printf ("Received World %.3f\n", recv_buffer[5]);
    }
    zmq_close (requester);
    zmq_ctx_destroy (context);
    return 0;
}

//server.c
#include <zmq.h>
#include <stdio.h>
#include <assert.h>

int main (void)
{
    //  Socket to talk to clients
    void *context = zmq_ctx_new ();
    void *responder = zmq_socket (context, ZMQ_REP);
    int rc = zmq_bind (responder, "tcp://*:5555");
    assert (rc == 0);

    float recv_buffer[10];
    float send_buffer[10];

    while (1) {
        zmq_recv (responder, recv_buffer, 10 * sizeof(float), 0);
        printf ("Received Hello\n");
        for (int i = 0; i < 10; i++)
            send_buffer[i] = recv_buffer[i] + 5;
        zmq_send (responder, send_buffer, 10 * sizeof(float), 0);
    }
    return 0;
}

Finally, my unsuccessful attempt at doing something similar using Scala (below is the client code):

def main(args: Array[String]) {
  val context = ZMQ.context(1)
  val socket = context.socket(ZMQ.REQ)

  println("Connecting to hello world server…")
  socket.connect("tcp://localhost:5555")

  val msg: Array[Float] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
  val bbuf = java.nio.ByteBuffer.allocate(4 * msg.length)
  bbuf.asFloatBuffer.put(java.nio.FloatBuffer.wrap(msg))

  for (request_nbr <- 1 to 10) {
    socket.sendByteBuffer(bbuf, 0)
  }
}
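
For comparison, a raw-bytes Scala client mirroring the C client above would presumably also have to match the C side's byte order (Java's ByteBuffer defaults to big-endian, while an x86 C peer reads native little-endian floats) and alternate send/recv on the REQ socket. A rough, untested sketch along those lines, using JeroMQ's send(Array[Byte], Int) and recv(Int):

import org.zeromq.ZMQ
import java.nio.{ByteBuffer, ByteOrder}

object RawFloatClient {
  def main(args: Array[String]): Unit = {
    val context = ZMQ.context(1)
    val socket = context.socket(ZMQ.REQ)
    socket.connect("tcp://localhost:5555")

    val msg: Array[Float] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

    // Pack the floats in the byte order the C peer expects (little-endian on x86).
    val bbuf = ByteBuffer.allocate(4 * msg.length).order(ByteOrder.LITTLE_ENDIAN)
    bbuf.asFloatBuffer.put(msg)

    for (request_nbr <- 1 to 10) {
      socket.send(bbuf.array(), 0)    // send the raw 40 bytes, no serialization library
      val reply = socket.recv(0)      // a REQ socket must receive before the next send
      val floats = ByteBuffer.wrap(reply).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer
      println(s"Received World ${floats.get(5)} (request $request_nbr)")
    }

    socket.close()
    context.term()
  }
}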
asked Mar 23 '16 by user1274878


2 Answers

SER/DES? Size?
No; what matters more is a constraint rooted in ZeroMQ's transport philosophy.

You started with a ~0.1 GB transport payload and reported that the JSON library's allocations caused the OS to kill the process.

Next, in another post, you asked about a ~0.762 GB payload.

But there is a more important issue in ZeroMQ transport orchestration than the choice of an external SER/DES (serialization) policy.

Nobody forbids you from trying to send as big a BLOB as possible, and the JSON-decorated string has already shown you the dark side of that approach; there are further reasons not to proceed this way.

ZeroMQ is without question a great and powerful toolbox. Still, it takes some time to gain the insight needed for smart, highly performant deployments that get the most out of this workhorse.

One side effect of the feature-rich ecosystem "under the hood" is a little-known policy hidden in the message-delivery concept.

You may send any reasonably sized message, but delivery is not guaranteed: a message either arrives completely, or nothing arrives at all.

Ouch?!

Yes, not guaranteed.

Given this core Zero-Guarantee philosophy, take due care in deciding your steps and measures, all the more so if you plan to move gigabyte-sized beasts back and forth.

In this sense, real SUT testing can show quantitatively that transporting the whole volume as many small messages, i.e. segmenting the data into smaller pieces and accepting the cost of error-prone re-assembly (if you really do still need to move GBs; refer to the comment above, under the OP, and have no other choice), gives a much faster and much safer end-to-end solution than brute-forcing the code to dump about a GB of data onto whatever resources happen to be available. ZeroMQ's Zero-Copy principle cannot and will not, by itself, save you in such efforts.

For details on another hidden trap, related to the implementation not being fully zero-copy, read the remarks by Martin Sustrik, co-father of ZeroMQ, on zero-copy being "up to the kernel boundary only" (so expect at least double the memory allocations).


Solution:

Redesign the architecture to propagate small messages, or even keep the original data structure "mirrored" in the remote process(es), rather than attempting to make one-shot giga-transfers survivable.
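
A rough sketch of what the small-messages idea could look like on the Scala sending side (the chunk size, the "done" marker and the object name are illustrative assumptions, not a drop-in solution; the receiving side needs matching re-assembly logic):

import org.zeromq.ZMQ
import java.nio.{ByteBuffer, ByteOrder}

object ChunkedSender {
  // Illustrative chunk size: 128k doubles is roughly 1 MB per message.
  val DoublesPerChunk = 128 * 1024

  def main(args: Array[String]): Unit = {
    val context = ZMQ.context(1)
    val socket = context.socket(ZMQ.REQ)
    socket.connect("tcp://localhost:5556")

    val syn0: Array[Double] = Array.tabulate(1 << 20)(_.toDouble)  // stand-in for syn0Global

    syn0.grouped(DoublesPerChunk).zipWithIndex.foreach { case (chunk, i) =>
      val buf = ByteBuffer.allocate(8 * chunk.length).order(ByteOrder.LITTLE_ENDIAN)
      buf.asDoubleBuffer.put(chunk)
      socket.send(buf.array(), 0)   // one small message per chunk
      socket.recv(0)                // wait for the peer's ack before the next REQ
      println(s"chunk $i sent (${chunk.length} doubles)")
    }

    // Tell the peer the stream is complete so it can re-assemble.
    socket.send("done".getBytes(), 0)
    socket.recv(0)

    socket.close()
    context.term()
  }
}

Each chunk stays well inside what a single message can comfortably carry, and a failed transfer costs one chunk rather than the whole gigabyte.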


The best next step?

While it will not solve your trouble in a few SLOCs, the best thing you can do, if you are serious about investing your intellectual powers in distributed processing, is to read Pieter Hintjens' lovely book "Code Connected, Vol. 1".

Yes, it takes some time to develop your own insight, but it will raise you to another level of professional code design in many respects. Worth the time. Worth the effort.

answered Oct 03 '22 by user3666197


You'll need to serialize the data in some form or fashion: ultimately you're taking a structure in memory on one side and instructing the other side how to rebuild that structure (bonus points for using two separate languages, where the in-memory layout is likely different anyway). I'd suggest you use a different JSON library, as that appears to be where the problem lies, but there are more efficient protocols you could be using. Protocol Buffers enjoy good support across many languages; that might be the place I'd start.
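
If Protocol Buffers feel heavier than needed for a flat numeric array, even a hand-rolled binary layout avoids the JSON text blow-up. A sketch of Scala-side helpers, assuming both peers agree on raw little-endian IEEE-754 doubles with no extra framing (the names here are illustrative, and the C side would simply treat the received buffer as an array of double):

import java.nio.{ByteBuffer, ByteOrder}

object RawDoubleCodec {
  // Encode a Double array as plain little-endian bytes (8 bytes per value).
  def doublesToBytes(values: Array[Double]): Array[Byte] = {
    val buf = ByteBuffer.allocate(8 * values.length).order(ByteOrder.LITTLE_ENDIAN)
    buf.asDoubleBuffer.put(values)
    buf.array()
  }

  // Decode a received payload back into doubles, replacing the jerkson.Json.parse step.
  def bytesToDoubles(payload: Array[Byte]): Array[Double] = {
    val view = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN).asDoubleBuffer
    val out = new Array[Double](view.remaining)
    view.get(out)
    out
  }
}

With something like this in place, socket.send(RawDoubleCodec.doublesToBytes(syn0Global), 0) could stand in for the JSONObject construction, and the reply could be decoded with bytesToDoubles(socket.recv(0)).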

answered Oct 03 '22 by Jason