Copy objects between different virtual machines efficiently

I have a feeling that I am going to ask a "stupid" question, yet I must ask ...

I have 2 virtual machines.

I would like to copy an instance of an object from one to the other.

Is it possible to copy the bits that represent this object in the VM's heap and send them to the other VM, so that the other VM just needs to place those bits in its memory and add a reference to that memory slot on its stack?

Currently, to do this we serialize the object and then deserialize it, which is much less efficient (computationally) than just copying the instance as is. The parsing is wasted computation.

JS serialization example: each VM is an instance of V8 (JavaScript). One way of doing this is to convert the object to JSON (JSON.stringify), send it somehow to the other VM, which receives the string and converts it back into an object (e.g. var myObject = JSON.parse(myJSONtext);). JavaScript is just an example here; this is one form of serialization.
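For comparison, here is a minimal sketch of the same round trip in Python using the standard-library json module (json.dumps plays the role of JSON.stringify, json.loads the role of JSON.parse). The sample data is made up for illustration. It shows that the receiver ends up with an equal but entirely separate object graph, and that plain JSON has no way to express a reference cycle.

```python
import json

# An ordinary object graph: nested dicts and lists (illustrative data only).
original = {"name": "widget", "sizes": [1, 2, 3], "meta": {"active": True}}

# Sender side: encode the object graph as text.
wire = json.dumps(original)

# Receiver side: parse the text back into freshly allocated objects.
copy = json.loads(wire)

print(copy == original)  # True: same contents
print(copy is original)  # False: a brand-new object graph

# JSON has no notion of references, so a cycle cannot be encoded at all.
cyclic = {"name": "loop"}
cyclic["self"] = cyclic
try:
    json.dumps(cyclic)
    cycle_rejected = False
except ValueError as exc:
    cycle_rejected = True
    print("cannot encode:", exc)
```

The cost the question complains about is visible here: every value is turned into text and parsed back, rather than copied as raw memory.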

DuduAlul asked Aug 26 '10 12:08


2 Answers

Let's ignore for a second the naive assumption that this question can easily be generalized across multiple VMs. Any attempt to build a mechanism like this would depend heavily on the implementation details of the VM you were building it for.

Here are several reasons why this isn't done:

  1. In-core representation is not generally portable across architectures. If I sent an "object" from a VM on a SPARC machine to a VM on an x86 machine without knowledge of its structure, the object would appear corrupt on the other side (for instance, SPARC is big-endian while x86 is little-endian, so even a plain integer field would arrive byte-swapped).

  2. The object will not necessarily exist at the same memory location on both machines, so internal pointers within the object will need to be patched up after it reaches the second VM. This too requires internal knowledge of the object's structure.

  3. The object probably contains references to other objects, so copying one object means copying a whole graph of objects, and generally not an acyclic one either. You end up writing code that looks an awful lot like a serialization library in order to do this reliably.

  4. Objects often hold on to native resources (like file handles and sockets) that can't be reliably transmitted across machines.

  5. In many VMs, there is a distinction made between data (the object you're trying to copy) and metadata (for example, the class of the object you're trying to copy). In these kinds of VMs, even if you could copy the object bit-for-bit unscathed, it might depend on a bunch of metadata that doesn't exist at the remote end. Copying metadata bit-for-bit is also tricky, as many VMs use implementation techniques (such as a global pool of interned strings or memory-mapped object code) that make the data inherently non-portable. You also might end up with much more metadata than you want (e.g. in .NET the smallest unit of metadata that you can package up and send somewhere is typically an assembly).

  6. In-core representation is generally not portable among different versions of the same VM, and it does not contain internal version information that could be used to patch up the data.

  7. In-core representation contains lots of things (e.g. inline caches, garbage collection information) that don't need to be copied. Copying this stuff would be wasteful, and the information might not even be sensible on the other side.
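Point 1 can be seen in miniature with Python's struct module: the same 32-bit integer has a different byte layout under big-endian and little-endian conventions, so a raw byte copy from one to the other is silently misread. This is a simplified sketch covering a single integer; real in-core object layouts add padding, pointer width and VM-specific headers on top of this.

```python
import struct

value = 0x01020304

# The same 32-bit unsigned integer as it would be laid out on a
# big-endian machine (e.g. classic SPARC) and a little-endian one (e.g. x86).
big = struct.pack(">I", value)
little = struct.pack("<I", value)

print(big.hex())     # 01020304
print(little.hex())  # 04030201

# Copying the raw big-endian bytes and reinterpreting them with the
# receiver's little-endian layout yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x4030201

# A serializer avoids this by fixing one byte order for the wire format
# and decoding with that same byte order on the other side.
decoded = struct.unpack(">I", big)[0]
print(decoded == value)  # True
```

This is one of the "patching up" steps the answer describes: a correct copy needs to know that the field is an integer and which byte order it was written in.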

Basically, to do this reliably, you end up building the world's most awkward and unreliable serialization library, and the performance gains of the simple memory copy are lost in patching up the many things that get broken when you do the copy naively.

Thus, these mechanisms tend not to exist.

There is one huge exception to this rule: image-based virtual machines (such as many Smalltalk and Self VMs) are built around the idea that the virtual machine state exists in an "image" that can be copied, moved between machines, etc. This generally comes at a substantial performance cost.

blucz answered Sep 30 '22 12:09


Why not use cPickle? It serializes data very reliably and very quickly; you can then send it over a socket, a named pipe, mmap, you name it, and on the other end you can expect to reassemble it reliably as long as it didn't get corrupted in transfer and the versions of the pickle module aren't hugely different. Of course, the truly enterprisey way is to use a platform-agnostic standard such as XML, which lets you extend interoperability beyond Python. I know this sidesteps the question, but I think someone who has contributed to the Python interpreter codebase would have to clarify that part for you.
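A minimal sketch of that suggestion, written for Python 3, where the C implementation that used to be the separate cPickle module is used automatically by the standard pickle module. The Node class is hypothetical, made up only for this illustration. It shows that pickle tracks shared references and cycles (the object-graph bookkeeping from point 3 of the first answer) and that a class is recorded by module and name rather than copied (the metadata concern from point 5).

```python
import pickle


class Node:
    """Hypothetical linked node, used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.next = None


# Build a reference cycle: a -> b -> a.
a = Node("a")
b = Node("b")
a.next = b
b.next = a

# Serialize to plain bytes, which could then go over a socket, pipe or mmap.
data = pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL)

# The receiving side rebuilds a new, equivalent object graph.
restored = pickle.loads(data)

print(restored.name, restored.next.name)  # a b
print(restored.next.next is restored)     # True: the cycle is preserved
print(restored is a)                      # False: a separate copy

# Only the class's module and name are stored, not its code, so the
# receiving process must be able to import a compatible Node class.
print(b"Node" in data)                    # True
```

Keep in mind that pickle.loads should only be applied to trusted data, since unpickling can execute arbitrary code.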

Novikov answered Sep 30 '22 12:09