 

How to share business concepts across different programming languages?

We develop a distributed system built from components implemented in different programming languages (C++, C# and Python) that communicate with one another across a network. All the components in the system operate on the same business concepts and communicate with one another in terms of these concepts.

As a result, we struggle heavily with the following two challenges:

  1. Keeping the representation of our business concepts in these three languages in sync
  2. Serialization / deserialization of our business concepts across these languages

A naive solution for this problem would be just to define the same data structures (and the serialization code) three times (for C++, C# and Python).
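For illustration, here is a minimal sketch of what the hand-written C# variant of one such concept might look like (the Order class, its fields and the JSON wire format are invented for the example); the C++ and Python versions would have to mirror both the fields and the wire format by hand:

    using System;
    using System.Text.Json;

    // One "business concept", defined and serialized by hand in C#. The
    // same structure and the same JSON wire format would have to be
    // re-implemented and kept in sync in C++ and Python.
    public sealed class Order
    {
        public string OrderId { get; set; } = "";
        public string Customer { get; set; } = "";
        public decimal Amount { get; set; }

        public string ToJson() => JsonSerializer.Serialize(this);

        public static Order FromJson(string json) =>
            JsonSerializer.Deserialize<Order>(json)
            ?? throw new InvalidOperationException("invalid Order payload");
    }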

Unfortunately, this solution has serious drawbacks:

  • It creates a lot of “code duplication”
  • It requires a huge amount of cross-language integration tests to keep everything in sync

Another solution we considered is based on frameworks like ProtoBufs or Thrift. These frameworks have their own definition language in which the business concepts are described, and the representation of these concepts in C++, C# and Python (together with the serialization logic) is then auto-generated by the framework.

While this solution doesn’t have the above problems, it has another drawback: the code generated by these frameworks couples together the data structures representing the underlying business concepts and the code needed to serialize/deserialize these data structures.

We feel that this pollutes our code base – any code in our system that uses these auto-generated classes is now “familiar” with this serialization/deserialization logic (a serious abstraction leak).

We can work around it by wrapping the auto-generated code in our own classes / interfaces, but this brings us back to the drawbacks of the naive solution.
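To illustrate the wrapping idea (all names below are invented for the example; OrderDto stands in for whatever class the framework would generate):

    // Stand-in for the class ProtoBufs/Thrift would generate; the real
    // generated class also carries the serialization machinery.
    public sealed class OrderDto
    {
        public string OrderId { get; set; } = "";
        public decimal Amount { get; set; }
    }

    // Plain domain class: no serialization knowledge; this is what the
    // rest of the system sees.
    public sealed class Order
    {
        public Order(string orderId, decimal amount)
        {
            OrderId = orderId;
            Amount = amount;
        }

        public string OrderId { get; }
        public decimal Amount { get; }
    }

    // Boundary-only mapper between the domain class and the generated
    // DTO. It isolates the framework, but it has to be written and kept
    // in sync per concept and per language -- which is the drawback
    // mentioned above.
    public static class OrderMapper
    {
        public static OrderDto ToDto(Order order) =>
            new OrderDto { OrderId = order.OrderId, Amount = order.Amount };

        public static Order FromDto(OrderDto dto) =>
            new Order(dto.OrderId, dto.Amount);
    }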

Can anyone recommend a solution that gets around the described problems?

Asked Aug 03 '12 by Lev




4 Answers

Lev, you may want to look at ICE. It provides an object-oriented IDL with mappings to all the languages you use: C++, Python and .NET (all .NET languages, not just C#, as far as I understand). Although ICE is a middleware framework, you don't have to follow all of its policies.

Specifically in your situation you may want to define the interfaces of your components in ICE IDL and maintain them as part of the code. You can then generate code as part of your build routine and work from there. Or you can use more of the power that ICE gives you.

ICE supports C++ STL data structures and inheritance, so it should give you a sufficiently powerful formalism to build your system gradually over time with a good degree of maintainability.

Answered by Boris Liberman


Well, once upon a time MS tried to solve this with IDL. Well, actually it tried to solve a bit more than defining data structures, but, anyway, that's all in the past because no one in their right mind would go the COM route these days.

One option to look at is SWIG, which is supposed to be able to port data structures as well as actual invocation across languages. I haven't done this myself, but there's a chance it won't couple the serialization and data structures as tightly as protobufs does.

However, you should really consider whether the aforementioned coupling is such a bad thing after all. What would be the ideal solution for you? Presumably it's something that does two things: it generates compatible data structures across multiple languages from one definition, and it also provides the serialization code to stitch them together, but in a separate abstraction layer. The idea being that if one day you decide to use a different serialization method, you could just swap out that layer without having to redefine all your data structures.

So consider that: how realistic is it really to expect to someday switch out only the serialization code without touching the interfaces at all? In most cases the serialization format is the most permanent design choice, since you usually have issues with backwards compatibility, etc. So how much are you willing to pay right now in development cost in order to be able to theoretically pull that off in the future?

Now let's assume for a second that such a tool exists, one which separates data structure generation from serialization. And let's say that after two years you decide you need a completely different serialization method. Unless this tool also supports pluggable serialization formats, you would need to develop that layer anyway in order to stitch your existing structures to the new serialization solution, and that's about as much work as just choosing a new package altogether. So the only truly viable solution that would answer your requirements is something that not only supports data type definition and code generation across all your languages, and is not only serialization-agnostic, but also has a ready-made implementation of the future serialization format you would want to switch to; because if it's only agnostic to the serialization format, you'd still have the task of implementing that format on your own, in all languages, which isn't really less work than redefining some data structures.

So my point is that there's a reason serialization and data type definition so often go together: it's simply the most common use case. I would take a long look at what exactly you wish to achieve with the abstraction level you require, think about how much work developing such a solution would entail, and decide whether it's worth it. I'm certain there are tools that do this, by the way, probably the expensive proprietary kind that cost $10k per license, and in my opinion the same argument applies there: it's probably just over-engineering.

Answered by Assaf Lavie


"All the components in the system operate on the same business concepts and communicate with one another in terms of these concepts."

If I understood you correctly, you have split your system into different parts communicating via well-defined interfaces. But your interfaces share data structures you call "business concepts" (hard to understand without seeing an example), and since those interfaces have to be built for all three of your languages, you have problems keeping them "in sync".

When keeping interfaces in sync becomes a problem, it seems obvious that your interfaces are too broad. There are different possible reasons for that, with different solutions.

Possible reason 1: you over-generalized your interface concept. If that's the case, redesign here: throw the generalization overboard and create interfaces which are only as broad as they have to be.

Possible reason 2: the parts written in different languages are not dealing with separate business cases; you may have a "horizontal" partition between them, but not a vertical one. If that's the case, you cannot avoid the broadness of your interfaces.

Code generation may be the right approach here if reason 2 is your problem. If existing code generators don't suit your needs, why don't you just write your own? Define the interfaces, for example, as classes in C#, introduce some meta attributes, and use reflection in your code generator to extract that information again when generating the corresponding C++, Python and also the "real-to-be-used" C# code, as sketched below. If you need different variants with or without serialization, generate them too. A working generator should not be more than a couple of days of effort (YMMV depending on your requirements).
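A minimal sketch of that reflection approach (the attribute, the Order class and the type mapping are invented for the example; a real generator would also have to handle nesting, collections, a C++ emitter and the with/without-serialization variants):

    using System;
    using System.Linq;
    using System.Reflection;
    using System.Text;

    // Marker attribute for "business concept" classes (name invented here).
    [AttributeUsage(AttributeTargets.Class)]
    public sealed class BusinessConceptAttribute : Attribute { }

    // The single definition of a concept, written once in C#.
    [BusinessConcept]
    public sealed class Order
    {
        public string OrderId { get; set; } = "";
        public decimal Amount { get; set; }
    }

    // Toy generator: reflects over the attributed classes and emits a
    // Python class per concept; a C++ emitter would follow the same pattern.
    public static class Generator
    {
        static readonly (Type clr, string py)[] TypeMap =
        {
            (typeof(string), "str"),
            (typeof(decimal), "float"),
            (typeof(int), "int"),
        };

        // Usage: File.WriteAllText("concepts.py", Generator.EmitPython(typeof(Order).Assembly));
        public static string EmitPython(Assembly assembly)
        {
            var sb = new StringBuilder();
            var concepts = assembly.GetTypes()
                .Where(t => t.GetCustomAttribute<BusinessConceptAttribute>() != null);

            foreach (var type in concepts)
            {
                sb.AppendLine($"class {type.Name}:");
                sb.AppendLine("    def __init__(self):");
                foreach (var prop in type.GetProperties())
                {
                    var py = TypeMap.First(m => m.clr == prop.PropertyType).py;
                    sb.AppendLine($"        self.{ToSnake(prop.Name)} = {py}()");
                }
                sb.AppendLine();
            }
            return sb.ToString();
        }

        static string ToSnake(string name) =>
            string.Concat(name.Select((c, i) =>
                i > 0 && char.IsUpper(c) ? "_" + char.ToLower(c) : char.ToLower(c).ToString()));
    }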

Answered by Doc Brown


I agree with Tristan Reid (wrapping the business logic). Actually, some months ago I faced the same problem, and then I happened to discover the book "The Art of Unix Programming" (freely available online). What grabbed my attention was the philosophy of separating policy from mechanism (i.e. interfaces from engines). Modern programming environments such as the .NET platform try to integrate everything under a single domain. Back then I was asked to develop a web application that had to satisfy the following requirements:

  1. It had to be easily adaptable to future user-interface trends without having to change the core algorithms.

  2. It had to be accessible by means of different interfaces: web, command line and desktop GUI.

  3. It had to run on Windows and Linux.

I opted for developing the mechanism (engines) entirely in C/C++, using native OS libraries (POSIX or WinAPI) and good open-source libraries (PostgreSQL, XML, etc.). I developed the engine modules as command-line programs and eventually implemented two interfaces: web (with PHP + jQuery) and desktop (.NET Framework). Both interfaces had nothing to do with the mechanisms: they simply launched the core module executables by calling functions such as CreateProcess() on Windows or fork() on UNIX, and used pipes to monitor their processes.
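A minimal C# sketch of that pattern on the interface side (the engine path and the one-line request/reply protocol are invented for the example; .NET's Process class wraps CreateProcess/fork and the pipe plumbing for you):

    using System;
    using System.Diagnostics;

    public static class EngineClient
    {
        // Launches an "engine" executable and exchanges one request and
        // one reply over its standard streams.
        public static string Run(string enginePath, string request)
        {
            var psi = new ProcessStartInfo
            {
                FileName = enginePath,
                RedirectStandardInput = true,
                RedirectStandardOutput = true,
                UseShellExecute = false,   // required for stream redirection
            };

            using var process = Process.Start(psi)
                ?? throw new InvalidOperationException($"could not start {enginePath}");

            process.StandardInput.WriteLine(request);
            process.StandardInput.Close();

            string reply = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return reply;
        }
    }

    // Example (hypothetical engine and protocol):
    //   string quote = EngineClient.Run("./pricing-engine", "PRICE ORDER-42");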

I'm not saying the UNIX programming philosophy is good for all purposes, but I have been applying it since then with good results, and maybe it will work for you too. Choose one language for implementing the mechanism and then use another that makes interface design easy.

Answered by Claudix