We're putting together a system that reads ~32 voltage signals through an analog-to-digital converter card, does some preliminary processing on them, and sends the results (still separated into 32 channels) onto the network as UDP packets. Another computer picks these up and variously (a) displays them, (b) processes them further, (c) searches them for criteria that change the state of the acquisition system, or (d) some combination of (a)-(c). Simultaneously, a GUI process runs on the computer doing those latter jobs (the vis computer); it changes state in both the data-generating computer and the vis computer's multiple processes, via UDP-packeted command messages.
I'm new to network programming and am struggling to pick a network topology. Are there any heuristics (or book chapters, papers) about network topology for relatively small applications that need to pass data, commands, and command acknowledgments flexibly?
System details:
Raw data acquisition happens on a single Linux box. Simply processing the data, saving it to disk, and pushing it to the network uses about 25% of the CPU capacity and a tiny amount of memory. Less than 0.5 Mb/sec of data goes to the network. All the data-generation code is in C++.
Another Linux machine runs several visualization / processing / GUI processes. The GUI controls both the acquisition machine and the processes on the vis/processing/GUI computer itself. This code is mostly in C++, with a couple of small utilities in Python.
We will be writing other applications that will want to listen in on the raw data, the processed data, and all the commands being passed around; those applications will want to issue commands as well. We can't anticipate how many such modules we'll want to write, but we expect 3 or 4 data-heavy processes that transform all 32 input streams into a single output, as well as 3 or 4 one-off small applications like a "command logger". The modularity requirement means that we want the existing data-generators and command-issuers to be agnostic about how many listeners are out there. We also want commands to be acknowledged by their recipients.
The two machines are connected by a switch, and packets (both data and commands, and acknowledgments) are sent in UDP.
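To make the acknowledgment requirement concrete, here is a minimal sketch (in Python, like our small utilities) of the kind of command/ack envelope we have in mind. Every field name here (`kind`, `src`, `dst`, `seq`, `body`) is invented for illustration; nothing in the system mandates this layout.

```python
import itertools
import json

# Monotonic sequence numbers so an ack can be matched to its command.
_seq = itertools.count(1)

def make_command(src, dst, body):
    """Build a command packet; the recipient echoes (src, seq) back in an ack."""
    return json.dumps({"kind": "cmd", "src": src, "dst": dst,
                       "seq": next(_seq), "body": body}).encode()

def make_ack(cmd_packet):
    """Acknowledge a received command by swapping src/dst and echoing seq."""
    cmd = json.loads(cmd_packet)
    return json.dumps({"kind": "ack", "src": cmd["dst"], "dst": cmd["src"],
                       "seq": cmd["seq"]}).encode()

cmd = make_command("gui", "daq", {"action": "start"})
ack = make_ack(cmd)
assert json.loads(ack)["seq"] == json.loads(cmd)["seq"]
```

A command logger could then record every packet whose `kind` field it sees, without caring who the participants are.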
The five possibilities we're thinking of:
1. Data streams, commands, and acknowledgments are targeted by port number. The data-generator sends independent data streams as UDP packets to different port numbers bound by independent visualizer processes on the visualization computer. Each process also binds a listening port for incoming commands, and another port for incoming acknowledgments to outgoing commands. This option seems good because the kernel does the work of trafficking/filtering the packets, but bad because it's hard to see how processes address each other when unanticipated modules are added; it also seems to lead to an explosion of bound ports.
2. Data streams are targeted to their respective visualizers by port number, and each process binds a port for listening for commands. But all command-issuers send their commands to a packet-forwarder process that knows the command-in ports of all processes and forwards each command to all of them. Acknowledgments are also sent to this universal command-in port and forwarded to all processes. We pack information about the intended target of each command and each acknowledgment into the command/ack packets, so the processes themselves have to sift through all the commands/acks to find the ones that pertain to them.
3. As in option 2, but the packet-forwarder process is also the target of all data packets. All data packets and all command packets are forwarded to perhaps 40 different processes. This obviously puts a whole lot more traffic on the subnet; it also cleans up the explosion of bound ports.
4. Two packet-distributors run on the vis computer: one broadcasts commands/acks to all ports; the other broadcasts data only to the ports that could possibly want it.
5. Our 32 visualization processes could be bundled into one process that draws data for all 32 signals, greatly reducing the extra traffic that option 3 causes.
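For concreteness, here is a toy sketch of the fan-out mechanism at the heart of options 2-4: a forwarder socket receives one datagram and relays it to every registered listener port. Everything runs on loopback in one process, ports are chosen by the kernel, and the payload is made up; a real forwarder would loop forever and maintain a registration table.

```python
import socket

def udp_socket():
    """A UDP socket bound to an ephemeral loopback port, with a recv timeout."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    s.settimeout(1.0)
    return s

subs = [udp_socket() for _ in range(2)]   # stand-ins for listener processes
fwd = udp_socket()                        # the packet-forwarder's inbound port
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# A command-issuer sends one command to the universal command-in port.
sender.sendto(b"CMD target=viz3 start", fwd.getsockname())

# The forwarder receives it and relays a copy to every known listener port.
pkt, _src = fwd.recvfrom(2048)
for s in subs:
    fwd.sendto(pkt, s.getsockname())

received = [s.recvfrom(2048)[0] for s in subs]
assert received == [b"CMD target=viz3 start"] * 2

for s in subs + [fwd, sender]:
    s.close()
```

Note that each listener still has to parse the `target=` field itself to decide whether the command pertains to it, which is exactly the sifting cost option 2 describes.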
If you've experimented with passing data around among multiple processes on a small number of machines, and have some wisdom or rules of thumb about which strategies are robust, I'd greatly appreciate the advice! (Requests for clarification are welcome.)
I don't have enough rep to move this question to programmers.stackexchange.com, so I will answer it here.
First I'll throw quite a few technologies at you; each is worth a look.
Hadoop: a MapReduce framework, giving you the ability to take a large volume of data and process it across distributed nodes.
Kafka: an extremely high-performance messaging system. I would suggest looking at this as your message bus.
ZooKeeper: a distributed coordination service that lets you "figure out" the different parts of your distributed system.
Pub/Sub Messaging
ØMQ (ZeroMQ): a socket library that supports pub/sub and other N-to-N message-passing arrangements.
Now that I've thrown a few technologies at you I'll explain what I would do.
Create a system that allows you to create N connectors. These connectors handle Data/Command N in your diagram, where N is a specific signal; so if you have 32 signals, you set up your system with 32 connectors. Each connector handles two-way communication, which addresses your receive/command problem. A single connector publishes its data to something such as Kafka, on a topic specific to that signal.
Use a publish/subscribe system. Essentially, each connector publishes its results to a specified topic; the topic is something you choose. Then processors (UI, business logic, etc.) listen on specific topics. These are all arbitrary and you can set them up however you want.
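To make the topic idea concrete, here is a toy in-process broker; it stands in for Kafka/ØMQ, and the class and method names are purely illustrative.

```python
from collections import defaultdict

class Broker:
    """Toy pub/sub broker: routes each published payload to every
    callback subscribed to that topic. No persistence, no network."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self.topics[topic].append(callback)

    def publish(self, topic, payload):
        for cb in self.topics[topic]:     # fan out to every subscriber of the topic
            cb(payload)

broker = Broker()
seen = []
broker.subscribe("signal 1", seen.append)   # a "processor" listening on one topic
broker.subscribe("signal 2", seen.append)   # the same processor can take several
broker.publish("signal 1", b"\x00\x01")     # a "connector" publishing a sample
broker.publish("signal 3", b"\x02\x03")     # no listener on this topic: dropped
assert seen == [b"\x00\x01"]
```

The point is that publishers never know how many subscribers exist, which is exactly the agnosticism your modularity requirement asks for.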
============    =============    =====    ==============    =============
= Signal 1 =<-->= Connector =<---=   =--->= "signal 1" =--->= Processor =
============    =============    = K =    ==============    =============
                                 = a =
============    =============    = f =    ==============    =============
= Signal 2 =<-->= Connector =<---= k =--->= "signal 2" =--->= Processor =
============    =============    = a =    ==============          ^
                                 =   =                            |
============    =============    =   =    ==============          |
= Signal 3 =<-->= Connector =<---=   =--->= "signal 3" =----------+
============    =============    =====    ==============
In this example the first connector publishes its results to topic "signal 1", and the first processor is listening on that topic; any data sent to that topic is delivered to the first processor. The second processor listens for both "signal 2" and "signal 3" data. This represents something like a user interface retrieving different signals at the same time.
One thing to keep in mind is that this can happen across whatever topics you choose. A "processor" can listen to all topics if you deem it important.
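With ZeroMQ specifically, a SUB socket delivers a message when the message's topic starts with any subscribed prefix, and the empty prefix matches everything; that matching rule, reimplemented standalone for illustration, is just:

```python
def matches(subscriptions, topic):
    """ZeroMQ-style subscription check: a message is delivered if its topic
    starts with any subscribed prefix; the empty prefix matches all topics."""
    return any(topic.startswith(p) for p in subscriptions)

processor_a = [b"signal 1"]                # one specific topic
processor_b = [b"signal 2", b"signal 3"]   # a UI taking two signals at once
logger      = [b""]                        # empty prefix: sees everything

assert matches(processor_a, b"signal 1")
assert not matches(processor_a, b"signal 2")
assert matches(processor_b, b"signal 3")
assert matches(logger, b"signal 17")
```

So the "command logger" from the question falls out for free: it subscribes to the empty prefix (or a `cmd` prefix, if commands are published under their own topics) and records everything it sees.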