I'm looking for an embeddable Java library that is suitable for collecting real-time streams of sensor data in a general-purpose way. I plan to use this to develop a "hub" application for reporting on multiple disparate sensor streams, running on a JVM based server (will also be using Clojure for this).
Key things it needs to have:
Is there something out there that fits this profile that you can recommend?
The sensor data and test data are collected using the OPC client, e.g., solution temperature, pH, ORP, zinc powder dosage, flow rate of feeding solution, etc. These real-time values of process variables are important for process monitoring and setting of manipulated variables.
Sensor data is the output of a device that detects and responds to some type of input from the physical environment. The output may be used to provide information or input to another system or to guide a process. Sensors can be used to detect just about any physical element.
1. Round-robin database (wikipedia)
RRDtool (acronym for round-robin database tool) aims to handle time-series data like network bandwidth, temperatures, CPU load, etc. The data are stored in a round-robin database (circular buffer), thus the system storage footprint remains constant over time.
This approach/DB format is widely used, stable and simple enough. Out of the box it allows to generate nice plots:
There is Java implementation -- RRD4J:
RRD4J is a high performance data logging and graphing system for time series data, implementing RRDTool's functionality in Java. It follows much of the same logic and uses the same data sources, archive types and definitions as RRDTool does. Open Source under Apache 2.0 License.
Update
Forget to mention there is Clojure RRD API (examples).
2. For some experiments with real-time data I would suggest to consider Perst
It is small, fast and reliable enough, but distributed under GPLv3. Perst provides several indexing algorithms:
The last one suits your needs very well.
3. Neo4J with Relationship indexes
A good example where this approach pays dividends is in time series data, where we have readings represented as a relationship per occurrence.
4. Oracle Berkeley DB Java Edition
Oracle Berkeley DB Java Edition is an open source, embeddable, transactional storage engine written entirely in Java. It takes full advantage of the Java environment to simplify development and deployment. The architecture of Oracle Berkeley DB Java Edition supports very high performance and concurrency for both read-intensive and write-intensive workloads.
Suggestion
Give a try to RRD4J:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With