More specifically, what usecases does Hazelcast Jet solve that Flink does not solve (equally well) and vice versa?
Hazelcast Jet is a distributed batch and stream processing framework based on Hazelcast IMDG. It allows you to write, currently, modern Java code that focuses purely on data transformation while it does all the heavy lifting of getting the data flowing and computation running across a cluster of members.
Flink offers true native streaming, while Spark uses micro batches to emulate streaming. That means Flink processes each event in real-time and provides very low latency. Spark, by using micro-batching, can only deliver near real-time processing. For many use cases, Spark provides acceptable performance levels.
Flink is a distributed processing engine and a scalable data analytics framework. You can use Flink to process data streams at a large scale and to deliver real-time analytical insights about your processed data with your streaming application.
Hazelcast Jet is free and open source software. It's available under one of two licenses: hazelcast-jet is provided under Apache License, Version 2.0. Plugins and connectors distributed in opt folder are provided under Hazelcast Community License.
NOTE: I belong to Hazelcast Jet's core engineering team.
I'd say the main advantage of Hazelcast Jet isn't in offering a brand-new computing model, but in bringing the same level of convenience that Hazelcast is known for to the realm of DAG-based distributed computing.
If you currently have a Java application running in a cluster, adding Jet will be a snap: add the Maven dependency and write one line of code to start a Jet instance on the local member. The instances will self-discover to form their own cluster, and you can now submit your job to it.
If you want a dedicated distributed computing cluster, you can download the distribution ZIP to the cluster machines. Jet has native support for the most popular cloud environments, allowing the nodes you start to self-discover. You can then connect to the cluster using a Jet client.
Needless to say, Jet makes it very convenient to use a Hazelcast IMap
or IList
as a data source. Jet cluster can host Hazelcast structures directly; then you benefit from data locality and get the data with no network traffic. On the other hand, the choice of data source is completely unconstrained and there is public API dedicated to implementing fast, arbitrarily partitioned, custom data sources.
Jet solves the concerns of infinite streams processing like aggregating over time-based windows, dealing with reordered events and resilience to changes in the cluster topology (e.g., failure of individual Jet nodes) while maintaining the Exactly-Once processing guarantee.
Jet's main programming paradigm is the Pipeline API which is quite similar to java.util.stream
API but adapted to the specifics of distributed computing (lambda serialization and other concerns).
Pipeline API builds upon a lower-level DAG-based model that is also exposed as public API.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With