I am looking to migrate from a homegrown streaming server to Apache Flink. One thing that we have is a Apache Storm like DRPC interface to run queries against the state held in the processing topology.
So for example: I have a bunch of sensors that I am running an moving average on. I want to run a query on the topology and return all the sensors where that average is above a fixed value.
Is there an equivalent in Flink, or if not, what is the best way to achieve equivalent functionality?
Flink's SQL support is based on Apache Calcite which implements the SQL standard. This page lists all the supported statements supported in Flink SQL for now: SELECT (Queries) CREATE TABLE, CATALOG, DATABASE, VIEW, FUNCTION.
Flink enables real-time data analytics on streaming data and fits well for continuous Extract-transform-load (ETL) pipelines on streaming data and for event-driven applications as well.
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
Flink SQL is a unified API for batch and stream processing: this allows using the same queries for processing historic data and real-time data. Support for both processing time and event time semantics. Support for working with nested Avro and JSON data. User-defined scalar, aggregation, and table-valued functions.
Out-of-box Flink does not come with a solution for querying the internal state of operations right now. You're lucky however, because there are two solutions: We did an example of a stateful word count example that allows querying the state. This is available here: https://github.com/dataArtisans/query-window-example
For one of the upcoming versions of Flink we are also working on a generic solution to the queryable state use case. This will allow querying the state of any internal operation.
Also, could it also suffice, in your case, to just periodically output the values to something like Elasticsearch using a Window Operation. The results could then simply be queried from Elasticsearch.
They are coming with Out-of-box solution called Queryable State in next release.
Here is an example
https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/query/QueryableStateITCase.java
But I suggest you should read about it more first then see the example.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With