Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Querying Data from Apache Flink

I am looking to migrate from a homegrown streaming server to Apache Flink. One thing that we have is a Apache Storm like DRPC interface to run queries against the state held in the processing topology.

So for example: I have a bunch of sensors that I am running an moving average on. I want to run a query on the topology and return all the sensors where that average is above a fixed value.

Is there an equivalent in Flink, or if not, what is the best way to achieve equivalent functionality?

like image 725
No One Avatar asked Mar 21 '16 14:03

No One


People also ask

Does Flink support SQL?

Flink's SQL support is based on Apache Calcite which implements the SQL standard. This page lists all the supported statements supported in Flink SQL for now: SELECT (Queries) CREATE TABLE, CATALOG, DATABASE, VIEW, FUNCTION.

Is Flink A ETL?

Flink enables real-time data analytics on streaming data and fits well for continuous Extract-transform-load (ETL) pipelines on streaming data and for event-driven applications as well.

Is Flink a database?

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.

How does Flink work in SQL?

Flink SQL is a unified API for batch and stream processing: this allows using the same queries for processing historic data and real-time data. Support for both processing time and event time semantics. Support for working with nested Avro and JSON data. User-defined scalar, aggregation, and table-valued functions.


2 Answers

Out-of-box Flink does not come with a solution for querying the internal state of operations right now. You're lucky however, because there are two solutions: We did an example of a stateful word count example that allows querying the state. This is available here: https://github.com/dataArtisans/query-window-example

For one of the upcoming versions of Flink we are also working on a generic solution to the queryable state use case. This will allow querying the state of any internal operation.

Also, could it also suffice, in your case, to just periodically output the values to something like Elasticsearch using a Window Operation. The results could then simply be queried from Elasticsearch.

like image 86
aljoscha Avatar answered Oct 06 '22 19:10

aljoscha


They are coming with Out-of-box solution called Queryable State in next release. Here is an example
https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/query/QueryableStateITCase.java

But I suggest you should read about it more first then see the example.

like image 25
Pushpendra Jaiswal Avatar answered Oct 06 '22 18:10

Pushpendra Jaiswal