Is it possible to use KSQL to not only count entries of a specific column via GROUP BY
but instead get an aggregate over all the entries that stream through the application?
I'm searching for something like this:
| Count all | Count id1 | count id2 |
| ---245----|----150----|----95-----|
Or more like this in KSQL:
[some timestamp] | Count all | 245
[some timestamp] | Count id1 | 150
[some timestamp] | Count id2 | 95
.
.
.
Thank you
- Tim
You cannot have both counts for the all and count for each key in the same query. You can have two queries here, one for counting each value in the given column and another for counting all values in the given column. Let's assume you have a stream with two columns, col1 and col2. To count each value in col1 with infinite window size you can use the following query:
SELECT col1, count(*) FROM mystream1 GROUP BY col1;
To count all the rows you need to write two queries since KSQL always needs GROUP BY clause for aggregation. First you create a new column with constant value and then you can count the values in new column and since it is a constant, the count will represent the count of all rows. Here is an example:
CREATE STREAM mystream2 AS SELECT 1 AS col3 FROM mystream1;
SELECT col3, count(*) FROM mystream2 GROUP BY col3;
you can do a extended describe on the stream or table to see the total messages
ksql> describe extended <stream or table name>
sample output
Local runtime statistics
------------------------
messages-per-sec: 0 total-messages: 2415888 last-message: 2019-12-06T02:29:43.005Z
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With