Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I avoid having too many datapoints (10000) in a KQL query?

In a workbook, I'm displaying a time chart comparing throttle vs non-throttle requests. The query employs a summarize operator, and while it functions properly, it frequently encounters the "too many datapoints (10000)" error. I attempted to use the limit operator, but it did not work (I think its due to the fact that summarize operator is doing two operations and data points exceed for one of them)

In addition, I've included a Timerange picker in the workbook to limit the time span, but even a small window on a busy day can cause the error to appear.

I was also thinking about limiting records in another way, but I'm not sure how. The chart must show RPS so cannot change that to RPM.

 Requests
    | project TimeGenerated, ResultType
    | summarize Throttled =  countif(ResultType == "Throttled") , Total_Requests = count() by  bin(TimeGenerated,1s);
  //| limit 9999 <-- This does not work

enter image description here

like image 719
Razi Avatar asked Oct 18 '25 02:10

Razi


1 Answers

This is straightforward rendering issue.

First -
There are always limitations, whether it is in the graph visualization library, your screen display resolution, the amount of data your web browser can handle without getting stuck, etc.

Second -
You gain nothing from staring at a graph like this:

Overpacked graph

You need to lower the granularity of the graph.
Here are a few options.

P.S.
I often prefer make-series to summarize, since it deals with absent values.
Note that this may come with extra resources utilization.

1. Increase the bin size

The choice is not between 1s and 1m.
You have a whole range in between, E.g. bin(TimeGenerated, 15s).

let Requests = materialize(range i from 1 to 1000000 step 1 | extend TimeGenerated = ago(1d*rand()), ResultType = case(rand() < 0.02, "Throttled", "Other"));
Requests
    | make-series Throttled = countif(ResultType == "Throttled") , Total_Requests = count() on TimeGenerated step 15s
    | render timechart 

Fiddle

Increased bin

2. Use double aggregation

First aggregate by 1s and then re-aggregate by a larger time span (E.g.10m), displaying the peaks. In this way your graph becomes much more sparse, however you are not losing exceptional values in 1s granularity.

let Requests = materialize(range i from 1 to 1000000 step 1 | extend TimeGenerated = ago(1d*rand()), ResultType = case(rand() < 0.02, "Throttled", "Other"));
Requests
    | summarize Throttled = countif(ResultType == "Throttled") , Total_Requests = count() by bin(TimeGenerated,1s)
    | make-series max(Throttled), max(Total_Requests) on TimeGenerated step 10m
    | render timechart 

Fiddle

Double aggregation

3. Focus on what you're really after

Many times, there is an extra step we can take in order to create a visualization that really emphasizes the phenomena we are interesting at.
In this case it might be the ratio between Throttled & Total_Requests, and then we can use a single line that makes it very easy to understand what's going on.
This graph might be used together with the previous graphs, so we won't lose the whole picture.

let Requests = materialize(range i from 1 to 1000000 step 1 | extend TimeGenerated = ago(1d*rand()), ResultType = case(rand() < 0.02, "Throttled", "Other"));
Requests
| summarize Throttled = countif(ResultType == "Throttled"), Total_Requests = count() by bin(TimeGenerated, 1s)
| make-series ratio = max(1.0 * Throttled / Total_Requests) on TimeGenerated step 5m
| render timechart 

Fiddle

Ratio

like image 189
David דודו Markovitz Avatar answered Oct 21 '25 11:10

David דודו Markovitz



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!