
Azure Stream Analytics is too slow - also time values are irrelevant

We want to migrate our dedicated servers to the Azure platform for easier scaling, and we have investigated many Azure services for our needs. One of the services we want to use is Azure Stream Analytics (ASA).

We've set up several Azure services to run some tests (what they are for is not important for now). Here is the structure:

SimpleApp (Sending Request, Not In Azure) -> Event Hub 1 (EH1) -> ASA -> Event Hub 2 (EH2) -> Function App (FA)

  • SimpleApp sends a simple HTTP GET request to a classic dedicated server named TESTSERVER. This takes at most 100-150 ms and marks our start time. After that, it sends the message to EH1.
  • The ASA query is as simple as this: SELECT * INTO [Output] FROM [Input]
  • The Function App sends a simple HTTP GET request to TESTSERVER to mark the finish time.

We were shocked when we saw the results in our TESTSERVER logs. It took 4000-5000 ms!

Then we started to investigate the issue. We checked values such as EventEnqueuedUtcTime and EventProcessedUtcTime to identify which block causes the slowness, but these time values are totally irrelevant. For example, EventEnqueuedUtcTime should be earlier than EventProcessedUtcTime, but it is not! This suggests that clocks may differ between Azure components, so we cannot use these values to measure. Am I wrong?
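
One way to inspect these values side by side is a query along the following lines. This is only a sketch: it assumes [Input] is an Event Hub input (which exposes EventEnqueuedUtcTime, EventProcessedUtcTime, and PartitionId), and the column alias ProcessedMinusEnqueuedMs is a made-up name for illustration.

```sql
-- Sketch: emit both timestamps and their difference for each event.
-- [Input] and [Output] are the aliases from the test job described above;
-- ProcessedMinusEnqueuedMs is a hypothetical column name.
SELECT
    PartitionId,
    EventEnqueuedUtcTime,
    EventProcessedUtcTime,
    DATEDIFF(millisecond, EventEnqueuedUtcTime, EventProcessedUtcTime) AS ProcessedMinusEnqueuedMs
INTO [Output]
FROM [Input]
```

A negative ProcessedMinusEnqueuedMs for an event would correspond to the ordering problem described above.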

Anyway, after this we suspected that the final Azure Function App might be causing the slowness. We thought that maybe the Function App's Event Hub trigger does not work well. So we designed a new test environment:

SimpleApp (Sending Request, Not In Azure) -> Event Hub 1 (EH1) -> Function App (FA1) -> Event Hub 2 (EH2) -> Function App 2 (FA2)

Second shock... It took only ~400 ms in total!

Then we performed many tests with different architectures that include ASA, but all of them are too slow for us.

Have you experienced any performance issues with ASA? Could you please share your experience and your flows' total time consumption?

Best regards.

msapcili asked Jun 25 '16 02:06





1 Answer

There is latency when merging all the events from the Event Hub into chronological order.

ASA visits all partitions of the EH, gets the data, and organizes the events into chronological order. This means that data must arrive at all partitions in the EH. I think this also explains the strange behavior you are seeing with EventProcessedUtcTime: because the events are ordered, the logical processing time might be before the actual arrival time. I'm unsure about this, though, because I do not know the inner workings of ASA.

This latency will increase with the number of partitions used, especially when the dataflow is slow.

You can sidestep the merge by partitioning on the PartitionId field from EH. Make sure you are also sending the data to the correct partition in EH.
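
As a rough sketch, the pass-through query from the question could be partitioned like this. It assumes [Input] and [Output] are both Event Hubs with the same number of partitions; I have not measured how much latency this removes in your setup.

```sql
-- Sketch: process each Event Hub partition independently
-- instead of merging all partitions into one ordered stream.
-- [Input] and [Output] are the aliases from the question.
SELECT *
INTO [Output]
FROM [Input]
PARTITION BY PartitionId
```

With this form, each partition is read and written on its own, so a slow or idle partition should not hold back events from the others.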

More information can be found here at the Stream Analytics blog.

Waaghals answered Oct 19 '22 08:10
