Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Export Custom Event Dimensions to SQL from Application Insights using Stream Analytics

I'm following the example walkthrough Export to SQL from Application Insights using Stream Analytics. I am trying to export custom event dimensions (context.custom.dimensions in the JSON example below) which get added as a nested JSON array in the data file. How do I flatten the dimensions array at context.custom.dimensions for export to SQL?

JSON...

{
  "event": [
    {
      "name": "50_DistanceSelect",
      "count": 1
    }
  ],
  "internal": {
    "data": {
      "id": "aad2627b-60c5-48e8-aa35-197cae30a0cf",
      "documentVersion": "1.5"
    }
  },
  "context": {
    "device": {
      "os": "Windows",
      "osVersion": "Windows 8.1",
      "type": "PC",
      "browser": "Chrome",
      "browserVersion": "Chrome 43.0",
      "screenResolution": {
        "value": "1920X1080"
      },
      "locale": "unknown",
      "id": "browser",
      "userAgent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
    },
    "application": {},
    "location": {
      "continent": "North America",
      "country": "United States",
      "point": {
        "lat": 38.0,
        "lon": -97.0
      },
      "clientip": "0.115.6.185",
      "province": "",
      "city": ""
    },
    "data": {
      "isSynthetic": false,
      "eventTime": "2015-07-15T23:43:27.595Z",
      "samplingRate": 0.0
    },
    "operation": {
      "id": "2474EE6F-5F6F-48C3-BA43-51636928075A"
    },
    "user": {
      "anonId": "BA05C4BE-1C42-482F-9836-D79008E78A9D",
      "anonAcquisitionDate": "0001-01-01T00:00:00Z",
      "authAcquisitionDate": "0001-01-01T00:00:00Z",
      "accountAcquisitionDate": "0001-01-01T00:00:00Z"
    },
    "custom": {
      "dimensions": [
        {
          "CategoryAction": "click"
        },
        {
          "SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"
        },
        {
          "WebsiteVersion": "3.0"
        },
        {
          "PageSection": "FilterFind"
        },
        {
          "Category": "EventCategory1"
        },
        {
          "Page": "/page-in-question"
        }
      ],
      "metrics": []
    },
    "session": {
      "id": "062703E5-5E15-491A-AC75-2FE54EF03623",
      "isFirst": false
    }
  }
}
like image 650
viperguynaz Avatar asked Jul 20 '15 23:07

viperguynaz


4 Answers

A slightly more dynamic solution is to set up a temp table:

WITH ATable AS (
SELECT
     temp.internal.data.id as ID
    ,dimensions.ArrayValue.CategoryAction as CategoryAction
    ,dimensions.ArrayValue.SessionId as SessionId 
    ,dimensions.ArrayValue.WebsiteVersion as WebsiteVersion 
    ,dimensions.ArrayValue.PageSection as PageSection 
    ,dimensions.ArrayValue.Category as Category 
    ,dimensions.ArrayValue.Page as Page  
FROM [analyticseventinputs] temp 
CROSS APPLY GetElements(temp.[context].[custom].[dimensions]) as dimensions)

and then doing joins based on a unique key

FROM [analyticseventinputs] Input 
Left JOIN ATable CategoryAction on 
    Input.internal.data.id = CategoryAction.ID AND
    CategoryAction.CategoryAction <> "" AND
     DATEDIFF(day, Input, CategoryAction) BETWEEN 0 AND 5 

The rather annoying bit is the requirement for the datediff, because the joins are intended to combine 2 streams of data but in this case you are only joining on the unique key. So I set it to a large value of 5 days. This really only protects against the custom params not coming in ordered compared to the other solution.

like image 51
Xinco24 Avatar answered Oct 17 '22 00:10

Xinco24


Most tutorials online use CROSS APPLY or OUTER APPLY however this is not what you are looking for because it will put each property on a different row. To over come this use the functions: GetRecordPropertyValue and GetArrayElement as demoed below. This will flatten out the properties into a single row.

SELECT
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 0), 'CategoryAction') AS CategoryAction,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 1), 'SessionId') AS SessionId,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 2), 'WebsiteVersion') AS WebsiteVersion,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 3), 'PageSection') AS PageSection,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 4), 'Category') AS Category,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 5), 'Page') AS Page
INTO
  [outputstream]
FROM
  [inputstream] MySource
like image 43
Soto Avatar answered Oct 16 '22 22:10

Soto


What schema do you have in SQL? Do you want a single row in SQL with all the dimensions as columns?

This might not be possible today. However there will be more Array/Record functions in Azure Stream Analytics after July 30.

Then you will be able to do something like this:

SELECT 
    CASE 
        WHEN GetArrayLength(A.context.custom.dimensions) > 0
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 0), 'CategoryAction')
        ELSE ''
        END AS CategoryAction 
    CASE 
        WHEN GetArrayLength(A.context.custom.dimensions) > 1
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 1), 'WebsiteVersion')
        ELSE ''
        END AS WebsiteVersion 
    CASE 
        WHEN GetArrayLength(A.context.custom.dimensions) > 2
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 2), 'PageSection')
        ELSE ''
        END AS PageSection
FROM input

If you want to have separate rows per dimension then you can use CROSS APPLY operator.

like image 2
Konstantin Zoryn Avatar answered Oct 16 '22 23:10

Konstantin Zoryn


A very convenient way to do this as proposed by Alex Raizman is to do some aggregations to the fields you want to flatten, grouping by the remaining select list, assuming that

  • you know the set of possible objects in dimensions, and
  • you do not have duplicate objects in this array, and
  • there is something you can uniquely identify your initial rows (like the id )

    SELECT
      CategoryAction= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'CategoryAction') AS
      NVARCHAR(MAX))),
      SessionId= min(CAST(GetRecordPropertyValue(d.arrayvalue, 'SessionId') AS
      NVARCHAR(MAX))),
      WebsiteVersion= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'WebsiteVersion') AS
      NVARCHAR(MAX))),
      PageSection= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'PageSection') AS
      NVARCHAR(MAX))),
      Category= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'Category') AS
      NVARCHAR(MAX))),    
      Page= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'Page') AS NVARCHAR(MAX))) 
    INTO  
      [outputstream] 
    FROM [inputstream] MySource 
    CROSS APPLY GetArrayElements(MySource.[context].[custom].[dimensions]) d 
    GROUP BY System.Timestamp, MySource.id
    

We also group by System.Timestamp to create one temporal window as expected from Stream Analytics to perform set-based operations like counts or aggregations.

like image 1
athina.bikaki Avatar answered Oct 17 '22 00:10

athina.bikaki