Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

U- SQL Unable to extract data from JSON file

I was trying to extract data from a JSON file using USQL. Either the query runs successfully without producing any output data or results in "vertex failed fast error".

The JSON file looks like:

{
  "results": [
    {
      "name": "Sales/Account",
      "id": "7367e3f2-e1a5-11e5-80e8-0933ecd4cd8c",
      "deviceName": "HP",
      "deviceModel": "g6-pavilion",
      "clientip": "0.41.4.1"
    },
    {
      "name": "Sales/Account",
      "id": "c01efba0-e0d5-11e5-ae20-af6dc1f2c036",
      "deviceName": "acer",
      "deviceModel": "veriton",
      "clientip": "10.10.14.36"
    }
  ]
}

And my U-SQL script is

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @in string="adl://xyz.azuredatalakestore.net/todelete.json";

DECLARE @out string="adl://xyz.azuredatalakestore.net/todelete.tsv";

@trail2=EXTRACT results string FROM @in USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

@jsonify=SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(results,"name","id","deviceName","deviceModel","clientip") AS rec FROM @trail2;

@logSchema=SELECT rec["name"] AS sysName,
              rec["id"] AS sysId,
              rec["deviceName"] AS domainDeviceName,
              rec["deviceModel"] AS domainDeviceModel,
              rec["clientip"] AS domainClientIp 
       FROM @jsonify;

OUTPUT @logSchema TO @out USING Outputters.Tsv();
like image 261
Sarath Rachuri Avatar asked Mar 10 '16 10:03

Sarath Rachuri


2 Answers

Actually the JSONExtractor supports the rowpath parameter expressed in JSONPath that gives you the ability to identify the JSON object or JSON array items that you want to map into rows. So you can extract your data with a single statement from your JSON document:

@logSchema = 
    EXTRACT name string, id string, deviceName string, deviceModel string, clientip string
    FROM @input
   USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("results[*]");
like image 62
Michael Rys Avatar answered Nov 04 '22 20:11

Michael Rys


Sarath,

the problem was that your @trail2 output was a json array "[{...},{...}]" which the JsonFunction can't parse, as far as I can tell. So I outputted that to a file and reread it with the inputter, which can parse arrays.

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @in string="adl://xyz.azuredatalakestore.net/todelete.json";
DECLARE @out string="adl://xyz.azuredatalakestore.net/todelete.tsv";
DECLARE @mid string="adl://xyz.azuredatalakestore.net/intermediate.txt";


@trail2=EXTRACT results string FROM @in USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

OUTPUT @trail2 TO @mid USING Outputters.Text(quoting:false);

@jsonify=EXTRACT name string,
                id string, 
                deviceName string ,
                deviceModel string,
                clientip string
FROM @mid USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

@logSchema=SELECT name AS sysName,
              id AS sysId,
              deviceName AS domainDeviceName,
              deviceModel AS domainDeviceModel,
              clientip AS domainClientIp 
       FROM @jsonify;

OUTPUT @logSchema TO @out USING Outputters.Tsv();
like image 43
Michael Ludwig Avatar answered Nov 04 '22 19:11

Michael Ludwig