Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

U-SQL - Extract data from json-array

Already tried the suggested JSONPath option, but it seems the JSONExtractor only recognizes root level. In my case I have to deal with a nested json-structure, with an array as well (see example below). Any options for extracting this without multiple intermediate files?

"relation": {
"relationid": "123456",
"name": "relation1",
"addresses": {
    "address": [{
        "addressid": "1",
        "street": "Street 1",
        "postcode": "1234 AB",
        "city": "City 1"
        },
    {
        "addressid": "2",
        "street": "Street 2",
        "postcode": "5678 CD",
        "city": "City 2"
    }]
}}

SELECT relationid, addressid, street, postcode, city ?

like image 282
Sander Avatar asked Jul 01 '16 14:07

Sander


1 Answers

After fixing your JSON fragment to:

{
    "relation": {
        "relationid": "123456",
        "name": "relation1",
        "addresses": {
            "address": [{
                "addressid": "1",
                "street": "Street 1",
                "postcode": "1234 AB",
                "city": "City 1"
            }, {
                "addressid": "2",
                "street": "Street 2",
                "postcode": "5678 CD",
                "city": "City 2"
            }]
        }
    }
}

and putting it into a file, the following script will get you what you want. Note that you need to navigate down the structure to carry along the higher level items and once you encounter an array, you CROSS APPLY EXPLODE it if you only need those that have the array, or OUTER APPLY EXPLODE it if you want rows with missing arrays.

DECLARE @input string = @"/temp/stackoverflow.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@json = 
EXTRACT relationid int, 
        name string, 
        addresses string
  FROM @input 
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("relation");

@relation =
SELECT relationid,
       name,
       Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(addresses)["address"] AS address_array
FROM @json;

@addresses = 
SELECT relationid, 
       name, 
       Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(address) AS address
FROM @relation
     CROSS APPLY
        EXPLODE (Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(address_array).Values) AS A(address);

@result =
SELECT relationid,
       name,
       address["addressid"]AS addressid,
       address["street"]AS street,
       address["postcode"]AS postcode,
       address["city"]AS city
FROM @addresses;

OUTPUT @result
TO "/users/temp/st_out.csv"
USING Outputters.Csv();
like image 160
Michael Rys Avatar answered Oct 21 '22 20:10

Michael Rys