
Parse json file in U-SQL

I'm trying to parse the JSON file below using U-SQL, but I keep getting an error.

JSON file:

{"dimBetType_SKey":1,"BetType_BKey":1,"BetTypeName":"Test1"}
{"dimBetType_SKey":2,"BetType_BKey":2,"BetTypeName":"Test2"}
{"dimBetType_SKey":3,"BetType_BKey":3,"BetTypeName":"Test3"}

Below is the U-SQL script I'm using to try to extract the data from the file above.

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @Full_Path string =
"adl://xxxx.azuredatalakestore.net/2017/03/28/00_0_66ffdd26541742fab57139e95080e704.json";

DECLARE @Output_Path = "adl://xxxx.azuredatalakestore.net/Output/Output.csv";

@logSchema =
EXTRACT dimBetType_SKey int
FROM @Full_Path
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

OUTPUT @logSchema
TO @Output_Path 
USING Outputters.Csv();

But the U-SQL script keeps failing with a vertex error.

Any help?

Saz asked Mar 30 '17 09:03


1 Answer

This is probably because your file has a separate JSON object on each line. Taken as a whole it is not a single valid JSON document, so you need to parse it line by line rather than treating it as a straight JSON file.
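
If you want to confirm that diagnosis outside of U-SQL, here is a minimal local sketch in Python (not part of the U-SQL job). The sample text is copied from the question; the helper names parse_whole and parse_lines are hypothetical and only there for illustration.

```python
import json

# Sample content copied from the question: one JSON object per line.
raw = (
    '{"dimBetType_SKey":1,"BetType_BKey":1,"BetTypeName":"Test1"}\n'
    '{"dimBetType_SKey":2,"BetType_BKey":2,"BetTypeName":"Test2"}\n'
    '{"dimBetType_SKey":3,"BetType_BKey":3,"BetTypeName":"Test3"}\n'
)


def parse_whole(text):
    """Try to parse the entire text as a single JSON document."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Several top-level objects in a row are not one valid JSON document.
        return None


def parse_lines(text):
    """Parse each non-empty line as its own JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


whole = parse_whole(raw)  # parsing the file in one go fails
rows = parse_lines(raw)   # parsing line by line yields one dict per line
```

A parser that expects one document per file trips over the same thing, which is why the steps below read the file as plain text first.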

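Try just using a text extractor first to bring in each JSON line as a single string. The column delimiter is set to a character that should never appear in the data ('\b'), so each whole line lands in one column. Like this...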

DECLARE @Full_Path string = "etc";

@RawExtract = 
    EXTRACT 
        [RawString] string, 
        [FileName] string //optional, see below
    FROM
        @Full_Path
    USING 
        Extractors.Text(delimiter:'\b', quoting : false);

Then shred the JSON with the assembly you've referenced, but using the JsonTuple function, which returns the properties of each line as a key/value map. Like this...

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@ParsedJSONLines = 
    SELECT 
        Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple([RawString]) AS JSONLine,
        [FileName]
    FROM 
        @RawExtract;

Next, get the values out. Like this...

@StagedData =
    SELECT 
        JSONLine["dimBetType_SKey"] AS dimBetType_SKey,
        JSONLine["BetType_BKey"] AS BetType_BKey,
        JSONLine["BetTypeName"] AS BetTypeName,
        [FileName]
    FROM 
        @ParsedJSONLines;

Finally, do your output to CSV, or whatever.

DECLARE @Output_Path string = "etc";

OUTPUT @StagedData
TO @Output_Path 
USING Outputters.Csv();
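
To sanity-check the rows you should expect from the job, the same steps can be sketched locally in Python on a small sample. This is only an illustration under assumptions: json_lines_to_csv is a hypothetical helper, the column order is taken from the question's sample, and the CSV formatting (quoting, line endings) may differ from what Outputters.Csv() produces.

```python
import csv
import io
import json

# Column order taken from the sample file in the question.
COLUMNS = ["dimBetType_SKey", "BetType_BKey", "BetTypeName"]


def json_lines_to_csv(text):
    """Convert newline-delimited JSON text to CSV text, one row per line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)  # each line is its own JSON object
        writer.writerow([record[name] for name in COLUMNS])
    return out.getvalue()


sample = (
    '{"dimBetType_SKey":1,"BetType_BKey":1,"BetTypeName":"Test1"}\n'
    '{"dimBetType_SKey":2,"BetType_BKey":2,"BetTypeName":"Test2"}\n'
)
csv_text = json_lines_to_csv(sample)
```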

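As a side note, you don't need to reference the complete Data Lake Store path. The analytics engine knows the root of its default storage, so you can probably replace your variables with a relative path. The {FileName} token is a file set pattern, and it is what populates the optional [FileName] column in the first extract. Like this...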

DECLARE @Full_Path string = "/2017/03/28/{FileName}";

Hope this helps sort your issue.

Paul Andrew answered Oct 08 '22 00:10