Convert JSON strings to normalized schema in SQL Server

I have a table in an Azure SQL Server database consisting of just a single nvarchar(max) column that holds JSON strings like the one below, one per row, flattened to a single line (about 200,000 rows):

{
    "username": "George84",
    "items": [{
            "type": 8625,
            "score": "90"
        }, {
            "type": 8418,
            "score": "84"
        }, {
            "type": 7818,
            "score": "90"
        }, {
            "type": 23,
            "score": "q"
        }
    ]
}

Each record has a single username and an array of items with one or more entries. Each entry in the items array contains one type field (integer) and one score field (string). Usernames are unique. Multiple users may have the same type of item, and these duplicates may or may not have the same score.

I want to convert this to a new set of properly normalized tables with a schema like the one below:

Schema Diagram

I know I could do it with an external application, but I'm hoping to take advantage of SQL Server 2016's new JSON processing capability to do it entirely within the database/TSQL.
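
For reference, the pieces of that capability involved here are JSON_VALUE (to pull a scalar property out of a document) and OPENJSON (to turn the items array into rows). Against a single document shaped like the sample above, that looks roughly like this (a minimal sketch only):

declare @doc nvarchar(max) = N'{"username":"George84","items":[{"type":8625,"score":"90"},{"type":23,"score":"q"}]}'

select
    JSON_VALUE(@doc, '$.username') as username,   -- scalar property
    i.[type],
    i.score
from OPENJSON(@doc, '$.items')                    -- one row per array element
     with ([type] int, score nvarchar(50)) as i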

What is the best way to accomplish this conversion?

asked Jun 29 '17 by Techrocket9

1 Answer

This is the first time I've worked with JSON in T-SQL, but it was fun, so there may well be better solutions. Anyhow, let's start.

First I need some data to work with, so I'll generate some temp data as below:

-- a table variable simulating the source table of JSON documents
declare @jsonData Table (jsonText nvarchar(max))

insert into @jsonData(jsonText)values(N'{
    "username": "George84",
    "items": [{
            "type": 8625,
            "score": "90"
        }, {
            "type": 8418,
            "score": "84"
        }, {
            "type": 7818,
            "score": "90"
        }, {
            "type": 23,
            "score": "q"
        }
    ]
}'),(N'{
    "username": "George85",
    "items": [{
            "type": 8625,
            "score": "80"
        }, {
            "type": 8418,
            "score": "90"
        }, {
            "type": 7818,
            "score": "70"
        }, {
            "type": 232,
            "score": "q"
        }
    ]
}'),(N'{
    "username": "George86",
    "items": [{
            "type": 8626,
            "score": "80"
        }, {
            "type": 8418,
            "score": "70"
        }, {
            "type": 7818,
            "score": "90"
        }, {
            "type": 23,
            "score": "q"
        }
    ]
}'),(N'{
    "username": "George87",
    "items": [{
            "type": 8625,
            "score": "90"
        }, {
            "type": 8418,
            "score": "70"
        }, {
            "type": 7818,
            "score": "60"
        }, {
            "type": 23,
            "score": "q"
        }
    ]
}')

This gives me a table that simulates your main data table.
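
Optionally, before going further, ISJSON can confirm that every row really holds valid JSON (the same function appears in a commented-out line in the loop below):

-- rows that are not valid JSON (ideally this returns nothing)
select jsonText from @jsonData where ISJSON(jsonText) = 0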

Let's create some table variables to store the data, as below:

declare @Users Table (ID int not null IDENTITY(1, 1),username nvarchar(50))
declare @Types Table (ID int not null IDENTITY(1, 1),[type] int)
declare @Scores Table (ID int not null IDENTITY(1, 1),score nvarchar(50))
declare @Items Table (ID int not null IDENTITY(1, 1),UserId int,TypeId int,ScoreId int)
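
If you would rather load into permanent tables than table variables, their definitions might look roughly like this (the names mirror the table variables above; the key and foreign-key choices are only a suggestion):

-- illustrative permanent-table equivalents of the table variables above
create table Users  (ID int identity(1,1) primary key, username nvarchar(50) unique)
create table Types  (ID int identity(1,1) primary key, [type] int)
create table Scores (ID int identity(1,1) primary key, score nvarchar(50))
create table Items  (ID int identity(1,1) primary key,
                     UserId  int references Users(ID),
                     TypeId  int references Types(ID),
                     ScoreId int references Scores(ID))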

You may already have these tables created, in which case you can use your own in place of the table variables. Now let's jump to the T-SQL that fills them:

declare @RowsCount int = (select count(*) from @jsonData)
declare @index int = 1
declare @jsonRowData NVARCHAR(MAX)
declare @username NVARCHAR(50)

while (@index <= @RowsCount)
begin
    -- pick the @index-th JSON document (ROW_NUMBER gives the rows a sequence to loop over)
    ;with JsonDataWithSeq as (
        select ROW_NUMBER() over(order by jsonText) [seq], * from @jsonData
    ) select top(1) @jsonRowData = JsonDataWithSeq.jsonText from JsonDataWithSeq where seq = @index

    --select @jsonRowData [jsonRowData], ISJSON(@jsonRowData) [ISJSON]

    -- scalar property
    SELECT @username = JSON_VALUE(@jsonRowData, '$.username')

    if not exists (select * from @Users where username = @username) -- no need for this check if names are unique in the list
    insert into @Users (username) values(@username)

    -- add any types not stored yet
    insert into @Types([type])
    SELECT xx.[type] from OPENJSON(@jsonRowData, 'lax $.items') with ([type] int) xx where xx.[type] not in (select t.[type] From @Types t)

    -- add any scores not stored yet
    -- (note: a score repeated inside the same document, like the two "90"s for George84, is inserted
    --  twice, because the NOT IN check only sees previously stored rows; this also produces extra
    --  matches in the @Scores join below)
    insert into @Scores([score])
    SELECT xx.[score] from OPENJSON(@jsonRowData, 'lax $.items') with ([score] nvarchar(50)) xx where xx.[score] not in (select t.[score] From @Scores t)

    -- one Items row per item, resolved to the surrogate keys
    insert into @Items(UserId, TypeId, ScoreId)
    SELECT u.ID [UserID], t.ID [TypeID], s.ID [ScoreID] from OPENJSON(@jsonRowData, 'lax $.items') with ([type] int, [score] nvarchar(50)) xx
    inner join @Users u on u.username = @username
    inner join @Types t on t.[type] = xx.[type]
    inner join @Scores s on s.score = xx.score

    set @index = @index + 1
end

select * from @Users
select * from @Types
select * from @Scores
select * from @Items

And that's it. Hope this helps.

For the data I provided, I got the following results:

Users Table:

ID  username
==  ========
1   George84
2   George85
3   George86
4   George87

Types Table:

ID  type
==  =====
1   8625
2   8418
3   7818
4   23
5   232
6   8626

Scores Table:

ID  score
==  ======
1   90
2   84
3   90
4   q
5   80
6   70
7   60

Items Table:

ID      UserId  TypeId  ScoreId
==      ======  ======  =======
1       1       1       1
2       1       1       3
3       1       2       2
4       1       3       1
5       1       3       3
6       1       4       4
7       2       1       5
8       2       2       1
9       2       2       3
10      2       3       6
11      2       5       4
12      3       6       5
13      3       2       6
14      3       3       1
15      3       3       3
16      3       4       4
17      4       1       1
18      4       1       3
19      4       2       6
20      4       3       7
21      4       4       4
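
As noted at the top, there may well be better solutions; for a table of around 200,000 rows, a set-based version that avoids the row-by-row loop is worth considering. Below is a rough, untested sketch that loads the same four tables with CROSS APPLY OPENJSON. Because it deduplicates with DISTINCT, the repeated "90" score is stored only once, so the generated IDs (and therefore the Items rows) will differ from the loop's output above.

-- set-based sketch: shred every document once with CROSS APPLY OPENJSON
insert into @Users (username)
select distinct JSON_VALUE(j.jsonText, '$.username')   -- usernames are stated to be unique; DISTINCT is just a guard
from @jsonData j

insert into @Types ([type])
select distinct i.[type]
from @jsonData j
cross apply OPENJSON(j.jsonText, '$.items') with ([type] int) i

insert into @Scores (score)
select distinct i.score
from @jsonData j
cross apply OPENJSON(j.jsonText, '$.items') with (score nvarchar(50)) i

insert into @Items (UserId, TypeId, ScoreId)
select u.ID, t.ID, s.ID
from @jsonData j
cross apply OPENJSON(j.jsonText, '$.items') with ([type] int, score nvarchar(50)) i
inner join @Users  u on u.username = JSON_VALUE(j.jsonText, '$.username')
inner join @Types  t on t.[type] = i.[type]
inner join @Scores s on s.score  = i.score

The target here is still the table variables declared earlier; to use it for real, point the inserts at your permanent tables instead.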
answered Sep 20 '22 by Ali Al-Mosawi