
SQL style JOIN on JSON data

Is there an efficient way to join JSON data? Suppose we have two JSON datasets:

{"COLORS":[[1,"red"],[2,"yellow"],[3,"orange"]]}

{"FRUITS":[[1,"apple"],[2,"banana"],[3,"orange"]]}

I want to turn these into the following on the client side:

{"NEW_FRUITS":[[1,"apple","red"],[2,"banana","yellow"],[3,"orange","orange"]]}

Keep in mind that there will be thousands of records with much more complex data structures. jQuery and vanilla JavaScript are both fine. Also, there may be colors without fruits and fruits without colors.

NOTE: For the sake of simplicity, let's say that the two datasets are both in the same order, but the second dataset may have gaps.

Luke The Obscure asked Aug 31 '11 00:08



2 Answers

The AlaSQL JavaScript SQL library can do this join in a single statement:

 <script src="alasql.min.js"></script>
 <script>
    var data = { COLORS: [[1,"red"],[2,"yellow"],[3,"orange"]],            
                 FRUITS: [[1,"apple"],[2,"banana"],[3,"orange"]]};

    data.NEW_FRUITS = alasql('SELECT MATRIX COLORS.[0], COLORS.[1], FRUITS.[1] AS [2] \
         FROM ? AS COLORS JOIN ? AS FRUITS ON COLORS.[0] = FRUITS.[0]',
         [data.COLORS, data.FRUITS]);
 </script>

You can play with this example in jsFiddle.

This is a SQL expression, where:

  • SELECT - the select operator
  • MATRIX - a modifier which converts the result set from an array of objects to an array of arrays
  • COLORS.[0] - the first column of the COLORS array, etc.
  • FRUITS.[1] AS [2] - the second column of the FRUITS array, stored as the third column of the resulting recordset
  • FROM ? AS COLORS - the data array from the parameters, named COLORS in the SQL statement
  • JOIN ? AS FRUITS ON ... - the join with the second data array, named FRUITS
  • [data.COLORS, data.FRUITS] - the parameters holding the data arrays
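
For comparison, the inner join that this query expresses can be sketched in plain JavaScript. The `innerJoin` helper is hypothetical (it is not part of AlaSQL) and assumes each key appears at most once per array. Rows without a match on the other side are dropped, as in an inner join, and the columns follow the order of the SELECT list (key, color, fruit).

```javascript
// Hypothetical helper, not part of AlaSQL: inner join of two arrays of
// [key, value] pairs. Assumes keys are unique within each array.
function innerJoin(colors, fruits) {
  // Index the second array by key so each lookup is a single Map access.
  const fruitByKey = new Map(fruits.map(([key, name]) => [key, name]));
  const out = [];
  for (const [key, color] of colors) {
    // Keep only keys present on both sides.
    if (fruitByKey.has(key)) {
      out.push([key, color, fruitByKey.get(key)]);
    }
  }
  return out;
}

const joined = innerJoin(
  [[1, "red"], [2, "yellow"], [3, "orange"]],
  [[1, "apple"], [2, "banana"], [3, "orange"]]
);
console.log(JSON.stringify(joined));
// prints [[1,"red","apple"],[2,"yellow","banana"],[3,"orange","orange"]]
```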
agershun answered Sep 20 '22 12:09


Because there will be thousands of inputs and the keys are not necessarily ordered, your best bet (at least for large inputs) is to sort both arrays by key first. For inputs with fewer than about 5 records, a brute-force O(n^2) approach should suffice.

Then you can write out the result by walking through the two sorted arrays in parallel, appending new "records" to your output as you go. This sort-then-merge idea is a powerful one and is used frequently. If you do not want to sort first, you can add elements to a priority queue, merging as you go. The sort-then-merge approach is perhaps conceptually simpler to code; if performance matters, you should do some profiling.

For colors-without-fruits and fruits-without-colors, I assume writing null for the missing value is sufficient. If the same key appears more than once in either color or fruit, you can either choose one arbitrarily, or throw an exception.
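
A minimal sketch of the sort-then-merge idea, under a few assumptions: each record is a `[key, value]` pair, keys are numeric, missing values are written as `null`, and a duplicate key throws. The function name `sortMergeJoin` is hypothetical, and this is not necessarily the code in the fiddle mentioned in this answer.

```javascript
// Hypothetical helper: full outer join of two arrays of [key, value] pairs
// using sort-then-merge. Assumes numeric keys.
function sortMergeJoin(left, right) {
  const byKey = (a, b) => a[0] - b[0];
  // Copy before sorting so the caller's arrays are not mutated.
  const l = left.slice().sort(byKey);
  const r = right.slice().sort(byKey);

  // After sorting, duplicate keys are adjacent; reject them.
  for (const rows of [l, r]) {
    for (let k = 1; k < rows.length; k++) {
      if (rows[k][0] === rows[k - 1][0]) {
        throw new Error("duplicate key: " + rows[k][0]);
      }
    }
  }

  const out = [];
  let i = 0;
  let j = 0;
  // Walk both sorted arrays in parallel, always emitting the smaller key.
  while (i < l.length || j < r.length) {
    if (j >= r.length || (i < l.length && l[i][0] < r[j][0])) {
      out.push([l[i][0], l[i][1], null]); // left record with no match
      i++;
    } else if (i >= l.length || r[j][0] < l[i][0]) {
      out.push([r[j][0], null, r[j][1]]); // right record with no match
      j++;
    } else {
      out.push([l[i][0], l[i][1], r[j][1]]); // keys match on both sides
      i++;
      j++;
    }
  }
  return out;
}

// Unordered input with gaps on both sides.
const fruits = [[3, "orange"], [1, "apple"]];
const colors = [[2, "yellow"], [1, "red"]];
console.log(JSON.stringify(sortMergeJoin(fruits, colors)));
// prints [[1,"apple","red"],[2,null,"yellow"],[3,"orange",null]]
```

Sorting costs O(n log n) and the merge pass is linear, so this scales to thousands of records.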

ADDENDUM: I did a fiddle as well: http://jsfiddle.net/LuLMz/. It makes no assumptions about the order of the keys or the relative lengths of the arrays. The only assumptions are the names of the fields and the fact that each subarray has two elements.

Ray Toal answered Sep 19 '22 12:09