
How to create a matrix with SQL

With geographic data records like this:

       START       |        END
CITY1     | STATE1 | CITY2     | STATE2
----------+--------+-----------+-------
New York  | NY     | Boston    | MA
Newark    | NJ     | Albany    | NY
Cleveland | OH     | Cambridge | MA

I would like to count the START/END state pairings and display the counts as a matrix, something like this:

   |  MA  |  NJ  |  NY  |  OH
------------------------------
MA |  0   |  0   |  1   |  0
NJ |  0   |  0   |  1   |  0
NY |  1   |  0   |  0   |  0
OH |  1   |  0   |  0   |  0

I can see how GROUP BY and COUNT will find the data, but I'm lost on how to display it as a matrix. Does anyone have any ideas?

greener asked Oct 08 '22 06:10





1 Answer

This seems to do the trick; I tested it on PostgreSQL 9.1. The ::INT cast is PostgreSQL syntax, so it will almost certainly need to be adapted for SQL Server (anyone feel free to update my answer to that effect).

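-- Outer query: one row per state, one column per counterpart state.
SELECT start AS state,
    SUM((dest = 'MA')::INT) AS MA,  -- boolean cast to 0/1, then summed
    SUM((dest = 'NJ')::INT) AS NJ,
    SUM((dest = 'NY')::INT) AS NY,
    SUM((dest = 'OH')::INT) AS OH
FROM (
    -- Each route in its original direction...
    SELECT state1 AS start, state2 AS dest
        FROM routes
    UNION ALL
    -- ...and again with the endpoints swapped.
    SELECT state2 AS start, state1 AS dest
        FROM routes
) AS s
GROUP BY start
ORDER BY start;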

Note, however, that my output differs slightly from yours. I'm not sure whether that's because your sample output is wrong or because I misunderstood your requirements:

 state | ma | nj | ny | oh 
-------+----+----+----+----
 MA    |  0 |  0 |  1 |  1
 NJ    |  0 |  0 |  1 |  0
 NY    |  1 |  1 |  0 |  0
 OH    |  1 |  0 |  0 |  0
(4 rows)

This query works by reading the table twice, once for the state1 -> state2 routes and once for the state2 -> state1 routes, then stacking the two result sets with UNION ALL so that every pairing appears in both directions.
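To see what the UNION ALL step produces on its own, here is a minimal sketch that runs just the inner query against the sample rows from the question. It assumes Python's built-in sqlite3 module as a stand-in for the PostgreSQL database used above, and it renames the start alias to origin (a hypothetical name chosen only to avoid keyword clashes in other engines).

```python
import sqlite3

# In-memory stand-in for the routes table; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routes (city1 TEXT, state1 TEXT, city2 TEXT, state2 TEXT)")
conn.executemany(
    "INSERT INTO routes VALUES (?, ?, ?, ?)",
    [
        ("New York", "NY", "Boston", "MA"),
        ("Newark", "NJ", "Albany", "NY"),
        ("Cleveland", "OH", "Cambridge", "MA"),
    ],
)

# The inner query: each route is emitted once per direction.
pairs = conn.execute(
    """
    SELECT state1 AS origin, state2 AS dest FROM routes
    UNION ALL
    SELECT state2 AS origin, state1 AS dest FROM routes
    ORDER BY origin, dest
    """
).fetchall()

for origin, dest in pairs:
    print(origin, "->", dest)
```

Three routes become six directed pairs, which is why the final matrix comes out symmetric.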

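Then, for each origin state (one output row per GROUP BY group), it sums a 0/1 flag per destination state: (dest = 'MA')::INT is 1 when the row's destination is MA and 0 otherwise, so each SUM() counts the pairings between the row's state and the column's state.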

This strategy should be easy to adapt for any RDBMS.
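As one possible adaptation, the PostgreSQL-specific boolean cast can be swapped for the standard SUM(CASE WHEN ... THEN 1 ELSE 0 END) form. The sketch below is not the tested query from this answer: it assumes Python's built-in sqlite3 module as the engine, the sample rows from the question, and an origin alias in place of start (a hypothetical rename to sidestep keyword issues in some databases).

```python
import sqlite3

# In-memory stand-in for the routes table; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routes (city1 TEXT, state1 TEXT, city2 TEXT, state2 TEXT)")
conn.executemany(
    "INSERT INTO routes VALUES (?, ?, ?, ?)",
    [
        ("New York", "NY", "Boston", "MA"),
        ("Newark", "NJ", "Albany", "NY"),
        ("Cleveland", "OH", "Cambridge", "MA"),
    ],
)

# CASE WHEN yields the same 0/1 flag as PostgreSQL's (dest = 'MA')::INT.
matrix = conn.execute(
    """
    SELECT origin AS state,
        SUM(CASE WHEN dest = 'MA' THEN 1 ELSE 0 END) AS ma,
        SUM(CASE WHEN dest = 'NJ' THEN 1 ELSE 0 END) AS nj,
        SUM(CASE WHEN dest = 'NY' THEN 1 ELSE 0 END) AS ny,
        SUM(CASE WHEN dest = 'OH' THEN 1 ELSE 0 END) AS oh
    FROM (
        SELECT state1 AS origin, state2 AS dest FROM routes
        UNION ALL
        SELECT state2 AS origin, state1 AS dest FROM routes
    ) AS s
    GROUP BY origin
    ORDER BY origin
    """
).fetchall()

for row in matrix:
    print(row)
```

The column list is still hard-coded, as in the original query, so a new state in the data would need a new SUM line.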

Flimzy answered Oct 18 '22 07:10
