
Sankey Diagram in R with networkD3 - row number issues

[Image: sankey diagram]

I'd like to focus on the flow highlighted above connecting the blue 'Thermal generation' block to the pink 'Electricity grid' block. You'll notice that the flow is 526 TWh, which is row #62 from Energy$links.

Energy$links
   source target   value
...
62     26     15 525.531
...

Now let's focus on the source and target values which refer to nodes in Energy$nodes.

Energy$nodes
                             name
...
15        Heating and cooling - homes
16                   Electricity grid
...
26                       Gas reserves
27                 Thermal generation
...

The source value is '26', but it actually refers to row '27' of the nodes data. The target value is '15', but it actually refers to row '16' of the nodes data. Why are the source and target values in the links data always one less than the row number they refer to in the nodes data? Is there any way around this other than doing the off-by-one arithmetic in my head when building these Sankey diagrams?

Here's the full Energy data:

> Energy
$`nodes`
                                 name
1                Agricultural 'waste'
2                      Bio-conversion
3                              Liquid
4                              Losses
5                               Solid
6                                 Gas
7                     Biofuel imports
8                     Biomass imports
9                        Coal imports
10                               Coal
11                      Coal reserves
12                   District heating
13                           Industry
14   Heating and cooling - commercial
15        Heating and cooling - homes
16                   Electricity grid
17          Over generation / exports
18                      H2 conversion
19                     Road transport
20                        Agriculture
21                     Rail transport
22 Lighting & appliances - commercial
23      Lighting & appliances - homes
24                        Gas imports
25                               Ngas
26                       Gas reserves
27                 Thermal generation
28                         Geothermal
29                                 H2
30                              Hydro
31             International shipping
32                  Domestic aviation
33             International aviation
34                National navigation
35                       Marine algae
36                            Nuclear
37                        Oil imports
38                                Oil
39                       Oil reserves
40                        Other waste
41                        Pumped heat
42                           Solar PV
43                      Solar Thermal
44                              Solar
45                              Tidal
46            UK land based bioenergy
47                               Wave
48                               Wind

$links
   source target   value
1       0      1 124.729
2       1      2   0.597
3       1      3  26.862
4       1      4 280.322
5       1      5  81.144
6       6      2  35.000
7       7      4  35.000
8       8      9  11.606
9      10      9  63.965
10      9      4  75.571
11     11     12  10.639
12     11     13  22.505
13     11     14  46.184
14     15     16 104.453
15     15     14 113.726
16     15     17  27.140
17     15     12 342.165
18     15     18  37.797
19     15     19   4.412
20     15     13  40.858
21     15      3  56.691
22     15     20   7.863
23     15     21  90.008
24     15     22  93.494
25     23     24  40.719
26     25     24  82.233
27      5     13   0.129
28      5      3   1.401
29      5     26 151.891
30      5     19   2.096
31      5     12  48.580
32     27     15   7.013
33     17     28  20.897
34     17      3   6.242
35     28     18  20.897
36     29     15   6.995
37      2     12 121.066
38      2     30 128.690
39      2     18 135.835
40      2     31  14.458
41      2     32 206.267
42      2     19   3.640
43      2     33  33.218
44      2     20   4.413
45     34      1   4.375
46     24      5 122.952
47     35     26 839.978
48     36     37 504.287
49     38     37 107.703
50     37      2 611.990
51     39      4  56.587
52     39      1  77.810
53     40     14 193.026
54     40     13  70.672
55     41     15  59.901
56     42     14  19.263
57     43     42  19.263
58     43     41  59.901
59      4     19   0.882
60      4     26 400.120
61      4     12  46.477
62     26     15 525.531  # the highlighted 'flow'
63     26      3 787.129
64     26     11  79.329
65     44     15   9.452
66     45      1 182.010
67     46     15  19.013
68     47     15 289.366
asked Feb 19 '19 15:02 by Display name




1 Answer

The reason is that the data is ultimately passed to JavaScript/D3, which uses 0-based indexing: the first element of an array has index 0, whereas in R the first element of a vector has index 1. The source and target values in the links data are therefore 0-based positions in the nodes data, so a value of x refers to row x + 1 as R prints it.
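
To make the offset concrete, here is a minimal sketch using a made-up three-node data frame (the names `demo_nodes` and `demo_link` are hypothetical, and the rows are only a cut-down imitation of the Energy data, not its real row positions). It looks up node names in R by adding 1 to the 0-based index:

```r
# Hypothetical miniature data, NOT the real Energy row positions
demo_nodes <- data.frame(
  name = c("Gas reserves", "Thermal generation", "Electricity grid"),
  stringsAsFactors = FALSE
)

# One link stored with 0-based indices, as networkD3 expects
demo_link <- data.frame(source = 1, target = 2, value = 525.531)

# R rows are 1-based, so add 1 before subsetting the nodes
source_name <- demo_nodes$name[demo_link$source + 1]
target_name <- demo_nodes$name[demo_link$target + 1]

print(paste(source_name, "->", target_name))
```

The same `+ 1` lookup should work on `Energy$nodes$name` and `Energy$links` if you want to check a particular row by name instead of counting.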


As an example, R-style data that refers to nodes by name can be converted easily:

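library(networkD3)

# Links described by node name, the way you would naturally write them in R
source <- c("A", "A", "B", "C", "D", "D", "E", "E")
target <- c("D", "E", "E", "D", "H", "I", "I", "H")

# One row per distinct node
nodes <- data.frame(name = unique(c(source, target)))

# match() returns 1-based row positions; subtract 1 to get 0-based indices
links <- data.frame(source = match(source, nodes$name) - 1,
                    target = match(target, nodes$name) - 1,
                    value = 1)

sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
              Target = "target", Value = "value", NodeID = "name")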
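
If your links are already written with R-style 1-based row numbers, the same idea can be applied in the other direction. The following is only a sketch under that assumption: `label_links()` is a hypothetical helper (it is not part of networkD3), and the `demo_*` data frames are made up for illustration. It shifts the indices down by one and then attaches node names so the result can be checked by eye rather than by mental arithmetic:

```r
# Hypothetical helper, not a networkD3 function:
# attach node names to links that use 0-based indices
label_links <- function(links, nodes) {
  links$source_name <- nodes$name[links$source + 1]
  links$target_name <- nodes$name[links$target + 1]
  links
}

demo_nodes <- data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE)

# Links written with 1-based row numbers (A -> B, B -> C)
demo_links_1based <- data.frame(source = c(1, 2),
                                target = c(2, 3),
                                value  = c(10, 5))

# Shift to the 0-based indices that sankeyNetwork() expects
demo_links <- transform(demo_links_1based,
                        source = source - 1,
                        target = target - 1)

# Inspect the links with readable names before plotting
demo_labelled <- label_links(demo_links, demo_nodes)
print(demo_labelled)
```

Keeping the subtraction in one place, right before the call to `sankeyNetwork()`, means the rest of your code can keep using ordinary R row numbers.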


answered Sep 27 '22 21:09 by CJ Yetman
