
Better way to visualize complicated data

I'm using the data below to create a plot in R with ggplot2.

 Hour.of.day     Model  N Distance.travelled       sd        se        ci
1         0100 h300_fv30 60          3.6264709 5.078277 0.6556027 1.3118579
2         0100 h300_fv35 60          2.9746019 5.313252 0.6859379 1.3725586
3         0100 h300_fv40 60          3.0422525 3.950650 0.5100267 1.0205610
4         0200 h300_fv30 60          4.3323896 6.866003 0.8863972 1.7736767
5         0200 h300_fv35 60          3.5567420 6.259378 0.8080823 1.6169689
6         0200 h300_fv40 60          2.5232512 4.533234 0.5852380 1.1710585
7         0300 h300_fv30 60          3.1800537 5.303506 0.6846797 1.3700409
8         0300 h300_fv35 60          2.9281442 4.445953 0.5739700 1.1485113
9         0300 h300_fv40 60          2.5078045 4.058295 0.5239236 1.0483687
10        0400 h300_fv30 60          3.3408231 4.567161 0.5896180 1.1798229
11        0400 h300_fv35 60          2.8679676 5.396700 0.6967110 1.3941155
12        0400 h300_fv40 60          3.1615813 4.244155 0.5479180 1.0963815
13        0500 h300_fv30 60          3.8117851 6.970900 0.8999394 1.8007745
14        0500 h300_fv35 60          2.1130581 3.925906 0.5068323 1.0141691
15        0500 h300_fv40 60          3.6430531 4.905484 0.6332953 1.2672209
16        0600 h300_fv30 60          3.5234762 5.150027 0.6648657 1.3303931
17        0600 h300_fv35 60          2.0341804 3.192176 0.4121082 0.8246266
18        0600 h300_fv40 60          3.2838958 3.770624 0.4867855 0.9740555
19        0700 h300_fv30 60          3.8327926 6.521022 0.8418603 1.6845587
20        0700 h300_fv35 60          1.6933289 2.607322 0.3366039 0.6735428
21        0700 h300_fv40 60          2.3896956 3.435656 0.4435413 0.8875241
22        0800 h300_fv30 60          3.3077466 6.504371 0.8397107 1.6802573
23        0800 h300_fv35 60          1.4823307 3.556884 0.4591917 0.9188405
24        0800 h300_fv40 60          2.4161741 3.571444 0.4610715 0.9226019
25        0900 h300_fv30 60          2.1506438 2.893029 0.3734885 0.7473487
26        0900 h300_fv35 60          1.8821961 3.457929 0.4464167 0.8932778
27        0900 h300_fv40 60          1.7896335 2.714514 0.3504423 0.7012334
28        1000 h300_fv30 60          2.5107475 5.491835 0.7089929 1.4186914
29        1000 h300_fv35 60          0.9491365 2.061712 0.2661658 0.5325966
30        1000 h300_fv40 60          1.6678013 3.234033 0.4175119 0.8354393
31        1100 h300_fv30 60          1.8602186 3.365695 0.4345093 0.8694511
32        1100 h300_fv35 60          1.4385708 2.869765 0.3704851 0.7413389
33        1100 h300_fv40 60          1.1273899 2.010280 0.2595261 0.5193105
34        1200 h300_fv30 60          1.4870763 2.112841 0.2727667 0.5458048
35        1200 h300_fv35 60          2.5295481 4.740384 0.6119810 1.2245711
36        1200 h300_fv40 60          1.6551202 3.051420 0.3939366 0.7882653
37        1300 h300_fv30 60          2.8791490 4.925870 0.6359271 1.2724872
38        1300 h300_fv35 60          2.4731563 5.266690 0.6799268 1.3605303
39        1300 h300_fv40 60          4.5989133 8.394460 1.0837201 2.1685189
40        1400 h300_fv30 60          1.5050205 3.188480 0.4116310 0.8236717
41        1400 h300_fv35 60          1.7615688 3.064842 0.3956693 0.7917325
42        1400 h300_fv40 60          2.2766514 5.215937 0.6733746 1.3474194
43        1500 h300_fv30 60          1.9097882 2.770040 0.3576106 0.7155772
44        1500 h300_fv35 60          2.0109347 4.070014 0.5254365 1.0513961
45        1500 h300_fv40 60          1.6316881 4.119681 0.5318485 1.0642264
46        1600 h300_fv30 60          3.3246263 5.352698 0.6910304 1.3827486
47        1600 h300_fv35 60          2.0389703 3.781869 0.4882372 0.9769604
48        1600 h300_fv40 60          1.0204568 2.205685 0.2847527 0.5697888
49        1700 h300_fv30 60          3.6132519 5.467875 0.7058996 1.4125019
50        1700 h300_fv35 60          2.1139255 4.178283 0.5394140 1.0793648
51        1700 h300_fv40 60          1.5547818 3.411135 0.4403756 0.8811895
52        1800 h300_fv30 60          5.0552532 7.344069 0.9481152 1.8971742
53        1800 h300_fv35 60          2.1832792 3.824244 0.4937078 0.9879070
54        1800 h300_fv40 60          1.6532516 3.273697 0.4226325 0.8456856
55        1900 h300_fv30 60          5.6107731 6.891023 0.8896272 1.7801399
56        1900 h300_fv35 60          2.9822004 5.958244 0.7692060 1.5391777
57        1900 h300_fv40 60          2.7111394 3.798765 0.4904184 0.9813250
58        2000 h300_fv30 60          6.0438385 7.126952 0.9200855 1.8410868
59        2000 h300_fv35 60          3.9517888 6.462761 0.8343388 1.6695081
60        2000 h300_fv40 60          3.9508503 5.374253 0.6938130 1.3883167
61        2100 h300_fv30 60          4.2144712 5.648673 0.7292406 1.4592070
62        2100 h300_fv35 60          2.2205186 3.397391 0.4386013 0.8776392
63        2100 h300_fv40 60          3.9000010 5.881409 0.7592866 1.5193290
64        2200 h300_fv30 60          3.9478958 5.584154 0.7209112 1.4425401
65        2200 h300_fv35 60          3.1612149 4.788883 0.6182421 1.2370996
66        2200 h300_fv40 60          3.7812992 6.424478 0.8293965 1.6596186
67        2300 h300_fv30 61          3.3860628 5.176299 0.6627571 1.3257117
68        2300 h300_fv35 61          3.7427743 6.257596 0.8012031 1.6026448
69        2300 h300_fv40 61          3.6674335 4.945831 0.6332487 1.2666861
70        2400 h300_fv30 59          3.8745470 5.763821 0.7503856 1.5020600
71        2400 h300_fv35 59          3.1284346 5.016476 0.6530895 1.3073007
72        2400 h300_fv40 59          3.7563017 4.819053 0.6273872 1.2558520
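
As a side note on how to read the table: the se and ci columns appear consistent with the usual summary formulas, se = sd / sqrt(N) and ci = t(0.975, N - 1) * se, so ci would be the half-width of a 95% confidence interval around the mean. That is an inference from the numbers, not something stated in the question. A minimal base-R check on the first three rows (values retyped by hand from the table):

```r
# First three rows of the table, retyped by hand for illustration
d <- data.frame(
  N  = c(60, 60, 60),
  sd = c(5.078277, 5.313252, 3.950650)
)

# Assumed formulas: standard error of the mean, then the half-width
# of a 95% t-based confidence interval
d$se <- d$sd / sqrt(d$N)
d$ci <- qt(0.975, df = d$N - 1) * d$se

print(d)
```

If the assumption holds, the computed se and ci should agree with the table to within rounding, and `Distance.travelled - ci` to `Distance.travelled + ci` is the interval drawn by the error bars.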

The plotting code is:

ggplot(my_data, aes(x=Hour.of.day, y=Distance.travelled, colour=Model)) + 
    geom_errorbar(aes(ymin = Distance.travelled - ci, ymax = Distance.travelled + ci), width=.1, position=position_dodge(2)) + 
    geom_line(position=position_dodge(2)) + 
    geom_point(position=position_dodge(2)) + 
    scale_x_discrete(breaks=c("0600", "1200", "1800", "2400")) + 
    theme(axis.ticks = element_blank())

The three separate patterns are hard to tell apart in the resulting plot.

Does anybody have any suggestions on ways to improve the visualization so that the three separate patterns can be better differentiated? For example, some way to emphasize the mean points and place the confidence intervals in the background?

asked Jan 03 '14 16:01 by user2359494




1 Answer

Use lines and ribbons:

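library(ggplot2)
# group=Model joins each model's points into one line/ribbon; it is
# needed if Hour.of.day is a factor or character column
ggplot(my_data, aes(x=Hour.of.day, y=Distance.travelled,
                     group=Model, fill=Model)) +
    theme_bw()+
    geom_line(aes(colour=Model))+
    # semi-transparent band spanning mean - ci to mean + ci
    geom_ribbon(aes(ymin = Distance.travelled - ci,
                    ymax = Distance.travelled + ci),alpha=0.4)+
    scale_x_discrete(breaks=c("0600", "1200", "1800", "2400")) + 
    theme(axis.ticks = element_blank())
ggsave("ribbonplot.png",width=7,height=4)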


You can make the lines wider (lwd) or the ribbons fainter (alpha) if you want to emphasize the pattern of the mean more strongly.
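
A minimal sketch of that adjustment, assuming ggplot2 is installed. The small `my_data` here is a hand-typed subset of the question's table (first three hours only), and the values `lwd = 1.2` and `alpha = 0.2` are illustrative choices, not recommendations from the answer. The ribbon layer is added before the line layer so the confidence bands sit behind the means:

```r
library(ggplot2)

# Hypothetical subset of the question's table, typed in by hand
my_data <- data.frame(
  Hour.of.day = rep(c("0100", "0200", "0300"), each = 3),
  Model = rep(c("h300_fv30", "h300_fv35", "h300_fv40"), times = 3),
  Distance.travelled = c(3.6264709, 2.9746019, 3.0422525,
                         4.3323896, 3.5567420, 2.5232512,
                         3.1800537, 2.9281442, 2.5078045),
  ci = c(1.3118579, 1.3725586, 1.0205610,
         1.7736767, 1.6169689, 1.1710585,
         1.3700409, 1.1485113, 1.0483687)
)

p <- ggplot(my_data, aes(x = Hour.of.day, y = Distance.travelled,
                         group = Model, fill = Model)) +
  theme_bw() +
  # Ribbons first, so they are drawn behind the lines; a lower alpha
  # makes them fainter
  geom_ribbon(aes(ymin = Distance.travelled - ci,
                  ymax = Distance.travelled + ci), alpha = 0.2) +
  # lwd is the base-graphics name; ggplot2 maps it to its own
  # line-width setting
  geom_line(aes(colour = Model), lwd = 1.2)

print(p)
```

Layers are drawn in the order they are added, so swapping the two geoms would put the ribbons on top of the lines instead.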

answered Sep 19 '22 22:09 by Ben Bolker