
Coloring intervals of missing values with Gnuplot

Tags:

gnuplot

I have temporal data in which some time intervals contain only missing values. I want to show those intervals of missing values explicitly.

For now, the solution I have is to check whether the value is NaN or not, as such:

plot file_name using 1:(stringcolumn(num_column) eq "NaN" ? 1/0 : column(num_column)) with lines,\
    "" using 1:(stringcolumn(num_column) eq "NaN" ? 1000 : 1/0) with points

This draws points at y = 1000 in place of the line wherever values are missing, which gives the following result:

enter image description here

However, this is not ideal because a) I need to specify a y value at which to draw the points and b) it's quite ugly, especially when the dataset is longer in time.

I would like to produce something like this instead:

enter image description here

That is, I want to fill the interval completely with a color (possibly with some transparency, unlike in my image). Note that these examples contain only one interval of missing values, but in reality there can be any number of them on one plot.

Asked by Fatalize, Feb 23 '16 15:02


2 Answers

We can do some pre-processing to accomplish this. Suppose that we have the following data file, data.txt:

1 8
2 6
4 NaN
5 NaN
6 NaN
7 9
8 10
9 NaN
10 NaN
11 6
12 11

and the following Python 3 program, process.py [1] (obviously, using Python is not the only way to do this):

# Read whitespace-separated (x, y) pairs from the data file.
with open("data.txt") as f:
    data = [line.split() for line in f if line.strip()]

i = 0
while i < len(data):
    if data[i][1] == "NaN":
        print(data[i-1][0], end=" ")  # last good x; or use data[i][0]
        i += 1
        while data[i][1] == "NaN":
            i += 1
        print(data[i][0], end=" ")  # next good x; or use data[i-1][0]
    else:
        i += 1

This Python program reads the data file and, for each run of NaN values, outputs the last good and the next good x-coordinates. For the example data file it outputs 2 7 8 11, which can be used as bounds for drawing rectangles. Now we can do the following in gnuplot [2]:

breaks = system("python3 process.py")
set for [i=0:words(breaks)/2-1] object (i+1) rectangle from word(breaks,2*i+1),graph 0 to word(breaks,2*i+2),graph 1 fillstyle solid noborder fc rgb "orange"

This draws a filled rectangle over each such range. The loop works out how many "blocks" (groups of two values) the breaks variable holds, then reads the values two at a time, using each pair as the left and right bounds of a rectangle.
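To make the pairing explicit, here is a minimal Python sketch of what the gnuplot loop does with the breaks string. It is not part of the original solution: the function name rectangle_commands and its color parameter are hypothetical, and it only builds the command text rather than calling gnuplot.

```python
def rectangle_commands(breaks, color="orange"):
    """Build one gnuplot 'set object' command per (left, right) pair.

    breaks: whitespace-separated x values, e.g. "2 7 8 11".
    color:  hypothetical parameter for the fill color (assumption).
    """
    values = breaks.split()
    commands = []
    # Step through the values two at a time. A dangling odd value is
    # ignored, mirroring the integer division words(breaks)/2 in gnuplot.
    for k in range(len(values) // 2):
        left, right = values[2 * k], values[2 * k + 1]
        commands.append(
            f"set object {k + 1} rectangle from {left},graph 0 "
            f'to {right},graph 1 fillstyle solid noborder fc rgb "{color}"'
        )
    return commands


# Example with the breaks produced for the sample data file.
for line in rectangle_commands("2 7 8 11"):
    print(line)
```

For the sample data this yields two commands, one for the rectangle from x = 2 to x = 7 and one from x = 8 to x = 11, each spanning the full height of the graph.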

Finally, plotting the data

plot "data.txt" u 1:2 with lines

produces

enter image description here

which shows the filled rectangles over the range of NaN values.


To make this a little more widely applicable, the following awk program, process.awk [3], serves the same purpose as the Python program above if awk is available and Python isn't:

BEGIN {
    started = 0;
    last = "";
    vals = "";
}

($2=="NaN") {
    if (started==0) {
        vals = vals " " last;
        started = 1;
    }
}

($2!="NaN") {
    last = $1
    if (started==1) {
        vals = vals " " last;
        started = 0;
    }
}

END {
    sub(/^ /,"",vals);
    print vals;
}

We can use this by replacing the system call above with

breaks = system("awk -f process.awk data.txt")


[1] The boundaries are extended to the last and next good point to fill the gap completely. If this is not desired, the commented values will cover only the region identified by NaN in the file (4-6 and 9-10 in the example case). The program will not handle NaN values as the first or last data point.

[2] I used solid orange for the gaps. Feel free to use any color spec there.

[3] The awk program extends the boundaries in the same way as the Python program, but needs more modification to get the other behavior. It has the same limitation: it does not handle NaN values as the first or last data point.
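The limitation noted above (NaN at the first or last data point) could be lifted with a bounds-checked variant. The following is a minimal sketch, not the original program: the function name nan_intervals and the extend flag are hypothetical, and it assumes that a run touching the start or end of the data should simply be clamped to its own first or last x value.

```python
def nan_intervals(rows, extend=True):
    """Return (left, right) x bounds for each run of NaN rows.

    rows:   list of (x, y) string pairs, as split from the data file.
    extend: if True, widen each run to the neighbouring good points
            (hypothetical flag; False keeps only the NaN rows themselves).
    """
    intervals = []
    i, n = 0, len(rows)
    while i < n:
        if rows[i][1] != "NaN":
            i += 1
            continue
        start = i
        # Advance to the end of the run without stepping past the data.
        while i < n and rows[i][1] == "NaN":
            i += 1
        end = i - 1  # index of the last NaN row in this run
        if extend:
            # Clamp to the run itself when there is no good neighbour.
            left = rows[start - 1][0] if start > 0 else rows[start][0]
            right = rows[i][0] if i < n else rows[end][0]
        else:
            left, right = rows[start][0], rows[end][0]
        intervals.append((left, right))
    return intervals


# Inline copy of the sample data, so the sketch runs without data.txt.
sample = """1 8
2 6
4 NaN
5 NaN
6 NaN
7 9
8 10
9 NaN
10 NaN
11 6
12 11"""
rows = [line.split() for line in sample.splitlines()]
print(" ".join(x for pair in nan_intervals(rows) for x in pair))  # prints: 2 7 8 11
```

Calling nan_intervals(rows, extend=False) instead gives the bounds of the NaN rows alone, which corresponds to using the commented values in the original program.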

Answered by Matthew, Oct 08 '22 03:10


Using two filled curves

A somewhat "hacky" way of doing it is using two filled curves, as such:

plot file_name using 1:(stringcolumn(num_column) eq "NaN" ? 1/0 : column(num_column)) with lines ls 2,\
    "" using 1:(stringcolumn(num_column) eq "NaN" ? 0 : 1/0) with filledcurve x1 ls 3,\
    "" using 1:(stringcolumn(num_column) eq "NaN" ? 0 : 1/0) with filledcurve x2 ls 3

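Both filledcurve plots must use the same line style, so that the two fills merge into one uniform rectangle.

One filledcurve takes x1 as its parameter and the other x2, so that one fills from the curve at y = 0 down to the bottom axis (x1) and the other from y = 0 up to the top axis (x2).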

You can remove the curve at 0 and make the filling transparent using this:

set style fill transparent solid 0.8 noborder

This is the result:

enter image description here

Note that the dashed line at 0 under the rectangle is a bit glitchy compared to the other dashed lines. Note also that if some rectangles are very small in width, they will look lighter than expected.

Answered by Fatalize, Oct 08 '22 01:10