Plotting 1D binary array (uint8) with multiple records in gnuplot

Question

I understand this question is similar to Gnuplot: How to plot multiple time series from a binary format; however I've already set up an example which is slightly different, so hope it's OK to post (self-answer follows).

I'm generating my binary data like this (see below for the genbindata.pl Perl script):

$ perl genbindata.pl > bin.dat
$ du -b bin.dat 
234 bin.dat

This binary file, bin.dat, is laid out like this (the first two rows are the 1-based and 0-based byte indices):

>|  1   2   3   4 |  5   6 ... 104 |105 106 ... 204 |205 206 ... 234
>|000 001 002 003 |004 005 ... 103 |104 105 ... 203 |204 205 ... 233
>| WW  WW  WW  WW | XX  XX ...  XX | YY  YY ...  YY | ZZ  ZZ ...  ZZ

... where WW are the 4 bytes of the signature; XX are 100 bytes of a sinusoid with values from 0 to 63; YY are 100 bytes of a cosine with values from 64 to 128 (the first sample, cos(0), comes out as 128); and ZZ are 30 bytes of random values. Here a byte is taken to be a uint8.
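As a cross-check of that layout, here is a minimal Python sketch that rebuilds the same 234 bytes in memory and reports each region. It is an assumed port of genbindata.pl, not part of the original setup; the names make_bin_data and REGIONS are hypothetical, and the random tail comes from Python's generator, so those 30 bytes will not match the Perl output.

```python
import math
import random

# Hypothetical Python port of genbindata.pl (an assumption, for illustration).
def make_bin_data(seed=0):
    """Return the 234 bytes: signature, sine, cosine, random tail."""
    rng = random.Random(seed)
    signature = list(b"SIGN")  # WW: 4 bytes
    # XX: 100 sine samples, same formula as the Perl script
    sine = [int((1 + math.sin(i * 2 * 3.14 / 100)) * 32) for i in range(100)]
    # YY: 100 cosine samples, shifted up by 64
    cosine = [int((1 + math.cos(i * 2 * 3.14 / 100)) * 32 + 64) for i in range(100)]
    # ZZ: 30 random bytes
    tail = [int(128 * rng.random() + 32) for _ in range(30)]
    return bytes(signature + sine + cosine + tail)

# 0-based [start, stop) byte ranges of the four regions
REGIONS = {"WW": (0, 4), "XX": (4, 104), "YY": (104, 204), "ZZ": (204, 234)}

data = make_bin_data()
for name, (start, stop) in REGIONS.items():
    chunk = data[start:stop]
    print(name, "bytes:", len(chunk), "min:", min(chunk), "max:", max(chunk))
```

One detail this makes visible: with this formula the very first cosine sample is 128, since cos(0) = 1.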

What I want is to use this bin.dat as it is (that is, I would rather not write scripts to parse the data and output it in a more gnuplot-friendly format) and plot the sine and cosine data, each in a separate color, on a single diagram.

I found the Binary general section of the help (the same text appears in the gnuplot terminal when typing help binary general), but I have difficulty understanding it (and couldn't find much other information online). So, after scavenging for the few examples online, I fire up gnuplot in terminal mode and try the following gnuplot command:

plot "bin.dat" binary skip=4 array=100x1:100x1 format='%uint8%uint8' origin=(0,0):(100,0) using 0:1 with lines

... in the hope that it means: "Skip the first four bytes; interpret the next 100 bytes as 1D data (100 rows of one column, formatted as '%uint8', with origin at 0,0 after the skip), followed by another 100 bytes of 1D data (100 rows of one column, formatted as '%uint8', with origin at 100,0 after the skip); and use pseudocolumn 0 (the index of the point) as the x axis and the first result from the arrays as y, plotted with lines". Unfortunately, it doesn't mean that: nothing is plotted, and the command fails with "Too many using specs for this style".

Then I think - ok, if there is "too many using", then I'll just plot the 1 there:

    gnuplot> plot "bin.dat" binary skip=4 array=100x1:100x1 format='%uint8%uint8' origin=(0,0):(100,0) using 1 with lines
    Warning: empty y range [0:0], adjusting to [-1:1]

This does in fact generate a plot - a single flat red line at y=0.

So, given it complains about y range, I change the order of the origin arguments ((100,0) to (0,100)), and finally get a command that doesn't generate any message:

gnuplot> plot "bin.dat" binary skip=4 array=100x1:100x1 format='%uchar%uchar' origin=(0,0):(0,100) using 1 with lines
gnuplot>

... but it plots just a single tilted line:

gnuplot-fail-1

... nothing like the sinusoid I expect :(

So, my question is - how can I get gnuplot to plot the data that I want?

Here is genbindata.pl:

#!/usr/bin/env perl

use 5.10.1;
use warnings;
use strict;
use open IO => ':raw'; 

binmode(STDIN);
binmode(STDOUT);

my $signatur = "SIGN";
my @signature = unpack('C*', $signatur);

my (@ch1, @ch2) = ()x2;

# generate 100 samples of (co)sinusoid
for ( my $ix = 0; $ix < 100; $ix++ ) {
  my $val1 = 1 + sin($ix*2*3.14/100); # range: 0-2
  my $val2 = 1 + cos($ix*2*3.14/100); # range: 0-2
  my $ch1val = int($val1*32);
  my $ch2val = int($val2*32+64);
  push(@ch1, $ch1val);
  push(@ch2, $ch2val);
  #print STDERR "val[$ix]: $ch1val, $ch2val\n";
}

# generate 30 samples random
my @end = ();
for ( my $ix = 0; $ix < 30; $ix++ ) {
  my $val = int(128*rand() + 32);
  push(@end, $val);
  #~ print STDERR "val[$ix]: $val\n";
}

# concatenate arrays:
my @output = (@signature,@ch1,@ch2,@end);
my $sizarr = scalar(@output);
#~ print STDERR " ".." ";

# print output - uint8: "C"
my $outstr = pack("C*", @output);
my $lenstr = length($outstr);
#~ print STDERR "output size: $sizarr; output length: $lenstr\n";
print $outstr;

# end

sdaau · Accepted Answer

Well, there are some misconceptions in the question above; I'm using gnuplot 4.4 for this. First off, note that I assumed the "100x1" in "array=100x1:100x1" indicates a dataset of 100 rows and 1 column, which would be identical to a 1D array (the index being implied). However, that is not so; note that help binary array says:

Note: Gnuplot version 4.2 used the syntax array=128x128 rather than
array=(128,128). The older syntax is now deprecated, but may
still work [...]

The coordinates will be generated by gnuplot. A number must be specified
for each dimension of the array. For example, array=(10,20) means the
underlying sampling structure is two-dimensional with 10 points along the
first (x) dimension [..]

Thus, even if "100x1" may conceptually be identical to a 1D array, just by writing it like that I've indicated 2D data to gnuplot (e.g. an image, which I'd expect is instantiated differently from a 1D array of 100 elements internally in gnuplot). So, instead, I should have written "array=(100):(100)". Note that the parentheses can in principle be dropped for 1D, so "array=100:100" would have been OK too - except when using -1 (read until the end of file) as the dimension; then the parentheses have to be used, or else an error occurs.

Then, there is the problem of multiple records. I could find very few references to these "multiple records" in gnuplot - there was Binary syntax reduction [Was: Lengthy discussion...] (gnuplot.devel):

How about tossing out the multiple records per file feature. If there is more than one big data set to plot, just create multiple files.

I, for one, am glad they kept the multiple records feature - but I sure wish it was explained better. Another comment I found in Gnuplot, Plot with sum of datasets:

... gnuplot (3.6, that is) can plot combinations of data on a single record, but cannot combine data from multiple records, much less multiple datasets.

You'll have to write a preprocessor that does these calculations.

The help binary array says just:

A colon can be used to separate the dimensions for multiple records.
For example, array=25:35 indicates there are two one-dimensional records in
the file.

This made me think that if I specify "array=(100):(100)" as two (multiple) records, I could "access" these records in using 0:1 and using 0:2 statements respectively (using the pseudocolumn 0 to index their data). It turns out that is not the case: it seems to me that the only facility such a specification of multiple records allows for is the ability to control them via the origin and/or skip parameters, and only within a single plot.

Talking of records, note also there is help binary record:

This keyword serves the same function as array, having the same syntax. However, record causes gnuplot to not generate coordinate information. This is for the case where such information may be included in one of the columns of the binary data file.

This doesn't tell me much - but I think it means that, in principle, one should specify pseudocolumn 0 to index the data if using record; but should not do that for array. Thus, the following two commands are equivalent:

plot "bin.dat" binary array=(50) format='%uint8' using 1 with lines
plot "bin.dat" binary record=(50) format='%uint8' using 0:1 with lines

... which gets the first 50 samples as the first (and only) record of 1-D data, with its (only) column of data formatted as uint8 - and plots it:

gnuplot-50-samps

Note that

doing record with just using 1 will generate the same image as the two equivalent commands above;
but doing array with using 0:1 will generate the tilted line from the question!

It's a shame, that there doesn't seem to be an apparent way to debug the structures behind the "column identifiers" such as 1 in using 1, so as to confirm this one way or another; but I think the above means that:

in record, the 1 describes a 1-column (50x1) dataset - thus it can be coupled with another 1-column index
in array, the 1 actually describes a 2-column (50x2) dataset, where the first column is the index - which is why a straight line is generated when it is coupled with another index (pseudocolumn 0): the index pseudocolumn pairs with the first column from 1, which is also an index, that is, a list of monotonically rising (or falling) integers

Then, note the format parameter - help binary format states:

[...] For example, format="%uchar%int%float"
associates an unsigned character with the first using column, an int with the
second column and a float with the third column. [...]

Now, this made me think that if I want to work with two 1-D multiple records, I'd have to use something like this:

plot "bin.dat" binary array=(50):(50) format='%uint8%uint8' using 1 with lines

... which generates:

gnuplot-2rec-50-samps-resamp

It's immediately obvious that the cosine part overlapping the sine is somehow wrong; but even more interesting is that the diagram shows the entire period of these functions in only 50 samples - while we have explicitly generated the data, so both the sine and cosine have a period of 100 samples! Ergo, the data is somehow resampled - and it turns out, the problem is with format.

By specifying "array=(50):(50)", we specify two (multiple) records which are 1-D, and thus have one (and only) column each. However, the "format='%uint8%uint8'" does not refer to a format for each column of the two records - it apparently refers to a second dimension; and given that our two records are 1D, gnuplot simply takes away every other sample from the records.
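One way to picture that observation: if each %uint8 in format describes one column of a point, then with two specifiers every point consumes two bytes, and using 1 keeps only the first byte of each pair. The Python sketch below models that reading. It is an assumption that fits the plot above, not a description of gnuplot's internals, and read_record is a hypothetical helper.

```python
def read_record(data, start, npoints, ncols, col=1):
    """Model one 1D record: each point consumes `ncols` bytes, and
    `using col` picks one byte per point (columns are 1-based).
    Returns the picked values and the offset where the next record starts."""
    stop = start + npoints * ncols
    return list(data[start + col - 1:stop:ncols]), stop

# Stand-in data: a byte ramp, so each value equals its file offset.
data = bytes(range(234))

# format='%uint8%uint8': 50 points span 100 bytes, every other byte is dropped
two_col, next_start = read_record(data, 0, 50, ncols=2)
print(two_col[:5], "... next record starts at byte", next_start)

# format='%uint8': 50 points span exactly 50 bytes
one_col, next_start = read_record(data, 0, 50, ncols=1)
print(one_col[:5], "... next record starts at byte", next_start)
```

Under this model, 50 points with two format specifiers cover 100 bytes of the file, which would explain why a whole period shows up in only 50 plotted samples.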

Therefore, we can just specify a single "format='%uint8'" in our plot command:

plot "bin.dat" binary array=(50):(50) format='%uint8' using 1 with lines

... and get only half a period in 50 samples:

gnuplot-2rec-50-samps

... as expected. But that still doesn't solve the overlap between the records.

Here it is important to remember, that the multiple records seem always to be attributable to one and the same plot. The offset can then be regulated with the origin parameter; help binary keywords origin states:

To position the array somewhere else on the graph, the origin keyword directs
gnuplot to position the lower left point of the array at a point specified by a
tuple. The tuple should be a double for plot and a triple for splot.

So, we can do something like this:

plot "bin.dat" binary array=(50):(50) format='%uint8' origin=(0,0):(50,0) using 1 with lines

... which we can interpret as: get two consecutive 1D records, where their only dimension/column is formatted as uint8 - and offset/move the first record by (0,0) on the plot, and the second by (50,0) (50 units in the +x direction) on the plot. We would expect now that the two records will be concatenated, and indeed:

gnuplot-2rec-50-samps-conc

... we can now observe what we would expect to be the first 100 samples of the data.

That the records are "part of the same plot" can be seen more easily if we displace the second record a bit (say, 10 units left) from the previously matched position:

plot "bin.dat" binary array=(50):(50) format='%uint8' origin=(0,0):(40,0) using 1 with lines

... while plotting with lines:

gnuplot-2rec-50-samps-disp

It's immediately visible this is not a function anymore, and what effect the origin offsetting has had.
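The two origin experiments above can be modeled roughly in Python, assuming that array generates the x coordinate as the sample index within each record and that origin adds an (x, y) offset to it. Both assumptions are inferred from the plots in this answer; place_records is a hypothetical name, and a byte ramp stands in for the real data.

```python
def place_records(data, lengths, origins):
    """Return one list of (x, y) points per consecutive 1D uint8 record.

    Assumed model: x is the index within the record plus the origin's
    x offset; y is the byte value plus the origin's y offset."""
    points, pos = [], 0
    for length, (ox, oy) in zip(lengths, origins):
        record = data[pos:pos + length]
        points.append([(ox + i, oy + v) for i, v in enumerate(record)])
        pos += length
    return points

data = bytes(range(100))  # stand-in for the first 100 bytes of bin.dat

# origin=(0,0):(50,0): the second record continues where the first ends
first, second = place_records(data, (50, 50), [(0, 0), (50, 0)])
print(first[-1], second[0])

# origin=(0,0):(40,0): both records now have points on x = 40..49
first, second = place_records(data, (50, 50), [(0, 0), (40, 0)])
overlap = {x for x, _ in first} & {x for x, _ in second}
print(sorted(overlap))
```

In the second case ten x positions carry two y values each, which is why the curve stops being a function.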

That being said, we can now plot the entire data, by specifying all records in it, and their offsets, explicitly:

plot "bin.dat" binary array=(4):(100):(100):(30) format='%uint8' origin=(0,0):(4,0):(104,0):(204,0) using 1 with lines

... generates, as expected, the four records concatenated - recreating the entire dataset:

gnuplot-4rec-all

Note:

Using array=(4):(100):(100):(-1) (for read to end) generates the same image
Using origin=(0,0):(4,0):(104,0) (leave out last) with four records in array, causes the last 30 bytes to overlap from the beginning
Using array=(4):(100):(100) (leave out last) makes the last 30 bytes disappear from plot (with either three or four records in origin)
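The x offsets in that origin list are just the running sum of the record lengths. A small Python sketch of that arithmetic follows; concat_origins and binary_keywords are hypothetical helper names, and the code only builds the keyword strings without calling gnuplot.

```python
from itertools import accumulate

def concat_origins(lengths):
    """x offsets that place consecutive records end to end."""
    return [0] + list(accumulate(lengths))[:-1]

def binary_keywords(lengths):
    """Format the array= and origin= keywords for a gnuplot plot command."""
    array = ":".join(f"({n})" for n in lengths)
    origin = ":".join(f"({x},0)" for x in concat_origins(lengths))
    return f"array={array} origin={origin}"

print(binary_keywords([4, 100, 100, 30]))
# array=(4):(100):(100):(30) origin=(0,0):(4,0):(104,0):(204,0)
```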

Finally, let's look at the skip parameter. The help binary skip states:

[...] For instance, if the file contains a 1024 byte header before the start of the data region you would probably want to use
plot '<file_name>' binary skip=1024 ...

This could be slightly misleading, since both of these commands:

plot "bin.dat" binary array=(100) format='%uint8' using 1 with lines
plot "bin.dat" binary skip=4 array=(100) format='%uint8' using 1 with lines

... generate the same plot:

gnuplot-1rec-nooffs

... where no skip is visible; however, if we move the skip=4 to the end of the keyword list (just before using), the command becomes:

plot "bin.dat" binary array=(100) format='%uint8' skip=4 using 1 with lines

... and generates:

gnuplot-1rec-offs

... where a skip is, indeed, visible.

Note also from help binary skip:

If there are multiple records in the file, you may specify a leading offset for each. For example, to skip 512 bytes before the 1st record and 256 bytes before the second and third records
plot <file_name> binary record=356:356:356 skip=512:256:256 ...

Let's just illustrate that - the below command:

plot "bin.dat" binary array=(100):(100) format='%uint8' origin=(0,0):(100,0) skip=4 using 1 with lines

... has two 1-D records (we have to add origin to offset, else the records will again overlap), but only one skip - which basically moves the whole sequence left by four units:

gnuplot-2rec-skip

If we now address both fields in skip, as in the command:

plot "bin.dat" binary array=(100):(100) format='%uint8' origin=(0,0):(100,0) skip=4:20 using 1 with lines

... we can notice on the output:

gnuplot-2rec-2skip

... that the second record has been moved left by 20 units - and, to make up for the lost 20 units at the end, gnuplot fills in with data from what would otherwise be the next record (which is not addressed in the plot command).
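The byte arithmetic behind that plot can be sketched in Python. The sketch assumes each skip value is a number of bytes to jump over immediately before the corresponding record, which is how the help text quoted above reads; record_ranges is a hypothetical helper.

```python
def record_ranges(lengths, skips):
    """Return the [start, stop) byte range of each 1-byte-per-sample record,
    skipping skips[k] bytes right before record k."""
    ranges, pos = [], 0
    for length, skip in zip(lengths, skips):
        start = pos + skip
        ranges.append((start, start + length))
        pos = start + length
    return ranges

# array=(100):(100) skip=4:20 on the 234-byte bin.dat
for start, stop in record_ranges((100, 100), (4, 20)):
    print(f"bytes {start}..{stop - 1}")
```

Under this model the second record covers bytes 124..223: the last 80 cosine samples followed by the first 20 bytes of the random tail.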

Now we can go back to the original question - to "plot the sine and cosine data, with separate color, on a single diagram".

Before that, let's note that with two data "functions" per plot it gets easier to see that the origin parameter actually moves records on the plot; for instance, this command:

plot "bin.dat" binary array=(100) format='%uint8' origin=(4,0) using 1 with lines, \
     "" binary array=(100) format='%uint8' origin=(104,0) using 1 with lines

... results in:

gnuplot-1rec-2func

... where the same first 100 samples of data, are rendered on two different places in the plot (and with different colors).

After all the above, it is clear that: trying to "parse" and "split" the data - by using, say, array=(4):(100):(100):(30) - into "records", will not help us much with having two data "functions" per plot (as implied by separate colors); only with a single data "function".

That is, for two data functions case, we can only specify: a single 1D record and its length in array; its (only) column's format; and an offset via skip - per data "function":

plot "bin.dat" binary array=(100) format='%uint8' skip=4 using 1 with lines, \
     "" binary array=(100) format='%uint8' skip=104 using 1 with lines

... in order to get the desired rendering:

gnuplot-1rec-2func
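In terms of bytes, each plot element in that command reads its own window of the file. A Python sketch of the two windows, assuming skip counts bytes from the start of the file for a single-record element (plot_window is a hypothetical name, and a byte ramp stands in for bin.dat):

```python
def plot_window(data, skip, length):
    """Bytes read by one `binary array=(length) skip=skip` plot element."""
    return data[skip:skip + length]

data = bytes(range(234))  # stand-in: each byte equals its file offset

sine = plot_window(data, skip=4, length=100)      # first plot element
cosine = plot_window(data, skip=104, length=100)  # second plot element
print(sine[0], sine[-1], cosine[0], cosine[-1])
```

The first element covers offsets 4 to 103 (the XX region) and the second covers 104 to 203 (the YY region), so neither the signature nor the random tail is plotted.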

As a final note - we can obtain the exact same diagram, by replacing binary array with binary record - except instead of using 1, we should write using 0:1:

plot "bin.dat" binary record=(100) format='%uint8' skip=4 using 0:1 with lines, \
     "" binary record=(100) format='%uint8' skip=104 using 0:1 with lines

... even if in this particular case, using 1 will work as well.

Well, hope this helps someone,
Cheers!
