
Is there a way of using wildcards in gnuplot?

I have multiple files named as the following example: blast_sample1_454LargeContigs.fna.fas_vs_NC_016593_filter.txt

The changing parts are "sample#" (the sample) and "NC_#" (the reference). For each reference there are 35 samples. I wrote the following commands to generate a plot for the reference NC_016593 using the data from all 35 samples:

filename(n) = sprintf("blast_sample%d_454LargeContigs.fna.fas_vs_NC_016593_filter.txt", n)
plot for [i=01:35] filename(i) using 9:3:($10-$9):($3-$3) with vectors nohead

I want one plot per reference, so I would like to write a general command using wildcards. Is there a way to do this directly in gnuplot? Is it possible to vary only the "NC_#" part using a wildcard (like the * in a shell script, something like NC_*)?

Thanks.

Fernando asked Dec 15 '22 18:12


1 Answer

This is not directly possible in gnuplot. However, you can use system calls to get a list of files to plot:

filelist=system("ls *.csv")
plot for [filename in filelist] filename using 1:2

So, here is an example that creates one plot per sample number with all references:

do for [i=1:35] { 
    cmd = sprintf("ls blast_sample%d_454LargeContigs.fna.fas_vs_NC_*_filter.txt", i)
    filelist=system(cmd)
    plot for [filename in filelist] filename using ...
}

If you want one plot per reference containing all samples, it becomes more difficult if the reference numbers are not a simple series. If you want to plot everything in one big plot, you can use

ls blast_sample*_454LargeContigs.fna.fas_vs_NC_*_filter.txt

(This is for Linux. On Windows, you'll need dir /B ... instead.)
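For the one-plot-per-reference case, one possible approach is to let the shell work out which reference numbers actually exist. The following is only a sketch: the function name list_references is made up for illustration, and it assumes every file follows the naming scheme from the question, with the reference number sitting between "_NC_" and "_filter.txt".

```shell
#!/bin/sh
# list_references (hypothetical helper): print each distinct reference
# number found in a directory, one per line, sorted.
# Assumption: files are named
#   blast_sample<N>_454LargeContigs.fna.fas_vs_NC_<REF>_filter.txt
list_references() {
    dir=${1:-.}                        # directory to scan, default: current one
    for f in "$dir"/blast_sample*_454LargeContigs.fna.fas_vs_NC_*_filter.txt; do
        [ -e "$f" ] || continue        # pattern matched nothing: skip it
        ref=${f##*_NC_}                # drop everything up to the last "_NC_"
        ref=${ref%_filter.txt}         # drop the fixed suffix
        printf '%s\n' "$ref"
    done | sort -u                     # 35 samples per reference -> one line each
}
```

The resulting list could then be read into gnuplot with system() and looped over in the same way as the file lists above, building one ls pattern per reference.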


EDIT: This question and answer is almost three years old, and I didn't notice the additional question in the comment, until the recent comment came up.

It's not clear what you mean by output names. Filenames? Labels for each curve? Plot title?

In general, you can do

set terminal pdfcairo
do for [i=1:35] { 
    cmd = sprintf("ls blast_sample%d_454LargeContigs.fna.fas_vs_NC_*_filter.txt", i)
    filelist=system(cmd)
    set output sprintf("Sample_%d.pdf", i)

    set title sprintf("This is the title for plot %d", i)
    plot for [filename in filelist] filename using ... title sprintf("This data comes from %s", filename)
}
unset output

Any function returning a string can be used to build your strings.

While the sample number is known as a number, it's a bit trickier to extract the reference number (following "NC_"), if you wish to use it. Gnuplot has some rudimentary string functions that might allow this.

If the reference number always has the same length, I'd use substr(filename,strlen(filename)-a,strlen(filename)-b) with correct values for a and b to extract this number.

If not, I'd use substr to get a string starting at the reference number (the position can be calculated), then search for the first occurrence of _ using strstrt, and then cut out the string up to this position. It could be easier to pass this task to an external command-line program. Linux's cut would do the job easily.
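As a rough illustration of the cut route, the reference number is the 6th underscore-separated field of the file name, whatever its length. The function name ref_of below is hypothetical, and the field position only holds as long as the files follow the naming scheme from the question.

```shell
#!/bin/sh
# ref_of (hypothetical helper): print the reference number embedded in
# one file name. Fields when splitting on "_":
#   1 blast | 2 sample1 | 3 454LargeContigs.fna.fas | 4 vs | 5 NC | 6 016593 | 7 filter.txt
ref_of() {
    # basename strips any directory part so that underscores in the path
    # cannot shift the field count.
    basename "$1" | cut -d_ -f6
}
```

From gnuplot, a pipeline like this could be invoked through system(), just as the answer does for ls.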

sweber answered Jan 03 '23 02:01