How to get the basename of the wildcard values in the snakemake output rule?

Question

In the following example, the output files will be created in the same location as the input files. Is there a way to get the basename of the wildcard value in the output section, so that I can use the basename of the input file to name the output file but write it to a different location?

infile = ['/home/user/folder1/file1', '/home/user/folder2/file2']

rule one:
 input: expand("{myfile}", myfile = infile)

 output: "{myfile}" + ".out"

 shell: "touch {wildcards.myfile}.out"

Pereira Hugo · Accepted Answer

There is a simple way to do this in Snakemake using a Python lambda function.

First, create a dictionary of your files, with the file name as the key and the file with its path as the value, like this:

files = {'fileA': 'path/to/fileA.ext', 'fileB': 'path/to/fileB.ext'}

This dictionary can live in the Snakefile or in the configuration file. I suggest putting it in the configuration file and accessing it as config['dictname'].
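For the list of paths in the question, the dictionary does not have to be typed by hand. Here is a minimal sketch, assuming the basenames are unique across folders; the helper name build_file_dict is made up for this example and is not part of Snakemake:

```python
import os

def build_file_dict(paths):
    """Map each file's basename to its full path."""
    files = {}
    for path in paths:
        # rstrip guards against a trailing slash, which would give an empty basename
        name = os.path.basename(path.rstrip('/'))
        if name in files:
            # Two files with the same name would overwrite each other's output
            raise ValueError(f"duplicate basename: {name}")
        files[name] = path
    return files

infile = ['/home/user/folder1/file1', '/home/user/folder2/file2']
files = build_file_dict(infile)
print(files)
```

The resulting dictionary can then be used in place of config['dictname'], or written into the configuration file.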

Now let's write your rule using a lambda function:

rule all:
    input:
        # To create the files in a different directory, use this instead,
        # but the pattern has to match the output of rule one.
        # expand('{directory}{filename}{extension}',
        #        directory = 'path/to/newdir/',
        #        filename = config['dictname'].keys(),
        #        extension = '.out')
        # Otherwise:
        expand('{filename}{extension}',
               filename = config['dictname'].keys(),
               extension = '.out')

rule one:
    # Look up the real path of the file from the wildcard value
    input: lambda wildcards: config['dictname'][wildcards.input]

    output: "{input}" + ".out"

    message: "Executing one using {input}"

    # In shell, {input} is the input file, so touch the declared output instead
    shell: "touch {output}"

The code has two rules. The first one, called all, is the one that gets executed: when you launch snakemake, it tries to obtain the list of files built by the expand function.

Snakemake checks whether a rule can produce those files; if none can, it looks for them in the directory. As you can see, expand lets you specify anything you want: directories, file names, prefixes, suffixes, extensions, ...

In this example, Snakemake wants files named after the dictionary keys, with the extension .out. Rule one is exactly the rule that produces them.
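To see which targets rule all ends up requesting, the sketch below imitates the combination step in plain Python. It is not Snakemake's own expand implementation, only an illustration built on itertools.product; the helper name expand_sketch and the directory 'path/to/newdir/' are assumptions made for the example.

```python
from itertools import product

def expand_sketch(pattern, **values):
    """Fill `pattern` with every combination of the given values."""
    # Accept a single string as well as any iterable of strings
    lists = {k: [v] if isinstance(v, str) else list(v) for k, v in values.items()}
    keys = list(lists)
    return [pattern.format(**dict(zip(keys, combo)))
            for combo in product(*(lists[k] for k in keys))]

files = {'fileA': 'path/to/fileA.ext', 'fileB': 'path/to/fileB.ext'}

targets = expand_sketch('{directory}{filename}{extension}',
                        directory='path/to/newdir/',
                        filename=files.keys(),
                        extension='.out')
print(targets)
```

Note the trailing slash on the directory: the pattern joins the pieces directly, without adding a separator. For targets like these to be produced, the output of rule one would need the same prefix, for example "path/to/newdir/{input}.out".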

Rule one works like this: it is executed once for each key of the dictionary. The lambda function in the input section does the lookup from the key to the file path. For the record, the wildcard can be called whatever you want; input is just an example.
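The lookup done by the lambda can be tried outside Snakemake. In the sketch below, a SimpleNamespace stands in for the wildcards object that Snakemake passes to input functions; that stand-in, the name input_one, and the sample dictionary are assumptions for illustration only.

```python
from types import SimpleNamespace

config = {'dictname': {'fileA': 'path/to/fileA.ext',
                       'fileB': 'path/to/fileB.ext'}}

# Same function as in rule one: wildcard value in, input path out
input_one = lambda wildcards: config['dictname'][wildcards.input]

# For the target "fileA.out", the pattern "{input}.out" gives input == "fileA"
wildcards = SimpleNamespace(input='fileA')
resolved = input_one(wildcards)
print(resolved)  # path/to/fileA.ext
```

A key that is missing from the dictionary raises a KeyError, so every requested target name has to be one of the dictionary keys.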

To make this more elegant, you can store the lambda function in a variable, like this:

_input_One = lambda wildcards: config['dictname'][wildcards.input]

Then write the input of rule one like this:

input: _input_One

For more information, please check the documentation at https://snakemake.readthedocs.io/en/stable/

Hugo

Tags:

wildcard

snakemake

Veera

1 Answers