
Limit the number of single processes in Nextflow workflows

I have the following simple workflow:

workflow {

  Channel.fromPath(params.file_list)
        .splitText(){it.trim()}
        .set { file_list }

  data = GetFromHPSS(file_list)
  data_pairs = CoupleDETXToFile(data, file(params.detx_path))
  SingleDUTimeResFit(data_pairs)

}

Here, file_list is a list of paths on a tape-drive system. GetFromHPSS is the process that retrieves files from the tape system, and I need to limit the number of instances running in parallel to a fairly low number.

Currently, I am using

executor {
  queueSize = 100
}

in the configuration file but there are two problems:

  1. it limits the overall maximum number of parallel jobs, whereas I could run thousands of SingleDUTimeResFit processes in parallel
  2. it always waits until GetFromHPSS has processed everything before continuing with the subsequent processes

Here is an example:

N E X T F L O W  ~  version 21.04.3
Launching `workflows/singledu_timeresfit.nf` [wise_galileo] - revision: 8084ac1482
executor >  sge (502)
[13/ca3e8a] process > GetFromHPSS (426)  [ 18%] 402 of 22840
[-        ] process > CoupleDETXToFile   [  0%] 0 of 402
[-        ] process > SingleDUTimeResFit -

Is there a way to limit GetFromHPSS to a specific number of parallel executions and let the remaining processes run under a different queue limit?

EDIT: This is my best attempt so far, but Nextflow does not accept the configuration:


process {
  executor {
    queueSize = 100
    submitRateLimit = "10sec"
  }

  withName: GetFromHPSS {
    executor.queueSize = 10
  }
}

With this top-level process configuration, I get:

N E X T F L O W  ~  version 21.04.3
Launching `workflows/singledu_timeresfit.nf` [confident_pasteur] - revision: 8084ac1482
Unknown config attribute `process.withName:GetFromHPSS` -- check config file: /sps/km3net/users/tgal/dev/PhD/workflows/nextflow.config
tamasgal asked Dec 09 '25 18:12



1 Answer

I think what you're looking for here is the maxForks directive, which caps the number of instances of a process that run in parallel. It can be applied to just the 'GetFromHPSS' process, without changing the executor's queueSize:

process 'GetFromHPSS' {

    maxForks 1  // at most one GetFromHPSS task runs at a time

    """
    <your script here>
    """
}

You could even parameterize it, if you think it makes sense:

params.hpss_forks = 5

process 'GetFromHPSS' {

    maxForks params.hpss_forks  // default is 5; can be overridden with --hpss_forks on the command line

    """
    <your script here>
    """
}
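
The same limit can probably also be set from nextflow.config rather than in the process definition, since process directives such as maxForks can be assigned inside a withName selector. The attempt in the question likely fails because queueSize belongs to the executor scope and is not a per-process directive. A minimal sketch, assuming the process is named GetFromHPSS as in the question; the values 10 and 1000 are illustrative, not recommendations:

```groovy
// nextflow.config (sketch; both numbers are assumptions, tune them for your cluster)

executor {
    // global cap on the number of jobs Nextflow keeps submitted at once
    queueSize = 1000
}

process {
    // per-process cap: at most 10 GetFromHPSS tasks run in parallel
    withName: GetFromHPSS {
        maxForks = 10
    }
}
```

With this split, queueSize bounds the total number of submitted jobs, while maxForks throttles only the tape retrieval step.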
Steve answered Dec 11 '25 12:12



