I included a benchmark
directive to some of the rules in my snakemake workflow, and the resulting files have the following header:
s h:m:s max_rss max_vms max_uss max_pss io_in io_out mean_load
The only documentation I've found mentions a "benchmark txt file (which will contain a tab-separated table of run times and memory usage in MiB)".
I can guess that columns 1 and 2 are two different ways of displaying the time taken to execute the rule (in seconds, and converted to hours, minutes and seconds).
io_in
and io_out
likely related to disk read and write activity, but in what units are they measured?
What are the others? Is this documented somewhere?
I've found the following piece of code in /snakemake/benchmark.py
, that might well be where the benchmark data come from:
def _update_record(self):
"""Perform the actual measurement"""
# Memory measurements
rss, vms, uss, pss = 0, 0, 0, 0
# I/O measurements
io_in, io_out = 0, 0
# CPU seconds
cpu_seconds = 0
# Iterate over process and all children
try:
main = psutil.Process(self.pid)
this_time = time.time()
for proc in chain((main,), main.children(recursive=True)):
meminfo = proc.memory_full_info()
rss += meminfo.rss
vms += meminfo.vms
uss += meminfo.uss
pss += meminfo.pss
ioinfo = proc.io_counters()
io_in += ioinfo.read_bytes
io_out += ioinfo.write_bytes
if self.bench_record.prev_time:
cpu_seconds += proc.cpu_percent() / 100 * (
this_time - self.bench_record.prev_time)
self.bench_record.prev_time = this_time
if not self.bench_record.first_time:
self.bench_record.prev_time = this_time
rss /= 1024 * 1024
vms /= 1024 * 1024
uss /= 1024 * 1024
pss /= 1024 * 1024
io_in /= 1024 * 1024
io_out /= 1024 * 1024
except psutil.Error as e:
return
# Update benchmark record's RSS and VMS
self.bench_record.max_rss = max(self.bench_record.max_rss or 0, rss)
self.bench_record.max_vms = max(self.bench_record.max_vms or 0, vms)
self.bench_record.max_uss = max(self.bench_record.max_uss or 0, uss)
self.bench_record.max_pss = max(self.bench_record.max_pss or 0, pss)
self.bench_record.io_in = io_in
self.bench_record.io_out = io_out
self.bench_record.cpu_seconds += cpu_seconds
So this seems to come from functionalities provided by psutil
.
Benchmarking in snakemake could certainly be better documented, but psutil is documanted here:
get_memory_info()
Return a tuple representing RSS (Resident Set Size) and VMS (Virtual Memory Size) in bytes.
On UNIX RSS and VMS are the same values shown by ps.
On Windows RSS and VMS refer to "Mem Usage" and "VM Size" columns of taskmgr.exe.
psutil.disk_io_counters(perdisk=False)
Return system disk I/O statistics as a namedtuple including the following attributes:
read_count: number of reads
write_count: number of writes
read_bytes: number of bytes read
write_bytes: number of bytes written
read_time: time spent reading from disk (in milliseconds)
write_time: time spent writing to disk (in milliseconds)
The code you found confirms that all the memory usage and IO counts are reported in MB (= bytes * 1024 * 1024).
I will just leave this here for future reference.
Reading through
snakemake >= 6.0.0
benchmark module
psutil
's memory_info(), memory_full_info(), io_counters(), cpu_times()
as previously suggested:
colname | type (unit) | description |
---|---|---|
s | float (seconds) | Running time in seconds |
h:m:s | string (-) | Running time in hour, minutes, seconds format |
max_rss | float (MB) | Maximum "Resident Set Size”, this is the non-swapped physical memory a process has used. |
max_vms | float (MB) | Maximum “Virtual Memory Size”, this is the total amount of virtual memory used by the process |
max_uss | float (MB) | “Unique Set Size”, this is the memory which is unique to a process and which would be freed if the process was terminated right now. |
max_pss | float (MB) | “Proportional Set Size”, is the amount of memory shared with other processes, accounted in a way that the amount is divided evenly between the processes that share it (Linux only) |
io_in | float (MB) | the number of MB read (cumulative). |
io_out | float (MB) | the number of MB written (cumulative). |
mean_load | float (-) | CPU usage over time, divided by the total running time (first row) |
cpu_time | float(-) | CPU time summed for user and system |
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With