I'd like to be able to test some guesses about memory complexity of various command line utilities.
Taking as a simple example
grep pattern file
I'd like to see how memory usage varies with the size of pattern
and the size of file
.
For time complexity, I'd make a guess, then run
time grep pattern file
on various sized inputs to see if my guess seems to be borne out in reality, but I don't know how to do this for memory.
One possibility would be a wrapper script that initiates the job and samples memory usage periodically, but this seems inelegant and unlikely to give the real high watermark.
I've seen time -v
suggested, but don't have that flag available on my machine (running bash on OSX) and don't know where to find a version that supports it.
I've also seen that on Linux this information is available through the proc
filesystem, but again, it's not available to me in my context.
I'm wondering if dtrace
might be an appropriate tool, but again am concerned that a simple sample-based figure might not be the true high watermark?
Does anyone know of a tool or approach that would be appropriate on OSX?
Edit
I removed two mentions of disk usage, which were just asides and perhaps distracted from the main thrust of the question.
Your question is interesting because, without the application source code, you need to make a few assumptions about what constitutes memory use. Even if you were to use procfs
, the results will be misleading: both the resident set size and the total virtual address space will be over-estimates since they will include extraneous data such as the program text.
Particularly for small commands, it would be easier to track individual allocations, although even there you need to be sure to include all the possible sources. In addition to malloc()
etc., a process can extend its heap with brk()
or obtain anonymous memory using mmap()
.
Here's a DTrace script that traces malloc()
; you can extend it to include other allocating functions. Note that it isn't suitable for multi-threaded programs as it uses some non-atomic variables.
bash-3.2# cat hwm.d
/* find the maximum outstanding allocation provided by malloc() */
size_t total, high;
pid$target::malloc:entry
{
self->size = arg0;
}
pid$target::malloc:return
/arg1/
{
total += self->size;
allocation[arg1] = self->size;
high = (total > high) ? total : high;
}
pid$target::free:entry
/allocation[arg0]/
{
total -= allocation[arg0];
allocation[arg0] = 0;
}
END
{
printf("High water mark was %d bytes.\n", high);
}
bash-3.2# dtrace -x evaltime=exec -qs hwm.d -c 'grep maximum hwm.d'
/* find the maximum outstanding allocation provided by malloc() */
High water mark was 62485 bytes.
bash-3.2#
A much more comprehensive discussion of memory allocators is contained in this article by Brendan Gregg. It provides a much better answer than my own to your question. In particular, it includes a link to a script called memleak.d
; modify this to include time stamps for the allocations & deallocations, so that you can sort its output by time. Then, perhaps using the accompanying script as an example, use perl
to track the current outstanding total allocation and high water mark. Such a DTrace/perl combination would be suitable for tracing multi-threaded processes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With