get memory usage per process with sar, sysstat

Tags:

Can I get memory usage per process with Linux? we monitor our servers with sysstat/sar. But besides seeing that memory went off the roof at some point, we can't pinpoint which process was getting bigger and bigger. is there a way with sar (or other tools) to get memory usage per process? and look at it, later on?

633

asked Apr 20 '17 23:04

kokito

2 Answers

This is purely preference but I would keep it nice and simple until you know what you're looking for. I would create a cronjob to first pipe out your free memory, disk and cpu usage and then to display the top ten culprits.

#!/bin/sh
free -m | awk 'NR==2{printf "Memory Usage: %s/%sMB (%.2f%%)\n", $3,$2,$3*100/$2 }'
df -h | awk '$NF=="/"{printf "Disk Usage: %d/%dGB (%s)\n", $3,$2,$5}'
top -bn1 | grep load | awk '{printf "CPU Load: %.2f\n", $(NF-2)}' 
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head

After finding your culprit, you can hone in a little more and dig into some specifics.

answered Oct 17 '22 07:10

Jason Birchall

sysstat includes pidstat whose man page says:

The pidstat command is used for monitoring individual tasks currently being managed by the Linux kernel. It writes to standard output activities for every task selected with option -p or for every task managed by the Linux kernel [...]

Linux kernel tasks include user-space processes and threads (and also kernel thread, which are of least interest here).

But unfortunately sysstat doesn't support collecting historical data from pidstat and it doesn't seems the author is interested in providing such support (GitHub issues):

[RFE] include pidstat data in sa data files
pidstat: why no binary output option?

`pidstat`

That being said, tabular output of pidstat can be written to a file and later parsed. Typically groups of processes are of interest, rather than every process on the system. I'll focus on a process with its children processes.

What can be an example? Firefox. pgrep firefox returns its PID, $(pgrep -d, -P $(pgrep firefox)) returns comma-separated list of its children. Given this, pidstat command can look like:

LC_NUMERIC=C.UTF-8 watch pidstat -dru -hl \
    -p '$(pgrep firefox),$(pgrep -d, -P $(pgrep firefox))' \
    10 60 '>>' firefox-$(date +%s).pidstat

Some observation:

LC_NUMERIC is set to make pidstat use dot as the decimal separator.
watch is used to repeat the pidstat every 600 seconds in case the process subtree changes.
-d to report I/O statistics, -r to report page faults and memory utilization, -u to report CPU utilization.
-h to make all report groups to be placed in one line, and -l to display the process command name and all its arguments (well, kind of, because it still trims it at 127 characters).
date is used to avoid accidental overwrite of existing file

It produces something like:

Linux kernel version (host)     31/03/20    _x86_64_    (8 CPU)

#      Time   UID       PID    %usr %system  %guest    %CPU   CPU  minflt/s  majflt/s     VSZ     RSS   %MEM   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
 1585671289  1000      5173    0.50    0.30    0.00    0.80     5      0.70      0.00 3789880  509536   3.21      0.00     29.60      0.00       0  /usr/lib/firefox/firefox 
 1585671289  1000      5344    0.70    0.30    0.00    1.00     1      0.50      0.00 3914852  868596   5.48      0.00      0.00      0.00       0  /usr/lib/firefox/firefox -contentproc -childID 1 ...
 1585671289  1000      5764    0.10    0.10    0.00    0.20     1      7.50      0.00 9374676  363984   2.29      0.00      0.00      0.00       0  /usr/lib/firefox/firefox -contentproc -childID 2 ...
 1585671289  1000      5852    6.60    0.90    0.00    7.50     7    860.70      0.00 4276640 1040568   6.56      0.00      0.00      0.00       0  /usr/lib/firefox/firefox -contentproc -childID 3 ...
 1585671289  1000     24556    0.00    0.00    0.00    0.00     7      0.00      0.00  419252   18520   0.12      0.00      0.00      0.00       0  /usr/lib/firefox/firefox -contentproc -parentBuildID ...

#      Time   UID       PID    %usr %system  %guest    %CPU   CPU  minflt/s  majflt/s     VSZ     RSS   %MEM   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
 1585671299  1000      5173    3.40    1.60    0.00    5.00     6      7.60      0.00 3789880  509768   3.21      0.00     20.00      0.00       0  /usr/lib/firefox/firefox 
 1585671299  1000      5344    5.70    1.30    0.00    7.00     6    410.10      0.00 3914852  869396   5.48      0.00      0.00      0.00       0  /usr/lib/firefox/firefox -contentproc -childID 1 ...
 1585671299  1000      5764    0.00    0.00    0.00    0.00     3      0.00      0.00 9374676  363984   2.29      0.00      0.00      0.00       0  /usr/lib/firefox/firefox -contentproc -childID 2 ...
 1585671299  1000      5852    1.00    0.30    0.00    1.30     1     90.20      0.00 4276640 1040452   6.56      0.00      0.00      0.00       0  /usr/lib/firefox/firefox -contentproc -childID 3 ...
 1585671299  1000     24556    0.00    0.00    0.00    0.00     7      0.00      0.00  419252   18520   0.12      0.00      0.00      0.00       0  /usr/lib/firefox/firefox -contentproc -parentBuildID ...

...

Note that each line with data starts with a space, so parsing is easy:

import pandas as pd

def read_columns(filename):
    with open(filename) as f:
        for l in f:
            if l[0] != '#':
                continue
            else:
                return l.strip('#').split()
        else:
            raise LookupError

def get_lines(filename, colnum):
    with open(filename) as f:
        for l in f:
            if l[0] == ' ':
                yield l.split(maxsplit=colnum - 1)        

filename = '/path/to/firefox.pidstat'
columns = read_columns(filename)
exclude = 'CPU', 'UID', 
df = pd.DataFrame.from_records(
    get_lines(filename, len(columns)), columns=columns, exclude=exclude
)
numcols = df.columns.drop('Command')
df[numcols] = df[numcols].apply(pd.to_numeric, errors='coerce')
df['RSS'] = df.RSS / 1024  # Make MiB
df['Time'] = pd.to_datetime(df['Time'], unit='s', utc=True)
df = df.set_index('Time')
df.info()

The structure of the dataframe is as follows:

Data columns (total 15 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   PID        6155 non-null   int64  
 1   %usr       6155 non-null   float64
 2   %system    6155 non-null   float64
 3   %guest     6155 non-null   float64
 4   %CPU       6155 non-null   float64
 5   minflt/s   6155 non-null   float64
 6   majflt/s   6155 non-null   float64
 7   VSZ        6155 non-null   int64  
 8   RSS        6155 non-null   float64
 9   %MEM       6155 non-null   float64
 10  kB_rd/s    6155 non-null   float64
 11  kB_wr/s    6155 non-null   float64
 12  kB_ccwr/s  6155 non-null   float64
 13  iodelay    6155 non-null   int64  
 14  Command    6155 non-null   object 
dtypes: float64(11), int64(3), object(1)

It can be visualised in many ways that depend on what the focus of the monitoring is, but %CPU and RSS are the most common metrics to look at. So here is an example.

import matplotlib.pyplot as plt

fig, axes = plt.subplots(len(df.PID.unique()), 2, figsize=(12, 8))
x_range = [df.index.min(), df.index.max()]
for i, pid in enumerate(df.PID.unique()):
    subdf = df[df.PID == pid]
    title = ', '.join([f'PID {pid}', str(subdf.index.max() - subdf.index.min())])
    for j, col in enumerate(('%CPU', 'RSS')):
        ax = subdf.plot(
            y=col, title=title if j == 0 else None, ax=axes[i][j], sharex=True
       )
        ax.legend(loc='upper right')
        ax.set_xlim(x_range)

plt.tight_layout()
plt.show()

It produces a figure like:

firefox subprocess tree

117

answered Oct 17 '22 07:10

saaj

Related questions
                            
                                Where can I find the SQL files of XAMPP in Linux?
                            
                                How to check if a files exists in a specific directory in a bash script?
                            
                                Bash reading users y/n answer does not work (read command inside while loop reading find output)
                            
                                GDK Screen vs Monitor vs Display?
                            
                                How to get a file that's one folder above Document Root?
                            
                                Bit field's memory management in C
                            
                                mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid prevent from server restart
                            
                                Unable to write on /dev/* files
                            
                                Multiple commands on a single line in Linux [duplicate]
                            
                                Non-blocking read on pipe
                            
                                how to get certain columns from ps aux
                            
                                Combine several files, with a separator, into one file
                            
                                How can I build for linux 32-bit with go1.6.2
                            
                                LD_PRELOAD doesn't affect dlopen() with RTLD_NOW
                            
                                Linux - /usr/local or /var for application data?
                            
                                Python at Synology, how to get Python3 modules installed and where is Python2.7 installed?
                            
                                How to find a base architecture inside Makefile?
                            
                                Why do processes I fork get systemd as their parent?
                            
                                How to Simply Remove Duplicate Frames from a Video using ffmpeg
                            
                                How to instal/setup XMLStarlet in Linux?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

get memory usage per process with sar, sysstat

Tags:

linux

monitoring

sar