I've seen several StackOverflow posts that discuss what tools to use to monitor web application performance, but none that talk about what metrics to focus on.
What web server metrics should be monitored, and which ones should have alerts set up?
Here are some I currently have in mind:
- request timeouts (alerts)
- requests queued (alerts)
- time to first byte (may need to be monitored externally)
- requests / second
Also, how can these be measured on a Java web application server?
You're off to a good start. I would monitor:
- Total response time
- Total bytes transferred
- Throughput (reqs/sec)
- Server CPU overhead
- Errors (by error code)
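Here is a rough sketch of how these could be collected in-process on a Java server. The class keeps running totals of response time, bytes, request count, and responses per status code using only the JDK. `RequestMetrics` and its methods are hypothetical names, not part of any library. The sketch assumes you call `record(...)` once per finished request, for example from a servlet `Filter` that times the call to `chain.doFilter(...)`. How you obtain the byte count depends on your container; a response wrapper or the access log are common options. It reports a mean and a since-startup throughput. A real monitoring tool would usually use sliding windows and percentiles instead.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-process request metrics; call record(...) once per finished request. */
final class RequestMetrics {
    private final LongAdder requests = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final LongAdder totalBytes = new LongAdder();
    private final ConcurrentHashMap<Integer, LongAdder> byStatus = new ConcurrentHashMap<>();
    private final long startNanos;

    RequestMetrics() { this(System.nanoTime()); }

    /** Lets callers (and tests) pin the start time used for throughput. */
    RequestMetrics(long startNanos) { this.startNanos = startNanos; }

    /** Records one completed request: its duration, HTTP status and response size. */
    void record(long durationNanos, int status, long bytesSent) {
        requests.increment();
        totalNanos.add(durationNanos);
        totalBytes.add(bytesSent);
        byStatus.computeIfAbsent(status, s -> new LongAdder()).increment();
    }

    long requestCount() { return requests.sum(); }

    long totalBytes() { return totalBytes.sum(); }

    /** Mean response time in milliseconds, or 0 if nothing has been recorded. */
    double meanResponseMillis() {
        long n = requests.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
    }

    /** Average requests per second between construction and nowNanos. */
    double throughput(long nowNanos) {
        double seconds = (nowNanos - startNanos) / 1_000_000_000.0;
        return seconds <= 0 ? 0.0 : requests.sum() / seconds;
    }

    /** Snapshot of response counts per HTTP status code, sorted by code. */
    Map<Integer, Long> countsByStatus() {
        Map<Integer, Long> out = new TreeMap<>();
        byStatus.forEach((code, count) -> out.put(code, count.sum()));
        return out;
    }

    /** Number of server errors (HTTP 500-599). */
    long serverErrors() {
        long n = 0;
        for (Map.Entry<Integer, LongAdder> e : byStatus.entrySet()) {
            if (e.getKey() >= 500 && e.getKey() <= 599) {
                n += e.getValue().sum();
            }
        }
        return n;
    }
}
```

`LongAdder` and `ConcurrentHashMap` keep `record(...)` cheap under concurrent requests, because each request thread updates counters without taking a shared lock.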
I would also alert on the following:
- Application/page not responding
- Excessive response time (this depends on your app; you'll have to figure out what your normal SLA is)
- Excessive throughput (this can alert you to a DoS attack so that you can take action)
- 5xx errors (such as 500, 503, etc.)
- Excessive server CPU load (again, you'll have to determine what is typical and configure your tool to alert you when things are abnormal; this is another indicator of a DoS attack or a runaway process)
- Errors in log files (if your tool supports it, configure it to send alerts when errors or exceptions appear in log files)
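To make the numeric alerts concrete, here is a minimal threshold check covering response time, throughput, 5xx count, and CPU load. `AlertRules` and all of its limits are hypothetical placeholders; as noted above, real limits have to come from your own baseline. Load is read with `OperatingSystemMXBean.getSystemLoadAverage()`. That method returns a 1-minute system load average rather than CPU utilization, and it returns a negative value where the platform can't report it, so the check skips load in that case. "Not responding" and log-file errors need an external probe or a log watcher, so they aren't covered here.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical threshold rules; every limit is a placeholder to tune against your own baseline. */
final class AlertRules {
    private final double maxMeanResponseMillis; // response-time SLA
    private final double maxRequestsPerSecond;  // well above normal peak traffic
    private final long maxServerErrors;         // tolerated 5xx responses per check interval
    private final double maxLoadPerCpu;         // 1-minute load average divided by CPU count

    AlertRules(double maxMeanResponseMillis, double maxRequestsPerSecond,
               long maxServerErrors, double maxLoadPerCpu) {
        this.maxMeanResponseMillis = maxMeanResponseMillis;
        this.maxRequestsPerSecond = maxRequestsPerSecond;
        this.maxServerErrors = maxServerErrors;
        this.maxLoadPerCpu = maxLoadPerCpu;
    }

    /** Returns one message per breached threshold; an empty list means no alert. */
    List<String> evaluate(double meanResponseMillis, double requestsPerSecond,
                          long serverErrors, double loadAverage, int cpus) {
        List<String> alerts = new ArrayList<>();
        if (meanResponseMillis > maxMeanResponseMillis) {
            alerts.add("Excessive response time: " + meanResponseMillis + " ms");
        }
        if (requestsPerSecond > maxRequestsPerSecond) {
            alerts.add("Excessive throughput (possible DoS): " + requestsPerSecond + " req/s");
        }
        if (serverErrors > maxServerErrors) {
            alerts.add("5xx errors: " + serverErrors);
        }
        // A negative load average means the platform cannot report it, so skip the check.
        if (loadAverage >= 0 && cpus > 0 && loadAverage / cpus > maxLoadPerCpu) {
            alerts.add("Excessive CPU load: " + loadAverage + " on " + cpus + " CPUs");
        }
        return alerts;
    }

    /** Current 1-minute system load average as reported by the JVM (negative if unavailable). */
    static double currentLoadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    /** Number of processors available to the JVM. */
    static int cpuCount() {
        return Runtime.getRuntime().availableProcessors();
    }
}
```

`evaluate` takes plain numbers, which keeps the rules independent of whatever collects the metrics. The same check could be fed from JMX, an access log, or an APM tool, and run on a timer that forwards any messages to email or a pager.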