Yesterday I introduced the subject of Web Performance Management. [Note: I have since rewritten that material as the Performance Topics page]. To manage application service levels effectively, and satisfy your customers, you must monitor and report on availability and response times. So if you collect 10,000 measurements, what's the best way to report them?
Availability percentages are the easy part; they tell a story that everyone can understand. If 5 of your measurements failed, then your availability for that period was 99.95%. All you have to do is report the overall percentage for management tracking purposes, and perhaps summarize the causes of the 5 errors for technical staff to follow up and see if those kinds of errors can be reduced or eliminated in future.
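The availability arithmetic above is simple enough to sketch in a few lines of Python (the function name is mine, just for illustration):

```python
def availability(total_measurements, failed_measurements):
    """Return availability as a percentage of successful measurements."""
    successful = total_measurements - failed_measurements
    return successful / total_measurements * 100

# 5 failures out of 10,000 measurements, as in the example above
print(f"{availability(10000, 5):.2f}%")  # 99.95%
```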
Assuming that you have set response objectives for that application, statistics like average response times (or even averages with standard deviations or confidence intervals, for the statistically minded) do not really show how well you are meeting your goals and satisfying your customers. While technicians may have the time to discover important patterns in frequency distributions and scatter plots, managers need a quick way to understand the bottom line.
This is especially true of Internet measurements, whose distributions can be so skewed that their average does not represent the "middle" of the data. In practice, a few really slow measurements can push up the average, so that as many as 85% of all measurements may actually have been faster than the average. This presents a challenge, especially if you have set different response objectives for many pages of your Web applications, and those pages exhibit different response-time distributions. Reporting that "average response time was 3.7 seconds" is not very informative.
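A contrived but deterministic example shows how a few slow measurements can drag the average above the bulk of the data (the numbers here are invented to illustrate the point, not taken from real measurements):

```python
# 85 fast responses of 1.0s, plus 15 slow outliers of 20.0s
samples = [1.0] * 85 + [20.0] * 15

mean = sum(samples) / len(samples)
faster_than_mean = sum(1 for s in samples if s < mean)

print(f"average = {mean:.2f}s")          # average = 3.85s
print(f"faster than average: {faster_than_mean}%")  # 85% of samples
```

Here the "average response time" of 3.85 seconds describes almost nobody's experience: 85% of the samples were nearly four times faster.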
How then should you summarize and report page response times? Until recently, there was no accepted way to reduce response-time data to a common scale that would immediately show managers the level of success being achieved through their SLM efforts. Apdex, short for Application Performance Index, is a new open standard that seeks to address this problem. An alliance of companies whose business is measuring performance has defined the Apdex metric, a user satisfaction score that can be easily derived from any set of response time measurements, once a response time goal has been set.
The Apdex method
The Apdex specification defines three zones of responsiveness: Satisfied, Tolerating, and Frustrated. The satisfaction threshold (T) is your response objective, and the frustration threshold (F) is always set to 4T. This simple rule is justified by the empirical findings of usability research, which will be a topic for a future post. The Apdex metric is computed by counting the satisfied samples plus half the tolerating samples (and none of the frustrated samples), and dividing by the total sample count.
The result is a number between 0 and 1, where 0 means no users were satisfied, and 1 means all users were satisfied. For example, if there are 100 samples with a target time of 3 seconds, where 60 are below 3 seconds, 30 are between 3 and 12 seconds, and the remaining 10 are above 12 seconds, the Apdex score is (60+30/2)/100, or 0.75. This result can be reported in one of two standard formats: 0.75[3.0], or 0.75 with a subscript of 3.0. The key point is that any display or report of an Apdex metric always includes the value of the target T that was used.
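The counting rule described above reduces to a few lines of code. This is a minimal sketch of the Apdex formula as the post defines it (function and variable names are my own), reproducing the worked example of 100 samples against a 3-second target:

```python
def apdex(samples, t):
    """Compute the Apdex score for response times (seconds) against target T.

    Satisfied:  response <= T        (counts fully)
    Tolerating: T < response <= 4T   (counts half)
    Frustrated: response > 4T        (counts zero)
    """
    satisfied = sum(1 for s in samples if s <= t)
    tolerating = sum(1 for s in samples if t < s <= 4 * t)
    return (satisfied + tolerating / 2) / len(samples)

# 60 satisfied, 30 tolerating, 10 frustrated, with T = 3 seconds
samples = [2.0] * 60 + [6.0] * 30 + [15.0] * 10
score = apdex(samples, 3.0)
print(f"{score:.2f}[{3.0}]")  # 0.75[3.0], in the standard reporting format
```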
So if you achieve an Apdex score of 1.0 by setting yourself the easy target of 25 seconds, your reports must show 1.0[25], exposing the lax target for all to see. But if you use more appropriate Apdex thresholds that truly reflect the level of service you want your customers to experience, then your Apdex score will tell you how successful you are in reaching your own goals.
I believe the Apdex approach is a really good idea, and I will be discussing it further in future posts.
[This post was first published on Blogger on October 19, 2005.]