In previous posts I have focused mainly on setting objectives for response times as a vital aspect of site or application usability, and on the Service Level Management process. Now I'm going to discuss performance measurements, and how to get the maximum value from them.
In my experience, companies usually have lots of measurement tools. Granted, some of them do sit on the shelf unused, but many are in use -- some even collecting data continuously. Despite all this data gathering, the value obtained from the data is often a lot less than it might be. Data is meaningless unless it's interpreted and applied; as a medieval scribe might have said, graecum est; non potest legi.
Today I will describe a framework for addressing this concern. The diagram on the left does not represent a process, but rather a conceptual framework. It comprises seven potential ways of exploiting performance data, each one involving a greater level of sophistication than its predecessor. As a mnemonic, the names of the levels bear the initials A-G. (A stroke of luck, aided by careful selection of terminology.)
Aggregate: The essential starting point. Raw data has its uses -- you need individual data points to investigate exceptions, and scatter plots do reveal some patterns. But most uses for performance data demand summary statistics. Availability percentages and the Apdex metric (a new way to report response times) are just two examples of many. All performance analysis tools, except those intended only for detailed tracing, support this level.
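To make the two summary statistics named above concrete, here is a minimal sketch in Python. It assumes the standard Apdex definition (samples at or under a target time T count as satisfied, samples between T and 4T count as tolerating at half weight, and everything slower counts as frustrated). The function names, the sample values, and the 2-second target are all made up for illustration, not taken from any particular tool.

```python
def apdex(response_times, threshold):
    """Apdex score: (satisfied + tolerating / 2) / total samples.

    satisfied:  t <= threshold
    tolerating: threshold < t <= 4 * threshold
    frustrated: everything slower (contributes nothing to the score)
    """
    if not response_times:
        raise ValueError("need at least one response-time sample")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


def availability_pct(check_results):
    """Percentage of successful checks, where True means the check succeeded."""
    if not check_results:
        raise ValueError("need at least one check result")
    return 100.0 * sum(check_results) / len(check_results)


# Hypothetical raw measurements: page response times in seconds.
samples = [0.8, 1.2, 2.5, 3.9, 9.0, 1.1, 0.6, 4.5]
score = apdex(samples, threshold=2.0)

# Hypothetical availability checks: 19 successes and 1 failure.
checks = [True] * 19 + [False]
uptime = availability_pct(checks)

print(f"Apdex(T=2.0s) = {score:.2f}, availability = {uptime:.1f}%")
```

Eight raw numbers collapse into a single score between 0 and 1, which is the point of this level: the summary is something you can report, chart, and compare, while the raw samples stay available for investigating exceptions.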
Broadcast: Once your tools have summarized the data, it does absolutely no good unless it is sent to someone who looks at it. It's amazing how many so-called performance monitoring systems break down here -- data is collected, summarized, then filed away, never to be looked at. In the 70's it sat in a pile of computer printouts collecting dust behind someone's desk; today it's in a file somewhere on the network. That does no good. Stewart Brand coined the famous phrase "information wants to be free." My corollary is "information about performance wants to be seen!" It needs to come out of the server closet and be put on display in a dashboard, where it can be useful.
Chart: If a picture is worth a thousand words, then a chart is worth a thousand spreadsheet cells. A well-chosen set of charts and graphs can help you see the patterns in your aggregate statistics, turning your data into information. Spotting a pattern in a table of numbers is as difficult as spotting Waldo in a maze of comic-book kids.
Diagnose: Patterns (whether found manually or by a tool) are the key to detecting problems (alerting), isolating the major component of your application or infrastructure in which those problems lie (triage), and finally discovering their causes (diagnosing). Over time, tuning your charts can get you through this process faster. In practice, there are very few truly new problems, but we have a tendency to fix and forget, so they come back to bite us. Hold post-mortems, and refine your process.
Evaluate: In polite society, maybe comparisons are odious -- or odorous, depending upon which source you quote. But in the world of SLM, comparisons, far from being undesirable, are essential. Reasonable objectives will be defined by the prevailing Web environment and your competition, and the essence of performance management is to continually evaluate your site against those objectives. Ultimately, user comparisons will help determine your site's success or failure.
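As a rough illustration of evaluating measurements against objectives, the sketch below compares measured response times with per-page targets and lists the misses. The page names, target values, and measured values are invented for the example, and a real evaluation would also include the competitive comparisons described above.

```python
def evaluate(measured, objectives):
    """Compare measured response times (seconds) with their objectives.

    Returns {page: (measured, objective, met)} for every page that has
    both an objective and a measurement.
    """
    report = {}
    for page, target in objectives.items():
        value = measured.get(page)
        if value is None:
            continue  # no measurement for this page, so nothing to compare
        report[page] = (value, target, value <= target)
    return report


# Hypothetical response-time objectives and measurements, in seconds.
objectives = {"home": 2.0, "search": 3.0, "checkout": 4.0}
measured = {"home": 1.7, "search": 3.4, "checkout": 3.9}

report = evaluate(measured, objectives)
misses = [page for page, (_, _, met) in report.items() if not met]

for page, (value, target, met) in report.items():
    status = "met" if met else "MISSED"
    print(f"{page:10s} measured {value:.1f}s vs objective {target:.1f}s: {status}")
```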
Forecast: Projecting future performance is a vital component of SLM; otherwise you will be forever dealing with surprises and crises. And understanding your systems' and applications' performance today is the only starting point for projecting how they will behave under load tomorrow. Or during the holiday season. Or when marketing launches that big product promotion. This is an aspect of SLM that my colleague and team member Donald Foss will surely be writing about here, when he gets a spare moment. (With the holidays on the horizon, he's busy these days helping his customers predict their future performance.)
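The simplest possible version of projecting from today's measurements is a straight-line trend, sketched below. It fits a least-squares line to weekly peak traffic and extrapolates it forward. The traffic figures are invented, and the straight-line assumption is mine, chosen only for illustration: real systems rarely degrade linearly as load rises, so treat this as a starting point rather than a forecasting method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: peak requests per second in each of five weeks.
weeks = [1, 2, 3, 4, 5]
peak_rps = [100, 110, 120, 130, 140]

slope, intercept = fit_line(weeks, peak_rps)

# Extrapolate the trend to week 10 (say, the start of the holiday season).
projected_week_10 = slope * 10 + intercept

print(f"growth per week: {slope:.1f} rps, projected week-10 peak: {projected_week_10:.1f} rps")
```

Comparing a projection like this with the load at which response times are known to start degrading gives an early warning of when capacity will run out.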
Guide: Many details must come together to create the ideal world of ITSO, in which you consistently meet IT service levels while minimizing infrastructure costs and mitigating risks. Not the least of these will be the catalog of best practices you develop for yourself, in the course of measuring and understanding what makes your Web applications tick. These should be captured, documented, and institutionalized in your development processes, so that the hard-earned performance lessons of the past become guidance for the future. This is how the very best companies get to the top -- and stay there. Look for my colleague and team member Ben Rushlo to share some of his experiences in this area soon.
I hope this taxonomy and short overview of the uses of performance data is useful. And if you do not learn anything new about performance, at least maybe some of the incidental references will be interesting. Finally, if you really think my scheme needs more letters, please write and tell me what they are. I may be an old dog, but now that I'm getting the hang of blogging, I'm ready to learn some new tricks.
[A version of this was first published on blogger on October 26, 2005.]