Today's list of tuning tips was created by Patrick Killelea, the author of Web Performance Tuning, first published by O'Reilly in 1998, then revised in 2002. When the second edition came out, Patrick also updated his 1998 top ten list, presumably to reflect changes in the rapidly maturing Internet and Web environment.
But O'Reilly still publishes the 1998 list alongside the 2002 list without any further explanation, even though just four recommendations appear on both lists! I see this as evidence that publishers are a lot more interested in selling a book than they are in the usefulness of its content. So let's blame O'Reilly and give Patrick the benefit of the doubt here, and focus on his latest list only. Abbreviating his recommendations, they are:
- Check for compliance with standards
- Turn off the Web server's reverse DNS lookups
- Try out a free analysis tool (to find bottlenecks)
- Use simple servlets or CGI
- Get more memory
- Index your database tables well
- Make fewer database queries
- Look for packet loss and retransmission
- Monitor your Web site's performance
When someone publishes a top ten list, I expect it to include the ten most important and useful recommendations -- especially when its author has written the most comprehensive book available on the subject. In this case, even allowing for the maturing of the Web since 2002, I have no idea how Patrick could have come up with this list. I have three problems with it -- what's in it, what's not in it, and its order. Today I will tackle mainly the first area; here are some brief thoughts about each of his recommendations:
- Check for standards compliance by using Weblint or other HTML checking tools.
- Turn off reverse DNS lookups in the Web server.
- Try out a free analysis tool.
- Use simple servlets or CGI.
- Get more memory.
- Index your database tables well.
- Make fewer database queries.
- Look for packet loss and retransmission.
- Set up monitoring and automated graphing of your Web site's performance.
Content that conforms to the HTML 4.0 standard will load faster and work in every browser because the browser then knows what to expect. Note that Microsoft-based tools create content that does not even use the standard ASCII character set, but instead uses many proprietary Microsoft characters that will display in Netscape as question marks and can slow down rendering.
Complying with standards is always a good thing, of course, but it's rarely a performance issue. And how can a browser compatibility problem be rated the top performance guideline? In the 458-page book it merits just 37 words, under the heading "Watch out for Composition Tools with a Bias." Beware of biased guidelines, I say.
If left on, reverse DNS will log a client's machine name rather than IP address, but at a large performance cost. It is better left off. You can always run log analysis tools which look up the names later.
Outdated, even in 2002. This was good advice in 1997: Prior to Apache 1.3, HostnameLookups defaulted to On. This adds latency to every request because it requires a DNS lookup to finish before the request is completed. In Apache 1.3, this setting defaults to Off. This should still appear on a much longer checklist, because security concerns might prompt someone to turn on HostnameLookups. But it doesn't belong at #3 in the top ten.
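To make the "look up the names later" idea concrete, here is a minimal Python sketch of resolving host names offline, after the requests have been served. It assumes each log line starts with the client IP address followed by a space (as in the Common Log Format); the function names `resolve` and `annotate` are made up for this illustration and are not taken from any log analysis tool.

```python
import socket
from functools import lru_cache


@lru_cache(maxsize=None)
def resolve(ip):
    """Return the host name for an IP address, or the IP itself on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return ip


def annotate(log_lines, lookup=resolve):
    """Yield log lines with the leading IP address replaced by a host name.

    Assumes the client IP is the first space-separated field of each line.
    """
    for line in log_lines:
        ip, sep, rest = line.partition(" ")
        yield lookup(ip) + sep + rest
```

The cache means each distinct address costs at most one DNS lookup, and none of that latency is paid while a visitor is waiting for a page.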
I've provided a free analysis tool at my Web site that can tell you whether or not your bottleneck is in DNS, or because of connection time or content size, or is on the server side. Work on improving the slowest part first.
The core idea here -- improving the slowest part first -- is a great recommendation; it should have been at the top of the list. It's in the book too, on page 163. On the other hand, the free tool has now been replaced (check the link) by a graph of local house prices. After some digging, I found that Patrick does still have a page about his book, which also contains tons of links to software tools, so it would take a while to figure out which one he meant. But this kind of Web research doesn't have to be a treasure hunt -- how hard would it be to rewrite the guideline and get O'Reilly to update their site?
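Since the original tool is gone, here is a rough Python sketch of the same kind of breakdown: it times the DNS lookup, the connection, the wait for the server's first byte, and the download for a single request, then names the slowest phase. This is not Patrick's tool. It assumes a plain HTTP (not HTTPS) server on port 80, and `time_phases` and `slowest_phase` are hypothetical names chosen for this example.

```python
import socket
import time


def time_phases(host, path="/", port=80):
    """Time DNS, connect, server wait, and download for one plain HTTP GET."""
    t0 = time.perf_counter()
    addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4]
    t1 = time.perf_counter()
    with socket.create_connection(addr[:2], timeout=10) as sock:
        t2 = time.perf_counter()
        # HTTP/1.0 so that the server closes the connection when it is done.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        sock.recv(1)  # blocks until the first response byte arrives
        t3 = time.perf_counter()
        while sock.recv(65536):  # drain the rest of the response
            pass
        t4 = time.perf_counter()
    return {
        "dns": t1 - t0,
        "connect": t2 - t1,
        "server": t3 - t2,
        "download": t4 - t3,
    }


def slowest_phase(timings):
    """Return the name of the phase that took the longest."""
    return max(timings, key=timings.get)
```

Calling `slowest_phase(time_phases("www.example.com"))` a few times at different hours gives a first hint about where to start; a single sample proves little.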
Use simple servlets, CGI, or your Web server's API rather than any distributed object schemes like CORBA or EJB. Distributed object schemes are intended to improve a programmer's code-writing productivity, but they do so at an unacceptable cost in performance for end-users.
This is reasonable advice, although the examples need updating -- CGI is legacy technology now, and newer application services like ASP.NET and low-level APIs like ISAPI, NSAPI, Apache extensions, etc. are faster. But the central idea is this: When a site handles a lot of business transactions, back-end communication overheads add up fast and, in the worst cases, become the bottleneck that forces you to spread the load across more servers. So anything you can do to minimize the resources consumed per transaction will cut service times and increase server capacity. And probably save money in the process, too -- money that could be spent on the next item.
Your Web server, middleware, and database all will probably do better with more memory, if they still use their hard disks frequently. Hard disks are literally about a million times slower than memory, so you should buy more memory until the disks are phased out.
Absolutely! You'll never be able to throw away your disks, but a key goal of tuning should be to find ways to use them less. Prioritize your hardware resources from fastest to slowest -- memory, processor, disks, LAN, Internet -- and try to reduce use of the slower ones by moving work to the faster ones.
Spectacular improvements are possible if you are inadvertently doing full-table scans on every hit of a particular URL. Indexes allow you to go directly to the data you need.
As opposed to indexing them badly, I suppose. This tuning guideline certainly does not apply to the Web exclusively; it's important whenever databases are used. But it's probably worth repeating in this context, in case anyone creating Web applications thinks that databases use magic to find things. By the way, you can also get spectacular improvements by replacing incompetent programmers and improving their poor designs. But I'd strongly recommend not hiring them in the first place.
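For anyone who has not seen the difference an index makes, here is a small sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the index name are invented for the example; the point is only that the query plan changes from a full-table scan to an index search once the filtered column is indexed.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan steps for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index, SQLite has to read every row of the table.
plan_before = query_plan(conn, sql, (42,))

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index, SQLite can go straight to the matching rows.
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, and other databases have their own equivalents of `EXPLAIN`, but checking the plan of the query behind a slow URL is a cheap way to spot an accidental full-table scan.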
If you can cache content in your middleware or servlets, do it. Making connections to a database and using those database connections is typically a bottleneck for performance.
Right! And if you can send less content to the browser, do that too. In fact, doing less work is always a sure way to improve performance. That's a general rule everyone should know, so general that I would not even include it in this list. I consider it part of a tuning framework -- a systematic way to approach any tuning project, not just speeding up Web applications.
There are many network snooping and monitoring tools to help you do this. Intermittent slowness is often due to packets being lost or corrupted. This is because a time-out period needs to pass before the packet is retransmitted.
This is useful advice, as far as it goes -- noisy connections can ruin your response times. But the guideline should really suggest what to do about the problem, if you have it, and that's a subject for a future post. And I'm not sure it will make my top ten list either; I'll have to wait and see what else I come up with.
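Short of a full network snooping tool, one quick check on a Linux server is the kernel's own TCP counters. The sketch below is an assumption-laden example rather than a recommended tool: it assumes Linux, where `/proc/net/snmp` contains a `Tcp:` header line followed by a `Tcp:` values line, and the function name `tcp_retransmit_ratio` is made up here. It reports the fraction of sent TCP segments that were retransmissions.

```python
def tcp_retransmit_ratio(snmp_text):
    """Return RetransSegs / OutSegs from the text of Linux's /proc/net/snmp."""
    tcp_lines = [
        line.split()[1:]
        for line in snmp_text.splitlines()
        if line.startswith("Tcp:")
    ]
    names, values = tcp_lines[0], tcp_lines[1]  # header line, then values line
    counters = dict(zip(names, map(int, values)))
    sent = counters["OutSegs"]
    return counters["RetransSegs"] / sent if sent else 0.0


if __name__ == "__main__":
    with open("/proc/net/snmp") as f:
        print(f"{tcp_retransmit_ratio(f.read()):.2%} of segments retransmitted")
```

These counters are cumulative since boot, so to catch intermittent trouble it makes more sense to sample them periodically and compare the differences than to look at one reading.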
This information is free online in Chapter 4 of the second edition of Web Performance Tuning.
Indeed! Measurements usually beat guesswork and clairvoyance. You've probably heard the popular saying that you can't manage what you don't measure, and I've already spent more than enough time researching it. All the same, it's not really a tuning guideline. I'd call it a performance management principle, so I don't think it actually belongs in this list at all.
Summing up ...
So, to sum up my audit of Patrick's list of ten guidelines, I vote to reject two altogether (#1 and #2), downgrade one (#3) to a priority well outside my top ten, accept four (#4, #5, #6, and #7), restate two (#8 and #10) as general principles that don't belong on this list, and reserve judgment on one (#9). That opens up 5 or 6 slots for the things that Patrick missed -- but what should they be? I will tackle that subject in a follow-up post.
[This post was first published on Blogger on August 18, 2006.]