
Latency, Bandwidth, and Response Times

Illustration: Web Page Response Time 101

My previous post, Latency, Bandwidth, and Station Wagons, focused primarily on the limitations of network bandwidth, and the time required to transmit massive data volumes. While that is an interesting topic, and one that produces some surprising results (like the fact that FedEx is still faster than the Internet), it is not particularly relevant to the subject of Web performance, which depends on the time required to transmit many small files.

My post highlighted It's Still The Latency, Stupid, by William (Bill) Dougherty in edgeblog. Bill's title pays homage to a famous 1996 article by Stuart Cheshire about bandwidth and latency in ISP links, It's the Latency, Stupid.

Over a decade later, Bill points out, Cheshire's writings are still relevant: "One concept that continues to elude many IT managers is the impact of latency on network design ... Latency, not bandwidth, is often the key to network speed, or lack thereof." This is especially true when it comes to the download speeds (or response times) of Web pages and Web-based applications. In this post I explain why, with references and examples to support my argument.

The edgeblog debate

Illustrating Bill's point about how poorly this is understood, Simon Howard (posting as fragglet) responded with It's the bandwidth, stupid, in which he disagreed strongly with Bill's post. A lively discussion ensued in the comments on edgeblog, beginning here.

Bill then followed his first post with Part 2, which discussed four possible tuning actions to reduce the impacts of network latency:

  1. Tweak the host TCP settings
  2. Change the protocol
  3. Move the service closer to the user
  4. Use a network accelerator

This post was also disputed by fragglet, prompting another response from Bill in the edgeblog comments, here. These discussions are an excellent illustration of the kinds of misunderstandings that exist about the role of network latency as a determinant of performance. Perhaps the most revealing exchange is this one:

Fragglet: You are saying that latency causes network problems and that by improving latency you can improve your network. I assert that this is false. If you have latency problems, they are a symptom of network congestion. If your network is suffering from serious congestion, it probably needs more bandwidth.

Bill: Wow. It is impressive how someone can miss the point so completely so many times. While network congestion will add to latency, latency is in and of itself a problem. In a network with zero congestion, latency will still be a problem. The problem is distance. More bandwidth cannot improve upon the speed of light. Sorry. This is the whole point of my article. Latency does cause issues unrelated to bandwidth or congestion. Those issues can be reduced with planning.

Indeed! I will now explain why fragglet is wrong and Bill is right. As promised (at the end of my station wagons post), I am going to base my explanation on the 2001 article by Alberto Savoia, Web Page Response Time 101 [**], which I introduced previously. That's because even though his examples are a bit dated (how many people are still using a 28.8Kbps modem today?), Alberto's article provides a concise and readable explanation of the technical principles involved.

[** Warning: even though this reprint is just 6 pages, it's a 2.5MB PDF, so wait until you're on a fast connection.]

The page response time formula

The Complete Formula

R = 2(D + L + C) + (D + C/2)((T-2)/M) + D ln((T-2)/M + 1)
    + max(8P(1+OHD)/B, DP/W) / (1 - sqrt(L))

B = Min line speed (bits per second)
C = Cc + Cs
Cc = Client processing time (seconds)
Cs = Server processing time (seconds)
D = Round trip delay (seconds)
L = Packet loss (fraction)
M = Multiplexing factor
OHD = Overhead (fraction)
P = Payload (bytes)
R = Response Time (seconds)
T = application turns (count)
W = Window size (bytes)

©NetForecast Inc.

The key to this issue is a simple formula for Web page download time. Credit for the original research goes to Peter Sevcik and John Bartlett of NetForecast Inc. Their 2001 research report, Understanding Web Performance, was published in Business Communications Review (BCR) in October 2001, and can also be downloaded as a PDF. It explains Web page response time using "The Complete Formula" shown here.
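
For readers who want to experiment with it, here is a minimal sketch in Python that transcribes The Complete Formula exactly as printed above. The parameter names follow NetForecast's definitions; the sample values at the bottom are my own illustrative assumptions, not figures from the report.

    import math

    def complete_formula(B, Cc, Cs, D, L, M, OHD, P, T, W):
        """NetForecast's Complete Formula, transcribed as printed above.
        B: min line speed (bits/sec); Cc, Cs: client/server processing (sec);
        D: round-trip delay (sec); L: packet loss (fraction); M: multiplexing
        factor; OHD: overhead (fraction); P: payload (bytes); T: turns;
        W: window size (bytes)."""
        C = Cc + Cs
        return (2 * (D + L + C)
                + (D + C / 2) * ((T - 2) / M)
                + D * math.log((T - 2) / M + 1)   # ln = natural logarithm
                + max(8 * P * (1 + OHD) / B, D * P / W) / (1 - math.sqrt(L)))

    # Illustrative (assumed) values: a 300KB, 60-turn page over broadband.
    print(complete_formula(B=1.5e6, Cc=0.1, Cs=0.2, D=0.15, L=0.01,
                           M=1, OHD=0.2, P=300_000, T=60, W=65_535))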

Alberto's contribution was in simplifying this formula to the version shown in his article's Figure 1, reproduced below. He notes that:

This formula makes several generalizations and assumptions, and its accuracy varies through the possible range of values (it tends to overestimate below eight seconds and underestimate over eight seconds).

Actually, even that explanation hides some further assumptions. His reference to eight seconds involves assumptions about typical connection latency and bandwidth, typical Web page sizes, and the typical number of elements (separately downloadable files) that make up a Web page.

Illustration: The Response Time Formula by Alberto Savoia

But, as he says, "for the purposes of this article, however, it will do just fine, since it introduces the key variables that impact page response time and shows you how they relate to each other, without introducing excessive complexity." Alberto describes the six key variables as follows:

The six parameters of Web response time

  • Page size: Page size is measured in Kbytes, and on the surface, the impact of this variable is pretty obvious: the larger the page, the longer it takes to download. When estimating page size, however, many people fail to consider all the components that contribute to page size—all images, Java and other applets, banners from third sources, etc.—so make sure you don’t overlook anything.
  • Minimum bandwidth: Minimum bandwidth is defined as the bandwidth of the smallest pipe between your content and the end user. Just as the strength of a chain is determined by its weakest link, the effective bandwidth between two end points is determined by the smallest bandwidth between them. Typically the limiting bandwidth is between the users and their ISPs.
  • Round trip time: In the context of Web page response time, round-trip time (RTT) indicates the latency, or time lag, between the sending of a request from the user’s browser to the Web server and the receipt of the first few bytes of data from the Web server to the user’s computer. RTT is important because every request/response pair (even for a trivially small file) has to pay this minimum performance penalty. As we shall see in the next section, the typical Web page requires several request/response cycles.
  • Turns: A typical Web page consists of a base page [or index page] and several additional objects such as graphics or applets. These objects are not transmitted along with the base page; instead, the base page HTML contains instructions for locating and fetching them. Unfortunately for end-user performance, fetching each of these objects requires a fair number of additional communication cycles between the user’s system and the Web site server—each of which is subject to the RTT delay I just mentioned.
  • Server processing time: The last factor in the response time formula is the processing time required by the server and the client to put together [i.e. generate and render] the required page so it can be viewed by the requester. This can vary dramatically for different types of Web pages. On the server side, pages with static content require minimal processing time and will cause negligible additional delay. Dynamically created pages (e.g., personalized home pages like my.yahoo.com) require a bit more server effort and computing time, and will introduce some delay. Finally, pages that involve complex transactions (e.g., credit card verification) may require very significant processing time and might introduce delays of several seconds.
  • Client processing time: On the client side, the processing time may be trivial (for a basic text-only page) to moderate (for a page with complex forms and tables) to extreme. If the page contains a Java applet, for example, the client’s browser will have to load and run the Java interpreter, which can take several seconds.

--Alberto Savoia, Web Page Response Time 101, STQE, July/August 2001
[Minor clarifications added]
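
Taken together, these six variables drive the simplified formula in Figure 1. As a rough sketch in Python (my reading of the figure: response time is approximately the payload transfer time, plus the turn delays, plus server and client processing), it can be computed like this:

    def simple_response_time(payload_bytes, bandwidth_bytes_per_sec,
                             turns, rtt_sec, server_sec, client_sec):
        """Approximate page response time from the six variables above:
        transfer time + turn delays + processing time (assumed form)."""
        return (payload_bytes / bandwidth_bytes_per_sec
                + turns * rtt_sec
                + server_sec + client_sec)

    # Alberto-style dial-up example: a 120KB page at 4KB/sec effective
    # bandwidth (turn count and processing times are my assumptions).
    print(simple_response_time(120_000, 4_000, turns=20, rtt_sec=0.2,
                               server_sec=0.5, client_sec=0.2))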

There is no question that these six variables do account for Web page response times, and do operate in the directions prescribed in the formula. As one confirmation, consider the authoritative textbook on the quantitative aspects of computing, Computer Architecture: A Quantitative Approach by John L. Hennessy and David A. Patterson ("H&P"). In the 3rd Edition (2003), Chapter 8 covers networks, and page 798 gives this simple formula for the total latency of a message:

Total latency =
Sender overhead + Time of flight + Message size/Bandwidth + Receiver overhead

The Web uses TCP, which transmits data as message segments comprising one or more packets, with each segment being acknowledged. So we can view a Web page download as a succession of H&P's message transmissions. Because each TCP segment is followed by an acknowledgment, H&P's time of flight variable corresponds to Alberto's round-trip time. And so summing N instances of H&P's formula produces Alberto's formula, with Turns having a value of N.

This demonstrates that the formula is correct. But if we wanted to use it to predict page download times, selecting the right values to plug into those variables involves some complications.
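
To make the summing argument concrete, here is a small sketch (my own illustration, with assumed per-message values) showing that N instances of H&P's formula add up to a total with the same shape as Alberto's: Turns x RTT plus Payload/Bandwidth, plus overheads.

    def hp_latency(overhead_sec, time_of_flight_sec,
                   message_bytes, bandwidth_bytes_per_sec):
        # H&P: sender overhead + time of flight + size/bandwidth + receiver
        # overhead (sender and receiver overheads folded into one term here).
        return (overhead_sec + time_of_flight_sec
                + message_bytes / bandwidth_bytes_per_sec)

    N, rtt, bandwidth, payload = 60, 0.15, 150_000, 300_000  # assumed values
    total = sum(hp_latency(0.005, rtt, payload / N, bandwidth)
                for _ in range(N))
    print(total)   # = N*rtt + payload/bandwidth + N*overhead = 9 + 2 + 0.3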

Turn, turn, turn ...

Alberto does explain how to use the formula. But in practice, I would expect those directions to produce a worst-case estimate of response time, mainly because the value of the Turns variable should probably be lower than Alberto's method suggests. Three factors affect the estimation of turn counts (a rough adjustment sketch follows the list):

  1. Alberto assumes HTTP 1.0, but HTTP 1.1 is now the predominant protocol. HTTP 1.1 introduced persistent TCP connections, so browsers no longer repeatedly close and reopen TCP connections with a server, and (because a persistent connection does not restart TCP slow-start for every file) large files are transferred in larger TCP segments. So most browsers will now download a Web page using fewer turns.
  2. Browsers can open up to two parallel connections for each distinct server domain, removing some turns from the synchronous response time path.
  3. Most users of most Web pages already have some page elements in their browser caches, eliminating the turns that would be needed to fetch them. On the other hand, recent research at Yahoo suggests that browser caching benefits may be less than imagined -- see Performance Research, Part 2: Browser Cache Usage - Exposed! by Tenni Theurer.
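
A back-of-the-envelope adjustment for these three effects might look like the following sketch (my own rough model, not Alberto's or NetForecast's): parallel connections overlap turns, and cached elements remove them entirely.

    def effective_turns(raw_turns, parallel_connections=2,
                        cache_hit_fraction=0.0):
        # Remove turns for cached elements, then let parallel
        # connections overlap the remainder (a crude approximation).
        uncached = raw_turns * (1 - cache_hit_fraction)
        return uncached / parallel_connections

    # E.g. 60 raw turns, two parallel connections, 25% of elements cached:
    print(effective_turns(60, parallel_connections=2,
                          cache_hit_fraction=0.25))   # -> 22.5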

Fortunately, such complications do not prevent us from using the formula to explain the relative importance of bandwidth and latency.

Implications of the formula

To demonstrate this, consider a newer version of the formula shown in Figure 2 below. This comes from a September 2006 NetForecast report, Field Guide to Application Delivery Systems. The only difference between this and Alberto's version is its use of a "curly equals" sign (≈), signifying "is approximately equal to":


Figure 2. Response Time Formula by NetForecast

The NetForecast paper goes on to discuss how each of the six factors affects response times, using Figure 3 (below) to summarize its points.


Figure 3. Response Time Causes and Effects

In addition to the issues highlighted by NetForecast, we can evaluate the relative contributions of Bandwidth and Latency to overall response time simply by comparing the two factors [Payload/Bandwidth] and [Turns x RTT] in typical Web environments; a worked comparison follows the two bullets below.

  • [Payload/Bandwidth]: Alberto's paper focuses on dial-up performance, using the example of effective bandwidth being 4Kbytes/sec. In this situation, a 120K page may take 30 seconds to transfer, before we even consider any latency effects. But as bandwidth increases, this factor becomes progressively smaller. A broadband connection of 1.5Mbps can transfer about 150Kbytes/sec, reducing this factor to 0.8 secs for a 120K Web page, or 2 secs for a 300K page.
  • [Turns x RTT]: According to NetForecast, the average Keynote Business 40 home page requires 60 turns and 300 kilobytes to load. If RTTs (or "ping times") are in the range of 100-200ms (typical for many consumers), 60 turns will add 6-12 seconds of network latency. Ping times will certainly be faster for broadband connections than for dial-up, because of the extra analog-digital conversion delays imposed by dial-up technology. But at a certain point, the speed of light and the latencies of the carriers' network devices (hubs, routers) limit further improvement.
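
Putting numbers to this comparison, using the illustrative page size, turn count, bandwidths, and RTTs from the two bullets above:

    # Compare [Payload/Bandwidth] with [Turns x RTT] for a 300KB,
    # 60-turn page, at dial-up and broadband speeds.
    def factors(payload_bytes, bandwidth_bytes_per_sec, turns, rtt_sec):
        return (payload_bytes / bandwidth_bytes_per_sec, turns * rtt_sec)

    for label, bw, rtt in [("dial-up", 4_000, 0.2),
                           ("broadband", 150_000, 0.1)]:
        transfer, latency = factors(300_000, bw, 60, rtt)
        print(f"{label}: transfer {transfer:.1f}s, turns x RTT {latency:.1f}s")
    # dial-up:   transfer 75.0s, turns x RTT 12.0s
    # broadband: transfer  2.0s, turns x RTT  6.0s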

The bottom line ...

In the analysis above, I showed that as connection bandwidth increases, the effect of the first factor in the response time formula approaches zero. No such scaling effect exists for the second factor -- neither latency nor turns can ever be made to approach zero. In fact, as Web sites and applications keep growing in sophistication, turns are increasing. The NetForecast paper states that over the past decade turn counts for the Keynote Business 40 Web sites have grown 12% per year and payload has grown 20% per year.
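
As a quick illustration of what those growth rates imply (my own extrapolation from NetForecast's reported rates, starting from an assumed baseline), the latency term not only stays dominant but keeps growing:

    # Project the two factors forward, assuming turns grow 12%/year and
    # payload 20%/year, from a 300KB, 60-turn page on a 100ms, 150KB/sec
    # connection (assumed baseline).
    turns, payload, rtt, bandwidth = 60, 300_000, 0.1, 150_000
    for year in range(0, 6):
        latency_term = turns * (1.12 ** year) * rtt
        transfer_term = payload * (1.20 ** year) / bandwidth
        print(f"year {year}: turns x RTT = {latency_term:.1f}s, "
              f"transfer = {transfer_term:.1f}s")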

Even the newest AJAX technology may increase, not reduce, turns, as designers attempt to replace large monolithic file downloads with multiple smaller requests. Unless those smaller requests can also be designed to happen asynchronously, while the user is doing something else, they will only add to the delay due to network latency.

This is why I agree with Bill Dougherty that It's Still The Latency, Stupid! And until someone finds a way to move bits faster than the speed of light, that's not going to change.

Reader Comments (10)

Great stuff.

There are a number of factors that can impact the speed of data transmission. Overcrowded bandwidth is a factor. Packet/cell overhead is a factor. Errors (and the resulting retransmissions) are a factor. The speed of switching and routing equipment is a factor. The speed of every component in a client-server system is a factor. Even so...

Latency is a factor in and of itself.

July 25, 2007 | Unregistered CommenterBen Simo

I have nothing so complete or eloquent as this to describe the "latency factor", but I do have an experiment that I do with people when they don't get it.

I have them sit at a "representative client computer". I then have them request a pre-configured "empty" html page with one embedded link to a 1kb file (more often than not "spacer.gif") and we capture the response time for that.

Next I have them launch an object from the site of interest that I have previously downloaded to the desktop. The object is often a large .pdf, but is sometimes a spreadsheet or multimedia file. I also make sure that the relevant "speed launch" services are turned off, since we obviously can't control whether or not a client will have these running. We capture the "click to display" time for that as well and add it to the first number.

Next (after shutting down the speed launch services and whatever else to ensure I'll be comparing at least one type of apples to another type of apples) I have the person navigate to and launch the same object through the site of interest and time that.

Finally, I subtract the "spacer.gif" round trip time plus the "launch .pdf" time from the "get the .pdf from the site" time.

The answer, I tell them, is the parts that they have some control over (unless they can make the .pdf smaller). The rest is latency. That they can't control unless they can:

a) increase the speed of light
OR
b) force their clients to upgrade their computers/internet connection speed

Sure, it simplifies even further than Savoia did in his article, but it makes a point to folks who simply don't have the technical training/background/savvy to "get it" any other way.

Scott

--
Scott Barber
President & Chief Technologist, PerfTestPlus, Inc.
Executive Director, Association for Software Testing
www.perftestplus.com
www.associationforsoftwaretesting.org

"If you can see it in your mind...
you will find it in your life."

August 4, 2007 | Unregistered CommenterScott Barber

Scott,

This is an outstanding analysis of the issue, and I am humbled by your review and support. I'm sorry it took me so long to find your page. Keep up the great work.

Thanks,

-Bill

October 29, 2007 | Unregistered CommenterWilliam Dougherty

Oops,

I meant Chris. Apparently there was not enough latency for me to correct my comment, before it reached your server!

Thanks again.

-Bill

October 29, 2007 | Unregistered CommenterWilliam Dougherty

Thanks for the follow-up, although I only just found it! You should really drop people an email when you post something like this.

In the examples that you describe (relating to websites), your reasoning is correct. Downloading a website requires downloading several pieces of media (HTML, images, etc) that form the page. In this scenario, latency does have an effect.

However, this scenario is a very specific one. If you are downloading a 50MB file from a website, for example, latency is not likely to have much effect. You have an initial latency in making the request, but after that your download speed is solely dependent on your bandwidth.

This is the main reason that I posted my responses in the first place. In his discussion of latency, Bill specifically said "As the latency increases, the TCP window shrinks, meaning the sender sends less data before waiting for an ACK", which is patently false. He asserts that latency affects download speed (as in the 50MB file example I previously gave), and this is not the case.

Furthermore, while latency is an issue, the main cause of latency is queueing due to saturated network links - a problem caused by lack of bandwidth. When there is no available bandwidth, packets are queued and latency increases. As a classic example, try playing a computer game like CounterStrike while downloading a large file over the same connection. Your ping time shoots through the roof, because the game packets are queued, waiting for the download packets to be transmitted.

That's why, in my opinion, Bill has misunderstood the problem. Latency can be an issue, but the way to deal with latency is to manage your bandwidth usage, not by investing in "network accelerator" snakeoil.

July 16, 2008 | Unregistered CommenterSimon Howard

Simon, of course bandwidth matters, but most networks are not congested. Bandwidth is easy to fix, while with latency there's not a lot you can do in many cases.

July 29, 2008 | Unregistered Commentermiguel

In response to Simon, who seems to still refuse to accept how TCP works and the effects of latency on any transfer, I suggest you try a little experimentation using a WAN emulator such as the wonderful and free DummyNet.

Put this in between two systems and do a file copy and watch as the transfer times climb with increasing latency, then perhaps you'll need to rethink your thoughts on latency.

I've done extensive lab based testing with several WAN accelerators, and snakeoil they most certainly are not!

As an example, copying a 58.7MB file from a Win2K3 server to a Windows XP client through a WAN emulator, with and without optimisation:

Across an emulated 256Kbps WAN with 100ms of delay it took 37 mins and 2 seconds.

Across the same emulated WAN using a WAN optimiser in a "cold" state it took 6 mins and 39 secs.

The same copy done with the optimizer now in a warm state (i.e. it had seen this traffic before) took just 20.3 seconds.

If that's snake oil then I'll have all I can get my hands on!

Graham.

September 2, 2008 | Unregistered CommenterGraham Hemmings

This may also be of interest to people:

The impact of WAN Optimization on TCP Applications

September 2, 2008 | Unregistered CommenterGraham Hemmings

Charles is throttling the network connection to simulate different network latency and bandwidth conditions. Experiment with different values to see the impact on page load performance.

Studies show that if the end user has bandwidth of 5Mbps or more, extra bandwidth doesn't make any difference to page load times, as latency is the limiting factor.

March 28, 2011 | Unregistered Commenterfulfulment house

I like your post very much .. Thanks for the excellent piece of information.

October 10, 2013 | Unregistered CommenterSujeet
