Diagnosing Internet Congestion with a
Transport Layer Performance Tool

Matt Mathis <mathis@psc.edu>
Jamshid Mahdavi <mahdavi@psc.edu>

Abstract

The Internet is once again suffering from its own success. The massive increase in presented load has left many large users with insufficient bandwidth to support their applications. This problem is exacerbated by the lack of performance measures by which users can compare IP providers.

We have focused on one particular metric, bulk transfer capacity, and a tool, "TReno," which can function as a basis for a formal bulk transfer metric. Good bulk transfer capacity requires a very clean network, and it has other properties that make it an important gauge of network performance. We introduce our tool, TReno, and describe its relation to current TCP implementations. Our research with TReno led us to key observations about TCP behavior and influenced details of the new specification for the TCP selective acknowledgment (SACK) option. Currently, TReno does not precisely estimate TCP's performance in the Internet. As SACK is deployed, and as the research community reaches consensus regarding its proper behavior, TReno and state-of-the-art TCP will converge to identical performance and behavior.

The IP Provider Metrics subcommittee of the IETF is developing standardized formal metrics for providers. There is a substantial amount of work to be completed in the areas of measurement procedures and statistics before TReno can become a standardized metric. Our goal is for the TReno development effort and the IPPM standards efforts to converge.

TReno will maximize its potential as a metric if it also functions well as a diagnostic. If a long multi-provider path between two users is not performing as well as desired, it would be extremely valuable to use TReno to measure the path hop-by-hop or provider-by-provider to localize the problem.

As this research, development and standardization progress, we will update online documentation.

Introduction

The Internet is once again suffering from its own success. Early in 1995, the US Internet transitioned from a federally subsidized research and education network into a competitive commercial market. As a result of this transition to the commercial sector, there has been a massive increase in demand for Internet services, resulting in a marked increase in the already exponential growth of the Internet. We have now entered an era where the historical big users of the Internet, including the colleges and research institutions responsible for its development, are unable to obtain the bandwidth needed to support their applications.

In the current Internet market, there are no measures by which to compare providers except price. The problem is compounded since it is impossible to rigorously express performance requirements in a procurement document or as a deliverable in a contract. In order to address these issues, the IETF formed the IP Provider Metrics (IPPM) subgroup of the Bench Marking Working Group (BMWG). This group is chartered with the development of standard metrics and procedures to evaluate IP services.

The lack of good formal metrics for IP services is hurting the Internet. This paper presents a preliminary overview of some of our work in the area. We are focusing on one particular metric, bulk transfer capacity, and a tool, "TReno," which has potential as the basis for a formal bulk transfer metric.

Our own work is progressing on many fronts. Up-to-date information about our performance research, IP metrics, and source code is best obtained from the web page at http://www.psc.edu/~mathis/ippm.

Bulk Transfer Capacity

The Internet is used for a wide variety of applications. Different metrics best project the performance of different applications. For example, interactive applications are particularly sensitive to delay and packet loss. File transfer applications require more stable availability of bandwidth. Although it would be useful to specify a single parameter which quantifies the overall performance of a path through the Internet, this variety of demands makes a single parameter impossible to identify.

Our work has focused on one parameter: bulk transfer capacity, defined as the capacity of a single sustained transfer that is long enough to reach equilibrium (i.e. flow control steady state) with the network. Although bulk transfer capacity might seem to have a fairly narrow constituency, it has several properties that make it useful as a performance parameter in a broader sense. First, good bulk transfer capacity requires a very clean network. Second, most network problems which interfere with interactive performance also seriously impair bulk performance. Third, when the Internet has experienced global congestion (e.g. spring of 1988, fall of 1991), Internet stability dropped considerably. High loads often meant lost routing messages (causing spurious "route flaps"), limited usefulness of backup paths, and general difficulties in monitoring and managing the Internet.

In this context, we see focusing on bulk performance as the quickest and most direct way to address key parts of the Internet's current performance problems. Other metrics are important and must still be developed, but in the short run, a bulk performance metric will have the most impact.

Formal metrics

The IPPM subgroup of BMWG is charged with the development of formal metrics for IP providers. This project will use a rigorous top-down approach.

Early discussions at a BoF at the 32nd IETF in Danvers, MA [IPPMminutes] identified the need for a framework and codification of standard definitions. The meeting also generated an initial work list including useful metrics, an overview of existing tools, and discussion of possible approaches to the general problem of standardizing metrics:

  1. Metrics should be true measurements, yielding repeatable results in well-defined units (e.g. a tape measure).
  2. Metrics must be absolutely "fair." When intrinsic differences in technologies introduce measurement bias, the bias must be understood and well modeled.
  3. Metrics must support diversity in the marketplace by informing people who wish to strike a balance between cost and performance. This requires that the metrics be non-pejorative and not impose artificial classifications or grades on services.
  4. Metrics should be measurable by IP providers, their customers or outside testing agencies. Customers should have a capability to measure one provider independent of all others.
  5. Metrics should be useful as specifications in formal business documents, such as quotations, contracts and RFQs. This requirement is critically needed to help high performance users purchase the capacity that they need, and to verify that it is actually met.
  6. Metrics should also have a diagnostic mode, which can be used to identify ("sectionalize") the weak link in a long multi-hop, multi-provider path.
  7. The metrics themselves must be owned "by the community," more specifically, not encumbered by any particular company or individual.
  8. All parts of the Internet and supporting technologies must be driven in the appropriate direction as Internet providers and manufacturers each independently optimize their own components against the metrics.

The precise requirements for metrics will evolve as the IPPM effort progresses.

It is important to note that IPPM is top-down in the sense that it is starting from the broadest possible vista and defining the structure and framework to address the general problem. It will progressively converge toward a concrete and detailed measurement standard.

Our research into TCP dynamics led us to implement a new tool "TReno," for "Traceroute Reno". At the time of its development we anticipated that it would have potential both as a diagnostic and as a vehicle for testing new TCP algorithms. It has proven to be powerful in both areas.

TReno represents a bottom-up approach to developing a performance metric. It is built on phenomena and technology which are intrinsically part of the basic properties of the Internet. It measures a complex composite of Internet characteristics which we allege is relevant to the perceived performance of the Internet. But it is just a piece of software.

Transforming TReno into a formal metric requires precise definition of terminology, test procedures, statistical models and sampling methodology, as well as other standards-related issues, such as formal review and adoption. IPPM is the appropriate forum to transform TReno into a true metric. Our ultimate goal is for the bottom-up TReno development effort and the top-down IPPM efforts to converge.

The tool: TReno

TReno is designed to measure the single-stream bulk transfer capacity over an Internet path. It is an amalgam of two existing algorithms: traceroute [jacobson88b] and an idealized version of the flow control algorithms present in Reno TCP [stevens94]. It is a natural successor to the windowed ping diagnostic presented at INet94 [mathis94].

TReno probes the network with either ICMP ECHO (as in the ping program), or low TTL UDP packets, which solicit ICMP errors (as in the traceroute program). The probe packets and the replies only rely on standard portions of the IP and ICMP protocols. The packets carry sequence numbers which are reflected in the replies, so that TReno can always determine which probe packet caused each response.

The probe packets are subject to the same congestion effects as TCP: queueing delay and congestion-related loss.

TReno uses the sequence numbers to emulate TCP, performing an idealized version of the de facto standard TCP congestion control algorithms (slow start, congestion avoidance, etc.) [jacobson88a, stevens94, stevens96]. These algorithms are used to compute the proper congestion window (cwnd in the code and literature). Like TCP, TReno uses cwnd to meter packets into the network. The windowed ping paper presents a more detailed description of the metering process.
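
For concreteness, the following sketch (in Python, with illustrative names; it is not TReno's actual code) shows the style of window arithmetic these algorithms imply, in units of packets:

    # A minimal sketch of idealized Reno-style congestion control, in units
    # of packets.  The class and variable names are illustrative, not TReno's.

    class CongestionWindow:
        def __init__(self, initial_cwnd=1.0, initial_ssthresh=64.0):
            self.cwnd = initial_cwnd          # congestion window (packets)
            self.ssthresh = initial_ssthresh  # slow start threshold (packets)

        def on_ack(self):
            """Open the window when a probe is answered."""
            if self.cwnd < self.ssthresh:
                self.cwnd += 1.0              # slow start: exponential growth
            else:
                self.cwnd += 1.0 / self.cwnd  # congestion avoidance: ~1 packet per RTT

        def on_loss(self):
            """Halve the window when a probe (or its reply) is lost."""
            self.ssthresh = max(self.cwnd / 2.0, 2.0)
            self.cwnd = self.ssthresh         # idealized recovery, no timeout

        def may_send(self, packets_in_flight):
            """Meter new probes into the network, as TCP uses cwnd."""
            return packets_in_flight < int(self.cwnd)

    # Example: ten answered probes open the window; one loss halves it.
    w = CongestionWindow()
    for _ in range(10):
        w.on_ack()
    w.on_loss()
    print(round(w.cwnd, 2))   # 5.5

In both TCP and TReno it is this window, rather than any fixed rate, that paces packets into the network.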

Comparison to TCP

Although TReno emulates TCP, there are a number of notable differences. First, all of the state is kept in the tester (the TReno program). The target at the far end of the path is stateless -- just returning ICMP messages in response to the probe packets. Thus, unlike a real transport protocol, TReno treats loss on the forward and return paths as equivalent. Second, TReno does not actually retransmit lost data. When packets are lost, TReno does the appropriate window adjustments, but continues to send new sequence numbers. It emulates TCP's behavior by recording the sequence number of the "virtual retransmission," which will fully repair the losses. As a result, TReno emulates an idealized TCP with selective acknowledgment (SACK) [mathis96a] and is not subject to problems associated with distinguishing between very late packets (which were incorrectly assumed to be lost) and the resultant retransmissions.
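
The following hypothetical sketch (again in Python; the names and data structures are ours, not TReno's) illustrates the bookkeeping described above: lost probes are never resent, and a fresh sequence number stands in for the retransmission that repairs them.

    # A hypothetical sketch of the "virtual retransmission" bookkeeping
    # described above (illustrative only; not TReno's actual code).

    class VirtualRetransmitter:
        def __init__(self):
            self.next_seq = 0           # next new probe sequence number
            self.holes = set()          # probe sequence numbers presumed lost
            self.recovery_point = None  # seq of the "virtual retransmission"

        def send_probe(self):
            seq = self.next_seq
            self.next_seq += 1
            return seq

        def on_loss(self, lost_seq):
            # The window adjustment happens elsewhere; here we only record
            # that the next new probe stands in for the retransmission.
            self.holes.add(lost_seq)
            self.recovery_point = self.send_probe()

        def on_reply(self, seq):
            # A reply at or beyond the recovery point is treated as evidence
            # that the virtual retransmission fully repaired the losses.
            if self.recovery_point is not None and seq >= self.recovery_point:
                self.holes.clear()
                self.recovery_point = None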

Our work on TReno is an integral part of our research in TCP performance. It is important that TReno accurately predict TCP's behavior in the production Internet. However, we have discovered some anomalous behaviors in the de facto standard TCP implementations currently in use. Our TReno work provided key insights into how TCP might behave, and led directly to some of the details of the new selective acknowledgment Internet-Draft.

Both TCP and TReno use exactly the same congestion control algorithms; significant portions of the original TReno code were inspired by public domain TCP implementations. As we described earlier, TReno depends on an agent in the network generating ICMP messages to reflect sequence numbers back to the TReno tester. Since the target does not hold any state, it is intrinsic that returning messages reflect only the most recently arrived message.

With non-SACK TCP, the data receiver is constrained to report the end of the correctly received contiguous data. If there have not been any losses (the normal condition), the acknowledgment number reports the most recently arriving data. However, once there has been a lost segment, TCP no longer reports arriving new data and instead sends duplicate acknowledgments of the last contiguous data (which is also the beginning of the partially delivered data). Reno TCP estimates how much data has arrived at the receiver by counting the duplicate ACKs.
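
A minimal sketch of this inference, using an assumed maximum segment size, might look as follows; the point is that the quantity is an estimate rather than direct knowledge.

    # A sketch of the Reno sender's inference (standard fast-recovery style
    # bookkeeping; the segment size is an assumed, typical value).

    MSS = 1460   # assumed maximum segment size in bytes

    class RenoEstimate:
        def __init__(self, snd_una):
            self.snd_una = snd_una   # left edge: known exactly from the cumulative ACK
            self.dup_acks = 0

        def on_duplicate_ack(self):
            # Each duplicate ACK implies one more segment reached the receiver
            # beyond the hole; the total is only an estimate, not knowledge.
            self.dup_acks += 1
            return self.dup_acks * MSS   # estimated data received past snd_una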

TReno, on the other hand, always knows precisely which data just arrived, and keeps track of which segments were missing to estimate the equivalent of TCP's acknowledgment number. Thus the essential difference between Reno TCP and TReno is that during recovery, each has precise knowledge about different edges of the partial data, and each must estimate the opposite edge.

Although TReno and Reno TCP use exactly the same congestion control algorithms, they have vastly different behavior because the inaccuracy of Reno's estimate of the arriving data dominates its behavior during recovery.

We looked at possible ways to endow TCP with more precise information about the data arriving after missing segments. We explored a number of options and algorithms, before settling on a slightly revised version of the original TCP selective acknowledgment option [RFC1072]. The essential change is to mandate that the first block in the SACK option must always include the most recently received segment [mathis96a]. This is sufficient for the data sender to accurately track how much data has left the network.
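
As an illustration of the stated rule (this is a sketch, not code from the draft), a receiver might assemble its SACK blocks as follows:

    # A sketch of the stated rule, not code from the draft: the first SACK
    # block always covers the most recently received segment.

    def build_sack_blocks(isolated_blocks, newest_block, max_blocks=4):
        """isolated_blocks: (start, end) byte ranges received above the
        cumulative ACK point; newest_block: the range containing the segment
        that just arrived; at most max_blocks fit in the TCP option space."""
        blocks = [newest_block]          # mandated first block
        for blk in isolated_blocks:
            if blk != newest_block and len(blocks) < max_blocks:
                blocks.append(blk)
        return blocks

    # Example: the cumulative ACK is 1000; ranges 2000-3000 and 4000-5000
    # have arrived, and a segment just arrived within 4000-5000.
    print(build_sack_blocks([(2000, 3000), (4000, 5000)], (4000, 5000)))
    # -> [(4000, 5000), (2000, 3000)]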

In a soon to be published companion paper [mathis96c], we investigate refinements to TCP congestion control to make better use of the information provided by the SACK option. A significant portion of this work relies on using TReno as a framework to test new TCP congestion control algorithms. The TReno implementation is being used as the prototype for both a TCP simulator version (integrated into the LBL simulator [LBLsim]) and a reference TCP implementation for unix. As we complete the research, the different implementations will converge to a common algorithm and behavior.

Under some circumstances TReno substantially overestimates the performance of current TCP implementations because they do not use selective acknowledgments. These are precisely the situations where SACK TCP is most needed, and where it will do the most good for the Internet.

Once SACK TCP is available in the market, a user having difficulty with a file transfer application running too slowly might appropriately use TReno to test the path. If the TReno performance is poor, the user can legitimately complain to the provider, who can repeat the TReno measures within the provider's own infrastructure to diagnose the problem.

On the other hand, if the TReno measurement is good, the IP provider will correctly observe that the user has a substandard TCP implementation. The provider can correctly direct the user to pursue the problem with the user's TCP vendor.

We consider it imperative that the research community come to a consensus about the correct behavior for next-generation TCP, and that TReno accurately reflect that behavior.

Work still to be completed

In this section we describe additional work that needs to be completed before TReno is fully useful as a standardized measurement. We will outline the remaining problems to be solved, and propose possible solutions. Most of this work properly belongs in the IPPM sub-committee of BMWG.

First, there is not yet a standard lexicon for IP provider metrics. BMWG has generated a standard terminology for "bench testing" Internet components. This must be extended to include additional concepts for production infrastructure under load. (We are aware that the discussion to follow precedes formal terminology.)

The second issue that must be addressed is how to synthesize an overall provider metric from multiple point-to-point metrics, which are applied between specific pairs of endpoints. A point-to-point metric might be useful for specifying and/or testing between two well-known sites, such as different offices within one organization. However, if customers want to measure the overall performance of an IP provider, they will need a composite measure of the performance among some large pool of targets. The simplistic approach would be to expressly list targets. This has a serious problem: as the provider's topology evolves the listed targets may become irrelevant to the customer base and misrepresent the useful performance of the network.

A better approach would be to have a standardized method of choosing fair and uniform test targets, such that the pool of targets evolves as the network evolves. We believe that the real goal is to be able to use your chosen provider to reach other sites connected to other providers. Therefore, the most important places to reach are the interconnects, as measured to targets just slightly beyond your provider's network.

Issues of bias and fairness can be resolved as follows: Select interconnects on the basis of proximity and traffic share to a given customer. At each of the interconnects, select targets just beyond the edge of the provider (e.g. within adjacent providers' networks), and measure performance from the customer to the pool of targets. From the pool of targets at each hub, select the best measurement as representing the performance to the hub.
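
As a purely illustrative sketch (the hub names and throughput figures are hypothetical), the composite might be assembled like this:

    # Illustrative only: hub names and throughputs (Mbit/s) are hypothetical.
    # Each hub is represented by the best TReno measurement to targets just
    # beyond that interconnect.

    measurements = {
        "hub-a": [3.1, 2.4, 3.6],
        "hub-b": [1.9, 2.2],
    }

    per_hub = {hub: max(results) for hub, results in measurements.items()}
    print(per_hub)   # {'hub-a': 3.6, 'hub-b': 2.2}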

The third issue is the sampling and statistical procedure required to produce a single composite number to represent a variable whose value may shift widely across an entire week. We believe that the 95th percentile of 100 ten-second TReno runs over a one-week period is a reasonable statistic. Roughly speaking, this excludes the busiest two hours of the day from the estimate of available network capacity.
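
The following sketch illustrates one reading of that statistic, using hypothetical throughput samples: discard the slowest five percent of runs (roughly the busiest periods) and report the smallest remaining value.

    # The throughput samples are hypothetical; the interpretation (discard
    # the slowest 5% of runs, i.e. the busiest periods) is our reading of
    # the statistic proposed above.

    import random

    random.seed(0)
    # 100 ten-second TReno runs spread across one week, in Mbit/s.
    samples = sorted(random.uniform(0.5, 8.0) for _ in range(100))

    # Discard the slowest five samples and report the smallest remaining
    # value: a throughput the path achieves at least 95% of the time.
    bulk_capacity = samples[5]
    print("Bulk transfer capacity estimate: %.2f Mbit/s" % bulk_capacity)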

All of these issues should be addressed under the auspices of the IPPM sub-committee of the BMWG of the IETF. The solutions proposed here have not had the benefit of the public evaluation and review which is part of the IETF standardization process.

TReno as a diagnostic

TReno will maximize its potential as a metric if it also works well as a diagnostic. If a long multi-provider path between two users is not performing as well as desired, it would be extremely valuable to use TReno to measure the path hop-by-hop or provider-by-provider to localize the problem.

There must be appropriate test targets throughout the Internet. If the targets are computer systems specifically deployed for testing, there must also be a procedure to identify an appropriate sequence of targets.

It would be better if the routers themselves were useful as targets. They already have the necessary function (generating ICMP messages in response to packets with expired time-to-live), but at an insufficient performance level. Since generating ICMP messages is not viewed as a critical, high-performance function, the vast majority of routers implement it far more slowly than they can route packets.

A router with a fast ICMP implementation would have the interesting property that any transit user could detect that it was overloaded, and agitate to have it upgraded. This property might be viewed as a self-retirement feature for obsolete, overloaded routers.

There is one final research issue: the relationship between the performance of a long path and the performance of each section of the path is not well understood. In many real-world situations, there is a single, easily identified bottleneck. However, it is easy to construct examples where the overall performance limit is the result of the interaction of two different characteristics of different sections of the path. Perhaps the simplest example is a single path with different packet size and packet rate limits in different sections. The overall path performance will be bound by the packet size times the packet rate, which can be much smaller than the performance limit in each section by itself.
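
A small worked example (with purely illustrative numbers) makes the interaction concrete:

    # The numbers are purely illustrative.

    packet_size_limit = 512      # bytes/packet, imposed by one section (small MTU)
    packet_rate_limit = 1000     # packets/second, imposed by another section

    # Suppose the size-limited section alone could forward 4000 packets/s
    # (about 2.0 MB/s of 512-byte packets), and the rate-limited section
    # alone could carry 1500-byte packets (about 1.5 MB/s at 1000 packets/s).
    # The concatenated path is bound by the product of the two limits:
    end_to_end_bound = packet_size_limit * packet_rate_limit
    print("End-to-end bound: %d bytes/s (about 0.5 MB/s)" % end_to_end_bound)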

Conclusion

TReno is promising as the basis of a bulk transfer metric, but work is still needed on all parts of the problem. We need to complete the standards framework for provider metrics, including terminology and requirements for metrics. TReno is just a tool; before it can become a standard metric, formal measurement and statistical procedures must be adopted. In addition, some of the technical details of TReno's behavior are not yet fully resolved.

As this work progresses, we will update the online information presented in [IPPMnotes].

References

[IPPMnotes]
Working notes on the IP Provider Metrics (IPPM) effort. Available from: http://www.psc.edu/~mathis/ippm.
[IPPMminutes]
Minutes of the IP Provider Metrics BoF, April 5th at the 32nd IETF, Danvers, MA. http://www.psc.edu/~mathis/ippm/meet01.danvers32/minutes.html.
[jacobson88a]
Van Jacobson, Congestion Avoidance and Control, Proc ACM SIGCOMM '88, August 1988, pp. 314-329.
[jacobson88b]
Van Jacobson, Traceroute source code and documentation. Available from: ftp://ftp.ee.lbl.gov/traceroute.tar.Z.
[mathis94]
Matthew Mathis, Windowed Ping: an IP Layer Performance Diagnostic, Proc of INET'94/JENC5, June 1994.
[mathis96a]
Matthew Mathis, Jamshid Mahdavi, Sally Floyd, Allyn Romanow, TCP Selective Acknowledgment Options, IETF internet-draft (work in progress), to be published soon as an RFC (superseding RFC1072).
[mathis96b]
Matthew Mathis, Jamshid Mahdavi, Diagnosing Internet Congestion with a Transport Layer Performance Tool. Proc INET'96, June 1996. The final electronic version is available from http://www.psc.edu/~mathis/htmlpapers/inet96.treno.html.
[mathis96c]
Matthew Mathis, Jamshid Mahdavi, Forward Acknowledgment: Refining TCP Congestion Control, To appear in Proc ACM SIGCOMM '96, August 1996.
[RFC1072]
Van Jacobson, Bob Braden, TCP Extensions for Long-Delay Paths, IETF RFC1072, October 1988.
[stevens94]
W. Richard Stevens, TCP/IP Illustrated, volume 1, Addison-Wesley, Reading MA, 1994.
[stevens96]
W. Richard Stevens, TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms, IETF internet-draft (work in progress), to be published soon as an RFC.