Stanford High Performance Networking Group

Tools for the Analysis of Traceroute Samples

We have been studying traceroute samples from NLANR to estimate the chances of seeing a route change during the lifetime of a TCP flow. traceroute is a UNIX command that discovers the path taken to a particular destination by sending UDP packets with increasing values in the TTL field: each router at which the TTL expires answers, revealing one more hop of the path.

I downloaded a month's worth of data from different universities and research labs across the US and abroad (U. Alaska, U. Hawaii, Stanford, SLAC, UCLA, UCSD, U. Colorado, WUSTL, MIT, Florida State, Waikato, ...). I plotted the traceroutes so that I could visualize what was going on, and then I analyzed the samples.

A traceroute sample looks like this:

Sun Oct 22 15:24:43 PDT 2000
 1  171.64.17.225  1.026 ms  1.055 ms  0.843 ms
 2  171.64.3.66  0.489 ms  0.438 ms  0.415 ms
 3  171.64.1.221  0.604 ms  0.445 ms  0.490 ms
 4  171.64.1.213  0.563 ms  0.505 ms  0.550 ms
 5  198.32.249.73  0.817 ms  0.763 ms  0.775 ms
 6  198.32.249.62  4.350 ms  3.980 ms  4.059 ms
 7  198.32.8.2  25.672 ms  26.033 ms  25.965 ms
 8  204.131.62.41  136.249 ms *  246.938 ms
 9  * * *
10  * * *
11  * * *
12  * * *
13  128.138.213.35  253.587 ms * *
Mon Oct 30 23:53:09 PST 2000
 1  171.64.17.225  74.307 ms  0.649 ms  0.668 ms
 2  171.64.3.66  0.555 ms  0.442 ms  0.466 ms
 3  171.64.1.221  0.428 ms  0.603 ms  0.456 ms
 4  171.64.1.213  0.442 ms  0.603 ms  0.453 ms
 5  198.32.249.73  0.936 ms  0.784 ms  1.806 ms
 6  198.32.249.62  4.290 ms  3.946 ms  4.181 ms
 7  198.32.8.2  25.681 ms  25.659 ms  25.685 ms
 8  204.131.62.41  26.979 ms  27.079 ms  27.065 ms
 9  128.138.81.217  27.792 ms  27.804 ms  27.805 ms
10  128.138.213.35  27.023 ms  27.267 ms  27.158 ms
# 
# This NLANR project is based on work sponsored by the National Science Foundation
# under Cooperative Agreement No. ANI-9807479. The Government has certain rights to
# the related material presented on this web site.
# 
# If you use data from this server for research or commercial purposes, you must
# give a credit reference to the National Science Foundation Cooperative Agreement
# No. ANI-9807479, and the National Laboratory for Applied Network Research.
# 
# For more information on the AMP Project please visit http://moat.nlanr.net/
#
An asterisk (*) marks a reply that was expected but that traceroute gave up waiting for. It can indicate a node that is down, a node that is overloaded, or simply that there is no router at that distance (in number of hops).
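
As a rough illustration of how samples in this format could be read programmatically, here is a minimal Python sketch. It is not the script used for this study; the function name parse_samples and the tuple layout are assumptions made for the example. It treats any line that does not start with a hop number as the timestamp of a new sample, and it keeps only the first IP address on each hop line.

```python
import re

# Matches a hop line such as " 8  204.131.62.41  136.249 ms *  246.938 ms"
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
IP_ADDR = re.compile(r"\d+\.\d+\.\d+\.\d+")
RTT = re.compile(r"([\d.]+) ms")


def parse_samples(text):
    """Split raw text into samples; each sample is (timestamp, hops).

    hops is a list of (ttl, ip_or_None, rtts); ip is None for "* * *".
    """
    samples = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the NLANR notice
        match = HOP_LINE.match(line)
        if match:
            if not samples:
                continue  # hop line before any timestamp: ignore it
            ttl, rest = int(match.group(1)), match.group(2)
            ip = IP_ADDR.search(rest)  # only the first responder is kept
            rtts = [float(x) for x in RTT.findall(rest)]
            samples[-1][1].append((ttl, ip.group(0) if ip else None, rtts))
        else:
            samples.append((line.strip(), []))  # timestamp starts a sample
    return samples
```

Because only the first address per hop is kept, a sketch like this cannot see two different routers answering for the same TTL; detecting that case would need the full list of addresses on the line.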

This is an example of the type of graphs that I was generating:

A summary graph of all paths taken in a month

traceroute summary graph

The numbers in black at the top are the last byte of the IP address of each router. A label such as X10X stands for a line "10 * * *" in the traceroute, i.e. no answer from 10 hops away. The numbers in blue at the bottom are the percentage of routes that traverse that particular router. The thickness of each line is proportional to that percentage.
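
The labels and percentages in this graph could be computed along the following lines. This is a hedged sketch, not the plotting code itself: router_usage is a hypothetical name, and each route is assumed to be a list of router IP addresses with None standing for an unanswered hop.

```python
from collections import Counter


def router_usage(routes):
    """Return {(hop, label): percentage of routes seen at that hop}.

    Each route is a list of router IPs, with None for an unanswered hop.
    """
    counts = Counter()
    for route in routes:
        for hop, ip in enumerate(route, start=1):
            # Label by the last byte of the IP, or X<hop>X for "* * *".
            label = ip.rsplit(".", 1)[-1] if ip else f"X{hop}X"
            counts[(hop, label)] += 1
    return {key: 100.0 * n / len(routes) for key, n in counts.items()}
```

Note that labeling by the last byte alone would merge two routers that sit at the same hop and happen to share that byte; a real tool would have to key on the full address and shorten it only for display.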

A graph of the different domains that were traversed

traceroute domains graph

The colors of the circles represent the domain that the router belongs to (the domain name as obtained by nslookup). The list on the right has the DNS names of the routers. e.g. for Core3-gateway.Stanford.EDU (171.64.3.66) the domain is stanford.edu.
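
One simple way to derive the domain from a router's DNS name is to keep its last two labels, which reproduces the stanford.edu example above. The sketch below assumes exactly that rule; the function name domain_of is made up for the illustration, and the rule would be too coarse for names under country-code domains, where three labels may be needed.

```python
def domain_of(dns_name):
    """Return the last two labels of a DNS name, lowercased."""
    labels = dns_name.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])
```

For example, domain_of("Core3-gateway.Stanford.EDU") gives "stanford.edu".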

A graph showing the evolution with time of the samples

traceroute time evolution graph

This shows the evolution of the traceroute samples over time. When consecutive samples take the same route, they are merged and the line is thickened. When the route changes, a new route is plotted.

In the example, there are 898 traceroute samples considered. The first 223 (between Oct 1 at 0:03 and Oct 2 at 13:03, every 10 minutes) took the same route. Then between 13:03 and 13:26 there was at least one route change, where instead of going through calren2 and abilene the path goes through bbnplanet and qwest. Note that there might be more than one change: we are not measuring continuously, so if there are several reroutes between samplings the tool cannot detect them.

There is another route change before 13:30, and after that there are 168 samples using the same path. Then, between 17:23 and 17:53, the probe reaches U. Colorado but does not get an answer from the destination. At 18:03 that answer comes: the destination has come back up. Note that the destination is the only node that does not have to respond to an expired TTL, since the probe is a packet addressed to it.

At 18:13 the route returns to the original one, and it stays there for another 423 samples. Before 16:43 there is one more change inside the stanford domain, and then at 16:53 the route returns to the original path, where it stays until the end.
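
The merging of consecutive samples that take the same route can be sketched in a few lines of Python. This is only an illustration under assumptions: merge_consecutive is a hypothetical name, samples are assumed to be (timestamp, route) pairs in time order, and a route is a tuple of router addresses.

```python
from itertools import groupby


def merge_consecutive(samples):
    """Group consecutive samples with an identical route.

    samples is a list of (timestamp, route) in time order, where route is a
    tuple of router IPs. Returns a list of (first_ts, last_ts, count, route).
    """
    groups = []
    for route, run in groupby(samples, key=lambda s: s[1]):
        run = list(run)
        groups.append((run[0][0], run[-1][0], len(run), route))
    return groups
```

Each returned group corresponds to one thickened line in the graph, and the boundary between two groups is where at least one route change took place. As noted above, any reroutes that begin and end between two samplings leave no trace in this grouping.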

A table summarizing the anomalies encountered in the evolution of the samples


======================================================================

Total number of routes: 	898
Total number of routes with problems: 	12 


From	To	StaMon	StaDay	StaHour        	EndMon	EndDay	EndHour        	RRSrc	RR	RR1Tran	RRDst	RFail	Dst!Res	LoopBak	RLoop	Lost	RRTrace	Overloa	End!Dst	BadTrac	Comment		


stanford	ucboulder	Oct	1	00:03:00	Oct	2	13:03:35																


stanford	ucboulder	Oct	2	13:26:04	Oct	2	13:26:04		1												stanford-calren2-ucaid-UNDEFINED-->stanford-bbnplanet-qwest-UNDEFINED-,		


stanford	ucboulder	Oct	2	13:30:51	Oct	3	17:12:58		1												stanford-bbnplanet-qwest-UNDEFINED-->stanford-calren2-ucaid-UNDEFINED-,		


stanford	ucboulder	Oct	3	17:23:25	Oct	3	17:53:26		1				1								colorado-->UNDEFINED-,		


stanford	ucboulder												1										


stanford	ucboulder												1										


stanford	ucboulder												1										


stanford	ucboulder	Oct	3	18:03:13	Oct	3	18:03:13		1												UNDEFINED-->colorado-,		


stanford	ucboulder	Oct	3	18:13:09	Oct	6	16:33:24		1												UNDEFINED-->,		


stanford	ucboulder	Oct	6	16:43:36	Oct	6	16:43:36	1	1												stanford-->stanford-,		


stanford	ucboulder	Oct	6	16:53:29	Oct	7	05:24:07	1	1												stanford-->stanford-,		


stanford	ucboulder	Oct	7	05:34:17	Oct	7	05:34:17																


From	To	StaMon	StaDay	StaHour        	EndMon	EndDay	EndHour        	RRSrc	RR	RR1Tran	RRDst	RFail	Dst!Res	LoopBak	RLoop	Lost	RRTrace	Overloa	End!Dst	BadTrac	Comment		
----------------------------------------------------------------------

		From	To	               		Nroute	Nanomalies        	RRSrc	RR	RR1Tran	RRDst	RFail	Dst!Res	LoopBak	RLoop	Lost	RRTrace	Overloa	End!Dst	BadTrac			

TOTAL:	 	stanford	ucboulder	         	898	12	         	2	7	0	0	0	4	0	0	0	0	0	5	0			

======================================================================

The meaning of the fields is as follows:
From The AMP host that performs the traceroute measurement
To Destination used for the traceroute
StaMon Month when the first sample using this particular path is taken
StaDay Day when the first sample using this particular path is taken
StaHour Time when the first sample using this particular path is taken
EndMon Month when the last sample using this particular path is taken
EndDay Day when the last sample using this particular path is taken
EndHour Time when the last sample using this particular path is taken
RR Total number of route changes between consecutive samples (usually between the last of the previous group and the first of the current group of routes)
RRSrc Number of route changes at the source domain (e.g. stanford.edu for the AMP host at Stanford) between consecutive samples
RR1Tran Number of route changes at a single transit domain between consecutive samples
RRDst Number of route changes at the destination domain between consecutive samples
RFail There was a route failure indicated by an error by traceroute (e.g. !N, !H, !X, ...)
Dst!Res The packet got to the destination network but the destination host does not respond
LoopBak The route looped back on its own (returned to a node visited previously). This is usually caused by a route change in the middle of the measurement
RLoop There is a route loop (packets bounce back and forth between two nodes)
Lost The connectivity is lost, i.e. after a certain point the traceroute consists only of lines with three stars (* * *). This is usually caused by a route failure or by a route change in the middle of the measurement
RRTrace There is a route change in the middle of the measurement. This can be observed because there are two different routers responding to the same TTL.
Overloa A router was temporarily overloaded and did not respond to a TTL that expired on it.
End!Dst The ending point of the measurement is not the destination (either there was a route failure or the destination's IP address changed)
BadTrac The measurement was garbled (if samples are very slow coming back, the output gets garbled; this happened on Oct 22nd in many samples).
Comment Indicates which domains had changes.
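
To make a few of these definitions concrete, here is a small Python sketch that flags LoopBak, RLoop, Lost and End!Dst for a single sample. It is a simplified reading of the definitions above, not the logic of the original scripts: classify_route is a hypothetical name, a route is assumed to be a list of router IPs with None for a "* * *" line, and the other fields (for instance Dst!Res, which needs domain information) are left out.

```python
def classify_route(route, destination):
    """Return the set of anomaly flags for one traceroute sample.

    route is a list of router IPs in hop order, with None for "* * *".
    """
    flags = set()
    answered = [ip for ip in route if ip is not None]

    # RLoop: the pattern A, B, A, B appears among the answering hops.
    if any(
        answered[i:i + 2] == answered[i + 2:i + 4]
        and answered[i] != answered[i + 1]
        for i in range(len(answered) - 3)
    ):
        flags.add("RLoop")
    # LoopBak: some router is revisited, but not as a two-node bounce.
    elif len(set(answered)) < len(answered):
        flags.add("LoopBak")

    # Lost: the sample ends in unanswered hops.
    if route and route[-1] is None:
        flags.add("Lost")
    # End!Dst: the last router that answered is not the destination.
    if not answered or answered[-1] != destination:
        flags.add("End!Dst")
    return flags
```

In this simplified version a sample that ends in stars is always marked both Lost and End!Dst; telling it apart from a destination host that merely fails to respond would require knowing whether the last answering router is already in the destination network.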

Finally, this table shows the analysis of the traceroute samples. It records where the reroutes occur (in the source network, in a single transit network, in the destination network or in several networks simultaneously). It also identifies loop-backs, routing loops (packets bouncing back and forth), overloaded routers, route failures, etc.
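
Deciding where a reroute occurred can be approximated by comparing two consecutive routes and looking at the domains of the routers that appear in only one of them. The sketch below works under that assumption; locate_change and the "multiple" label are names invented for the example, and each route is assumed to be a list of (router IP, domain) pairs.

```python
def locate_change(old, new, src_domain, dst_domain):
    """Say where two consecutive routes differ.

    old and new are lists of (router_ip, domain) in hop order. Returns None
    if the routes are identical, else "RRSrc", "RR1Tran", "RRDst" or
    "multiple".
    """
    if old == new:
        return None
    # Domains of the routers that appear in only one of the two routes.
    old_ips = {ip for ip, _ in old}
    new_ips = {ip for ip, _ in new}
    changed = {dom for ip, dom in old if ip not in new_ips}
    changed |= {dom for ip, dom in new if ip not in old_ips}
    if changed == {src_domain}:
        return "RRSrc"
    if changed == {dst_domain}:
        return "RRDst"
    if len(changed) == 1:
        return "RR1Tran"
    # Several domains changed, or the same routers appear in another order.
    return "multiple"
```

With this rule, a change from calren2 and abilene to bbnplanet and qwest, like the one in the example above, would fall under "multiple", since routers from more than one domain differ between the two routes.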


If you are interested in doing a similar study, you can download the scripts that I used:

That's all. Adios.


Page maintained by Pablo Molinero Fernández (molinero@stanford.edu)
Last modified: Fri May 9 11:30:40 PDT 2003