Draft - comments and corrections are solicited
Level 1 Attribution
1 Introduction
BAA 03-03-FH defines four
"levels" of attack attribution, of which the first is identification
of "the specific hosts involved in the attack". The next section of
this paper is devoted to a more careful definition of the problem as
we see it. After that we describe and evaluate work that is relevant
to this problem. We begin by presenting the evaluation criteria along
with the reasons we think they are important. The related work falls
into two general classes. One is "route based filtering", an
enforcement mechanism which drops packets that routers can recognize
as carrying incorrect source addresses. The other class consists of
various methods for identifying links through which a packet was
forwarded.
Note: It's also possible to identify forwarding routers,
but links turn out to be a little better.
Most of the related work is previous work, but we also describe
a few things that have not appeared before to our knowledge.
Finally we turn to some important general questions. The first is:
How can these methods be used to solve the level 1 attribution problem?
Note that neither class of related work directly solves the problem
as we define it.
All of the methods in the related work require cooperation from
machines in different locations. We expect in most real networks it
will not be possible to obtain universal cooperation. This leads to
the question:
How does the performance of various solutions depend on the number
and placement of cooperating machines?
Finally, since there are different approaches, it is likely that some
machines will be willing and able to cooperate with one approach while
other machines will be willing and able to cooperate with another.
This leads to the question:
How can evidence from different solutions be combined?
We present answers to these questions and point out some remaining
difficult problems that must be overcome.
2 Problem Statement
Our model is that an IP packet, P, is generated by a machine, G,
forwarded by a sequence of IP routers, and finally, if not dropped
along the way, delivered to a recipient machine.
The goal of "level 1 attribution" is, given P, to identify G. This
identification might in general be requested by the recipient, any of
the forwarding routers, or any other machine to which they communicate
information about P.
Note that every IP packet is supposed to contain the IP address of the
machine that generated it in the "source address" field. If this
requirement were enforced then there would be no level 1 attribution
problem.
Some further explanation is in order on a number of points.
2.1 What's an IP router and what does it mean to "forward"?
By IP router we mean a machine that acts substantially in accordance
with rfc1812. That
reference describes forwarding, in particular how packets received by
a router are transformed into packets that it subsequently sends. The
term "forward" is used only for the transformation, described in
rfc1812, that affects only a few fields of the IP header. It does
not apply to such things as replies to packets addressed to the router
or to ICMP replies that the router sends to source addresses of
packets it cannot forward. These are considered to be "generated" by
the router. Similarly NAT devices that alter source addresses are
considered to generate the packets they send with altered source
addresses. Although it is important to find the packets that "cause"
these packets to be generated, that problem is classified as "level 2
attribution" which is to be addressed elsewhere.
2.2 What does it mean to "generate" a packet?
A packet that is sent by a machine but not "forwarded" in the sense
above is considered to have been generated by the sending machine.
2.3 Why only IP?
In general, those who wish to attribute a packet are interested in
more than IP, e.g., attributing a non-IP packet, or attributing an IP
packet to a non-IP device. We limit the discussion to IP because
- the vast majority of actual (at least non-local) traffic to be
attributed is expected to be IP
- almost all of the work to date has concentrated on IP
- the methods applicable to IP and to non-IP traffic overlap
substantially
2.4 What does it mean to "identify" a machine?
Ideally this should allow us to find a physical box. In practice a
requester is generally very happy with an IP address that is unique
within the network visible to him. In some cases, the generating
machine has no such address. In this case the second choice would be
an identity for the first forwarding router that has such an identity.
This may be viewed as an approximate answer. Any further information,
such as the fact that the packet arrived at that machine from a
particular wire, improves the approximation. This is generally the
best that can be achieved with the techniques we review here, and even
this is only possible in the presence of very high levels of
cooperation.
In the general case, lower levels of cooperation will generate less
complete attribution data, and this restricts the source to a set of
possible IP addresses. The quality of the result is then related to
the size of that set, with smaller sets considered higher quality.
2.5 Physical signals
One area that we do not address here is characterization of physical
signals. This is very useful for identifying a machine that is one
hop away, but for the most part we are concerned here with machines
that are more than one hop away.
3 Evaluation Criteria
We classify methods in terms of the answers to the following questions
that are meant to point out important advantages and disadvantages of
these solutions relative to each other. The questions are phrased so
as to make "yes" the preferred answer.
- Does the mechanism allow attribution of a single packet?
Some solutions apply only to large sets of packets presumed to come
from the same place. These are generally designed to attribute
floods. "Yes" means that after you receive a packet (but possibly
only well after deciding that some class of packets is to be
attributed - see advance notice below)
you can find data for its attribution.
Note: For purposes of the intelligence community we
believe that single packet attribution is required.
- Does the solution work with existing routers?
Some solutions require changing the way routers work. These face an
obvious economic obstacle in the cost of replacing existing routers.
If the solution is to be used in high speed routers then an even more
severe problem arises. Most organizations are not in a position to
build high speed routers and it's very difficult to convince those
that are to incorporate the desired changes. It is relatively easy to
build custom routers starting from open source software running on
cheap standard hardware, but this currently limits the bandwidth to
about 1Gbit.
An interesting compromise seems practical for some cases. If a high
speed router can identify a relatively small subset of the arriving
packets that must be processed by the attribution mechanism, it could
route these packets to a separate lower speed but more programmable
machine, which could do the specialized processing, and then return
the (possibly modified) packets to be forwarded on. We will mention
this alternative below as applicable.
One other approach to this problem is worth mentioning. Our general
complaint with commercial routers is that the end user has limited
control over what they do, i.e., they can be "configured", but not
"reprogrammed". We propose a plausible design for arbitrarily high
speed programmable routers. This design requires a very small amount
of special purpose high speed hardware to handle the high speed
network interfaces, and an arbitrarily large number of slower
general purpose programmable machines, probably running something like
FreeBSD or Linux. The special purpose hardware has to divide the
incoming high speed traffic into approximately equal packet streams,
which are then distributed to the general purpose machines. These act
as low speed but programmable routers, forwarding packets to the
special purpose high speed hardware connected to the desired output.
This hardware then recombines the outgoing traffic from all of the low
speed routers into a single high speed outgoing link. We look forward
to the appearance of the proposed special purpose hardware.
Note: Anyone in a position to help make this happen is
invited (encouraged, begged) to contact the author!
- Does it work without advance notice of packets to be attributed?
Some solutions are too expensive to be used for all packets all the
time. One example is that the solution might slow down or even
disrupt normal traffic while in operation. As another, a monitor that
remembers something about what it has seen must limit the set of
packets it remembers. One solution is to remember only recent
packets. This, of course, limits attribution to recent packets.
Another is to remember only those determined in advance to be
interesting. In the latter case we have to tell it when we decide
that certain kinds of packets are interesting. And we have to do that
before these packets can be recorded. This is a fatal flaw for
cases where we don't know until a packet appears that we want to find
its origin.
- Does it work without additional communication after identification
of packets to be attributed?
Typically solutions that rely on memory at cooperating machines
require communication with those machines in order to ask what they
remember and receive an answer. This is a disadvantage in several
ways. First, communication has its own cost in network bandwidth and in
time before the attribution is completed. Also that communication can
be attacked. This is probably sufficient reason for the owner of the
cooperating machine to restrict access to his own network.
- Is it free of other "special" problems?
This is a place for collecting miscellaneous disadvantages relative
to other approaches.
While the questions above are important and useful for classification,
we devote most of the space below to the questions
- How does it work?
- How well does it work?
The second question involves such questions as whether the results
can be incorrect, how the method could be attacked and how those
attacks can be defended.
4 Related work
As mentioned earlier, we can classify the related work into link
identification, which we describe first, and packet filtering, which
is described second. Within each of these classes we identify a
number of subclasses, each of which may be populated by a variety
of more specific variants.
4.1 Link testing
Link testing methods are intended to trace ongoing flooding attacks.
They identify links through which the traffic of interest flows by
testing candidate links, i.e., monitoring them for the traffic to be
traced. One simple way to do this is described in
Characterizing and Tracing Packet Floods Using Cisco Router.
The person doing the trace presumably has administrative control over
a router known to be on the forwarding path. He reconfigures that
router to help him determine which incoming link(s) carry the traffic.
In the case of a flood, he might simply ask for a count of packets
addressed to the victim arriving on each interface. For other traffic
he would need to provide a description that the router can use to
identify the traffic of interest. Unless the traffic is being
generated by the router it must be arriving at one or more inputs.
This leads to the next upstream router(s) which can then be traced in
turn.
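For concreteness, the hop-by-hop procedure can be sketched as follows.
This is only an illustration (in Python); the helpers
query_interface_counts (per-interface counts of packets addressed to the
victim) and upstream_neighbor (a topology lookup) are hypothetical, and
in practice the steps are usually carried out by an operator using
router counters and ACLs.

    def trace_flood(start_router, victim, query_interface_counts, upstream_neighbor):
        """Walk upstream from a router known to be on the forwarding path,
        following the input interfaces that carry traffic addressed to the
        victim. Returns the routers on the reconstructed path."""
        path = [start_router]
        router = start_router
        while True:
            counts = query_interface_counts(router, victim)   # {interface: packets}
            if not counts:
                break
            iface, pkts = max(counts.items(), key=lambda kv: kv[1])
            if pkts == 0:
                break          # traffic is generated here, or the trail has gone cold
            upstream = upstream_neighbor(router, iface)        # topology lookup
            if upstream is None or upstream in path:
                break          # left our administrative domain (or hit a loop)
            path.append(upstream)
            router = upstream
        return path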
Similar techniques can be used to move upstream more than one link at
a time. One might, with automated support, simultaneously reconfigure
all the routers on some perimeter around the victim, such as all the
border routers of the ISP. One interesting variant of this is
described in
CenterTrack: An IP overlay network for tracking DoS floods.
This reroutes specified packets over tunnels from edge routers to
special tracking machines, which then can easily identify the edge
routers initially receiving the traffic.
Note: This idea will appear again in the section on
deterministic marking.
Another variant is described in
BlackHole Route Server and Tracking Traffic on an IP Network.
This uses routing mechanisms to first generate ICMP replies from any
router receiving traffic for the victim and then to collect some of
these at a central point in order to identify the routers that did
receive them. Unfortunately, this technique relies on random (or
otherwise stupid forms of) source spoofing.
It is also possible to do a hostile form of link testing without
cooperation from the owners of the forwarding routers, as described in
Tracing Anonymous Packets to Their Approximate Source
Note: This paper seems to contain the original idea for most
of the methods we describe here!
This involves attacking (typically flooding) various links. This has
the significant disadvantage of attacking other (presumably innocent)
traffic. The expectation is that an attack on a link that carries the
traffic to be traced will cause a reduction in the rate at which that
traffic flows through that link, and therefore a reduction in the rate
at which it arrives downstream. This requires cooperation from
machines in a position to attack the targeted links. In this paper
that cooperation was probably unintentional. It also requires
cooperation from machines in a position to detect the loss of traffic,
but typically the party that wants to trace the packets is already in
a position to detect the loss of traffic, i.e., at the destination
address. This variant also requires more knowledge of the routing
than other link identifying methods.
The attacking variant is the only one that does not require
administrative access to routers. In fact, this was the motivation for its
development. Normal link testing could actually be done without
cooperation of routers if cooperating machines could be placed at the
links to be tested, and if these were fast enough to process the
traffic at those links. However we regard this as a different
solution discussed below.
Classification
- single packet: no
- existing routers: yes
- no advance notice: no
Data collection doesn't start until an attribution problem arises.
- no additional communication: no*
Normally the operator has to communicate with upstream routers
to get results. Methods that use routing tricks avoid this.
The attack variant uses lots of additional communication.
- no special problems: no
The process is generally manual (controlled by people), hence slow.
It may disrupt normal traffic, and generally requires extra work on
the part of routers. For these reasons, link testing techniques are
generally considered to be practical only on a temporary (emergency)
basis.
Attacks:
Since link testing relies on an ongoing attack, changing the attack
pattern with time is likely to confuse matters greatly.
An attacker who knew details of the attribution testing (what's being
tested and when) could intentionally mislead it.
Since this technique requires administrative access to routers (other
than the attack variant), cooperation can be expected to be restricted
to the administrative domain owning the routers. To the extent that
further cooperation is available it will be manual, hence inconvenient
and slow.
The attack variant does not require access to routers but faces a
whole set of problems of its own. It still needs cooperating machines
in a position to affect the links to be tested and sensors (generally
at the victim, but more generally along the attack path) to detect
resulting changes in traffic. It also requires knowledge of network
topology and routing. The ability to identify a link further depends
on such things as how much bandwidth the cooperating machines have
available and details of how routers are configured. For instance the
ability to attack might depend on what kind of traffic is used to
attack and what kind is being attacked.
4.2 Itrace
Itrace describes a method in which routers randomly (probability
~5e-5) send extra packets to source or destination addresses of
forwarded packets to report that the packet traversed a given link
connected to the router. With sufficiently many routers doing this,
the recipient of a flood can reconstruct the path(s) of the flooding
packets.
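As an illustration only (not code from the Itrace proposal), the
router-side behavior amounts to something like the following sketch,
where the packet is represented as a dict and send is a hypothetical
transmit helper:

    import random

    ITRACE_PROBABILITY = 5e-5    # roughly the sampling rate discussed for Itrace

    def maybe_send_itrace(packet, incoming_link, send):
        """With small probability, emit an extra trace message reporting the
        link on which the forwarded packet arrived. packet is assumed to be
        a dict with 'src' and 'dst' addresses; send(addr, msg) transmits a
        message to an address."""
        if random.random() < ITRACE_PROBABILITY:
            trace = {"type": "itrace",
                     "traced_src": packet["src"],
                     "traced_dst": packet["dst"],
                     "link": incoming_link}
            send(packet["dst"], trace)    # could equally be sent to the source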
Classification
- single packet: no
Actually, it does attribute a single packet but with very low
probability, so if you really want to know where a particular
packet came from this is not a good solution.
- existing routers: no
Although this is described as being done by routers, it seems quite
practical to offload the task. If a router could be configured to
route packets probabilistically then it could route occasional
packets to a separate Itrace machine which would then forward both
the original packet and the trace packet.
Alternatively, if a router can be configured to sample packets and
send them to an analysis interface, that could be connected to an
Itrace machine that would simply send out trace packets.
- no advance notice: yes
- no additional communication: yes
The trace packets are, of course, additional communication relative
to non-Itrace operation, but these are sent as part of normal operation
independent of any attribution attempts.
- no special problems: yes
Attacks:
Attackers can send itrace packets to make it appear that attacks come
from other places, but generally cannot stop the "good" data from
telling the victim about the real attacker. The false (attack) trace
packets might be distinguished from the real trace packets by
authentication techniques, but it's much easier to make up false
packets than to check them, so this is another opportunity for the
attacker to impose additional cost on the victim.
4.3 Probabilistic packet marking (ppm)
The method described in
Practical Network Support for IP Traceback
is similar to Itrace in that a router has to do some extra work for
a small fraction of the packets it forwards, though in this case the
fraction is substantially larger than for Itrace (probability on the order of 0.05).
It replaces (presumably) non-essential data in forwarded packets with
data identifying the link over which the packet was forwarded. This allows
the recipient to reconstruct the forwarding path(s) of a flood.
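The sketch below is in the spirit of the edge-sampling idea from that
paper, but it is not a faithful implementation: the packet is shown as a
dict with three separate mark fields, whereas the actual proposal
squeezes partial marks into existing header space.

    import random

    MARK_PROBABILITY = 0.05      # order of magnitude used in the ppm proposals

    def ppm_mark(packet, router_id):
        """Probabilistic (edge-sampling style) marking sketch."""
        if random.random() < MARK_PROBABILITY:
            packet["mark_start"] = router_id      # start a new edge sample
            packet["mark_distance"] = 0
        elif packet.get("mark_distance") == 0:
            packet["mark_end"] = router_id        # close the edge begun one hop upstream
            packet["mark_distance"] = 1
        else:
            packet["mark_distance"] = packet.get("mark_distance", 0) + 1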
Classification
- single packet: no
- existing routers: no
As in the case of Itrace one could imagine offloading this task if the
router could probabilistically route packets to a marking machine.
However, the increased probability suggests that the marking machine
would have to be much faster than an Itrace machine, and for a fast
link several such machines might be necessary.
- no advance notice: yes
- no additional communication: yes
- no special problems: yes
Attacks:
Attacks involving spoofed traceback data are described in
GOSSIB vs. IP Traceback Rumors and
On the Effectiveness of Route-Based Packet Filtering for Distributed
DoS Attack Prevention in Power-Law Internets.
In general the two major problems in ppm reliability are
- The probabilistic nature of the algorithm causes some packets to not
be marked by cooperating routers, and these retain whatever marks are
given them by the senders. Attackers can then use these marks to
intentionally mislead the attribution mechanism.
- In an effort to avoid expanding packets, ppm uses (supposedly)
underutilized space in current IP packets. But this is not enough
to send the desired data, so the marks in a single packet are used
to provide only partial data. However this also leads to errors in
attribution, particularly in the presence of large numbers of paths.
Advanced and authenticated marking schemes for ip traceback
attempts to fix some of the problems.
Editorial opinion
Our view is that even if all of the problems are fixed, probabilistic
marking has no advantage over deterministic
marking (below), while deterministic marking has significant
advantages over probabilistic marking, particularly for attribution of
a single packet. High speed commercial routers don't currently
support any of the marking methods. None of them appear to
be significantly more difficult to support than any other. In the
unlikely event that we can influence router manufacturers to support
new features we might as well ask for the ones that are most useful.
The supposed benefit of probabilistic marking over deterministic
marking is that it does not require additional space in the packet.
This is considered to be a problem not because of the loss of payload
bandwidth but because the space will increase with each hop and this
is thought to be expensive for routers. The first problem can be
solved by allocating the space all at once at the first marking
router. The marking itself seems trivial, comparable in complexity
with decreasing TTL. In fact the work to allocate the extra space in
the packet at the first router is also relatively small. A more
realistic objection is that the extra space is likely to cause the
packet to exceed the MTU on the next link. This should not
be a problem since TCP/IP was designed to deal with such problems. In
the current internet it actually is something of a problem because of
widespread misconfiguration. See rfc2923 and mss.phildev.net for details. Note that
the same problems occur with other protocols that need to steal a
little bandwidth, e.g., tunneling. There are commonly used
"work-arounds" (perhaps more accurately described as "I have to break
my network in order to work with your broken network") for such
problems.
4.4 SPIE
This method, described in
Single-Packet IP Traceback, calls on
routers to remember, possibly for a very limited time, data that allows
them to answer with high confidence the question "did you recently
forward the following packet?". The forwarding path of a single packet
can be reconstructed by querying such routers soon after the packet is
observed. More recent work (private communication) moves the processing
from the router to a specialized machine observing traffic on a link.
This can be viewed as a variant of
remote monitor described below.
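The published SPIE work stores Bloom-filter digests of the packets it
forwards rather than the packets themselves. The toy sketch below
illustrates that idea; the table size, number of hash functions, and use
of SHA-256 are arbitrary choices of ours, and a real implementation
digests the invariant IP header fields plus a payload prefix and rotates
tables over time.

    import hashlib

    class PacketDigestStore:
        """Toy Bloom-filter style digest table: answers 'did I recently see
        this packet?' with no false negatives and a tunable false-positive
        rate, using a few bits per packet rather than storing packets."""

        def __init__(self, size_bits=1 << 20, num_hashes=3):
            self.size = size_bits
            self.k = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, invariant_bytes):
            # Derive k bit positions from a hash of the packet's invariant fields.
            digest = hashlib.sha256(invariant_bytes).digest()
            for i in range(self.k):
                chunk = digest[4 * i: 4 * i + 4]
                yield int.from_bytes(chunk, "big") % self.size

        def record(self, invariant_bytes):
            for pos in self._positions(invariant_bytes):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def probably_forwarded(self, invariant_bytes):
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(invariant_bytes))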
Classification
- single packet: yes
- existing routers: no
As noted above, this task could be performed without the router.
In order to support very high speeds it would be necessary to
use special purpose hardware.
- no advance notice: yes
- no additional communication: no
As in other monitoring solutions one has to query the monitor.
- no special problems: no
The storage required is proportional to the number of packets one
wants to record. This method achieves high compression, on the order
of one or two bytes per packet, but this is still a lot of storage
per unit time for a fast router.
Attacks:
Attackers can attack the query/response communication, either the traffic
or the endpoints. For that reason access to traceback data will normally
be restricted to the administrative domain owning the routers, and possibly
a few other trusted places.
4.5 Remote Monitors
General purpose machines controlled by the party interested in attribution
can be distributed throughout the network to listen to traffic passing by.
They record whatever they are programmed to and then answer questions about
the recorded data.
Classification
- single packet: yes
Of course, we can only expect to attribute the packets that the sensors
have been instructed to look for.
Generally this is a small subset of all packets.
- existing routers: yes
- no advance notice: no
- no additional communication: no
You have to query the sensor.
- no special problems: yes
Attacks:
The same considerations described under SPIE apply here as well.
4.6 Deterministic packet marking
In this case routers mark all forwarded packets with path data.
Packets traversing the same path therefore are marked the same. The
simplest and oldest example of a similar capability is the IP "record
route" option. IP routers are supposed to add their own addresses to
such an option as they forward packets. This actually records a router
rather than a link, but if the option had room for two addresses, it
would record the first two routers, which would identify a link.
The first problem with the IP option as an attribution method is that
it's the choice of the sender to include the option. In the current
internet this allows attribution of senders who wish to be attributed,
but that's not very useful. This problem could be solved by
cooperating routers inserting the option into packets that do not
already have it.
The next problem is that routers are supposed to leave the record
route option unchanged if the space allocated for it is already filled
with data. An attacker could therefore spoof any data he wanted to.
The solution to this problem is that cooperating routers should
be configured to know which of their neighbors is "trustworthy" and
accept trace data only from such neighbors. Data from other neighbors
should be overwritten. This gives us the address of the last router
on the path that was preceded by an untrusted neighbor.
In the case where the first cooperating router is not connected to the
last in a continuous path of cooperating routers this loses some data,
namely the links traversed before the packet reached the last untrusted
router. Unfortunately, as will be shown later, this is actually the
most valuable data! The best solution to this problem that we have
found so far is tunnels between non-adjacent cooperating routers.
Note: We believe this to be a new result.
Similar techniques were mentioned under link testing. In effect, a
cooperating router must determine which cooperating router will be
the next to receive a given packet, and encapsulate that packet in a
tunnel to that next router. In the most general case this leads to
an overlay network of cooperating routers.
The design of such a network reflects trade offs between the efficiency
of routing and what attribution data is available at what places.
It should be mentioned that cooperation in a given method is not always
a binary property of a machine. In this case an ISP might be willing
to route packets addressed to your network via a tunnel to another
router cooperating with you, but still not be willing to route other
packets that way.
Another problem is that transitive trust as described above allows a
single compromised router to generate any attribution data. For this
reason it is better to gather at least some additional downstream data
for purposes of validation. We show below how that data can be used.
Providing for Responsibility in a Global Information
Infrastructure describes a form of packet marking that gathers
complete path data in a very compressed form, encoding each link in
just the number of bits to specify one of the neighbors of the machine
that received the packet on that link.
Note: This is the earliest reference we have seen for this
scheme. It was reinvented and implemented as part of the flooding
defense described in A Fair
Service Approach to Defenses Against Packet Flooding Attacks. We
now see some ways to do a little better, but as a practical matter
this is probably not necessary.
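A toy sketch of the compressed encoding just described: each hop costs
only enough bits to name one of the receiving router's neighbors. How
the bits are carried in the packet and how neighbors are numbered are
left abstract here.

    import math

    def append_link_mark(path_bits, neighbor_index, num_neighbors):
        """Append one link to a compressed path carried as (value, bit_length).
        The link is encoded as the index of the neighbor from which the
        marking router received the packet."""
        value, length = path_bits
        width = max(1, math.ceil(math.log2(num_neighbors)))
        return ((value << width) | neighbor_index, length + width)

    # Example: the packet arrives via neighbor 2 of a router with 5 neighbors,
    # then via neighbor 0 of a router with 3 neighbors.
    marks = (0, 0)
    marks = append_link_mark(marks, 2, 5)    # costs 3 bits
    marks = append_link_mark(marks, 0, 3)    # costs 2 bits
    print(marks)                             # (8, 5): the bit string 010 00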
As will be seen below, the data needed for attribution is the IP address
at the sending side of the first identified link. The compressed path
does not make extraction of that data convenient. In small networks
with static routing it is sufficient to simply store the association
between paths and their origins. In networks with single path routing
this can be easily generated by sending a ping to each address and
storing the path recorded on the reply packet.
Changing IP to Eliminate Source
Forgery suggests a modification that is more suitable in the
general case, where the sending IP address of the first recorded link
is added to the path data. The compressed path data is sufficient for
reconstructing the initial address (in fact, the entire path) but this
requires either information about network topology or additional
communication to provide such information. An easy test to see that
the initial IP address does actually correspond to the recorded path
is to send a ping packet to the IP address and see whether the reply
comes back with the same path. This, of course, still assumes a
single path and a stable route.
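The check itself is trivial to state; the sketch below assumes a
hypothetical probe_path helper that pings an address and returns the
path marked on the echo reply.

    def initial_address_consistent(claimed_addr, recorded_path, probe_path):
        """Under single-path, stable routing, a ping to the claimed initial
        address should come back carrying the same recorded path as the
        suspect packet."""
        return probe_path(claimed_addr) == recorded_path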
Note that a router can still forge paths that pass through itself.
Advanced and Authenticated Marking Schemes for IP Traceback
describes some techniques that are actually meant to address problems
in nondeterministic packet marking, but seem applicable to this problem
as well.
Pi: A new defense mechanism against IP spoofing and DDoS attacks
describes a deterministic marking method that is in some ways a
compromise between deterministic and nondeterministic marking. It is
deterministic, but like the nondeterministic methods, avoids expanding
the packet and generally provides less complete attribution data.
- Like ppm there may be a residue left over from the initial marks.
Unlike ppm, the residue is a particular set of bits that survive a
given path. This means that different marks may in fact come from the
same source.
- The marks from different paths are not necessarily different.
In general, then, the marks that arrive with PI do not identify a
single link as the other methods do. They do still identify a set of
possible links. This slightly complicates the analysis
below.
Some commercial routers can be configured to do a very limited form of
deterministic path marking. Cisco IOS allows routers to alter the TOS
field, which could be used by a small neighborhood of routers to identify
up to 64 places where a packet enters the neighborhood. Possibly other
commercial routers can be programmed in similar ways.
This could be used in conjunction with routers that do "real" path
marking to allow small clusters of legacy routers to cooperate in a
larger neighborhood. That is, the old routers would use TOS marking
at entry into the neighborhood, and at exit, a more capable router
would incorporate that mark into the "real" path data.
As in other methods above, it is also possible to get away without
changing routers in the case where a small portion of the traffic
coming into a router can be identified as worthy of attribution. In
this case the router could simply route that traffic to a separate
lower speed machine that does the marking. This machine could also
do the tunneling mentioned above.
Classification
- single packet: yes
- existing routers: no
A minor exception is TOS marking.
- no advance notice: yes
- no additional communication: yes
additional communication can be useful for validation
- no special problems: yes
Some people consider it a significant problem to increase the
packet size. We regard this as minor. This is discussed above
in the "editorial opinion" under probabilistic marking. The tunnels
used for communicating between non-adjacent cooperating machines may
be too expensive to allow large amounts of traffic, and this may serve
as a reason to restrict cooperation.
Attacks: An attacker who controls a trusted router can forge any path
up to that router unless some further authentication scheme is used.
Those paths that disagree with the routing algorithm can be recognized
by a simple check as described above. A router that trusts data from
an attacker effectively allows that attacker to act like a compromised
router. Authentication methods could be used, but these add significant
cost in the form of processing time and space in the marked packets.
4.7 Tunneling from cooperating machines to attribution machine
The overlay network described above suggests an additional solution.
Instead of tunneling to the next cooperating router, suppose we tunnel
to the last one. This is particularly easy in certain common cases,
such as the one where we are only interested in attribution of packets
sent to our own network. We simply ask the cooperating routers to
encapsulate traffic addressed to our network in a tunnel directly to a
router in our network. Note that the remote cooperating routers do
not have to do any marking, which means that they can be currently
available commercial routers. The receiving router can tell from the
tunnel on which a packet arrives which cooperating machine forwarded
it. In fact, the remote cooperating router can identify the link on
which it received a packet by using different tunnels to carry packets
that arrive on different links.
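At the receiving end of the tunnels, attribution then reduces to a table
lookup. A minimal sketch, with made-up tunnel identifiers and names:

    TUNNEL_TABLE = {
        # tunnel id -> (remote cooperating router, its input link)
        101: ("cooperating-router-1", "link from customer A"),
        102: ("cooperating-router-1", "link from peer X"),
        201: ("cooperating-router-2", "link from peer Y"),
    }

    def attribute_encapsulated_packet(tunnel_id):
        """Return the (router, link) pair implied by the arrival tunnel, or
        None if the packet arrived outside any known tunnel."""
        return TUNNEL_TABLE.get(tunnel_id)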
The tunneled traffic might carry authentication data but still be
unencrypted so as to allow other machines to attribute the
encapsulated packets. However, at the end of the tunnel, the original
packets emerge without the tunnel encapsulation which is now carrying
the attribution data. If attribution is to be done after that then
something must be done beforehand to save the attribution data. One
possibility is marking in the normal sense. Another is to summarize
the attribution data and store the result for later use.
Note: In either case the receiving machine will probably
have to be more programmable than a commercial router. Even if the
end of the tunnel is the host to which the original packet was
addressed, some additional processing must be done in the operating
system to make use of the attribution data, since it will not be
visible at the application level. How the attribution data is
stored and accessed after it arrives, however, is outside the scope
of this paper.
If no forwarding paths pass through more than two cooperating
machines then this method is the same as the overlay network. We should
therefore consider the difference in the case of a packet that passes
through more than two cooperating machines. In the overlay network, the
first machine sends the packet in a tunnel to the second machine, which
then adds a mark and sends it through another tunnel. Here the first
machine sends the packet in a tunnel directly to the final cooperating
machine. An intermediate cooperating machine receiving a packet already
encapsulated could try to verify that the encapsulation is not forged,
and if satisfied, simply forward the packet unchanged. A more
straightforward solution is to re-encapsulate the packet. The receiving machine
would then have additional work to do, but would find that the packet
had traversed both machines. In effect, we use encapsulation as a form
of marking. It is, however, an inefficient method, especially if
performed at many intermediate points. This method therefore seems
best suited to the case of sparse cooperation, which we expect to be a
common case.
Note: This use of tunnels is similar to that proposed in
DecIdUoiS: Decentralized Source Identification for Network-Based Intrusions.
They seem to have had in mind a somewhat different use from ours.
They propose to build tunnels after an attack is recognized, which would
only make sense for ongoing attacks. Also they seem to imagine a situation
where it is possible to build a tunnel from anywhere.
Classification
- single packet: yes
- existing routers: yes
This assumes that the router supports tunneling and that packets to be
attributed arrive at a low enough rate to be tunneled.
- no advance notice: yes
Except that you might want to change the set of packets to be attributed.
- no additional communication: yes
- no special problems: yes
This may be too expensive to allow large amounts of traffic, and that may
serve as a reason to restrict cooperation.
Attacks:
none
4.8 Preventing source spoofing - Ingress Filtering
RFC2827 suggests
that routers filter packets with "obviously" false source addresses.
Obvious in this case means that a given interface is only supposed to
carry traffic to/from an a-priori known fixed IP range. This is a
special case of "route based" filtering, discussed next.
Note: This idea seems first to have appeared in 1996 in
Internet Holes - Eliminating IP Address Forgery.
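A toy version of such a filter follows. The interface names and prefixes
are hypothetical, and a real deployment expresses this as router
configuration rather than code.

    import ipaddress

    # Each customer-facing interface is only supposed to carry traffic whose
    # source address lies in an a-priori known prefix (rfc2827).
    ALLOWED_SOURCE_PREFIXES = {
        "eth1": ipaddress.ip_network("192.0.2.0/24"),
        "eth2": ipaddress.ip_network("198.51.100.0/24"),
    }

    def ingress_permit(interface, source_address):
        """Return True if the source address is plausible for the interface the
        packet arrived on; interfaces with no configured prefix are unfiltered."""
        prefix = ALLOWED_SOURCE_PREFIXES.get(interface)
        if prefix is None:
            return True
        return ipaddress.ip_address(source_address) in prefix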
Classification
- single packet: yes
- existing routers: yes
- no advance notice: yes
- no additional communication: yes
- no special problems: no
This limits the possible origins of a packet to either the LAN of its
source address or any LAN that is not filtered. When a substantial
number of LANs are not filtered this is a very large set of possible
origins. Another problem is that it may not even be easy to find out
where such filtering is deployed.
Attacks:
none
4.9 Preventing source spoofing - Route Based Filtering
On the effectiveness of route-based packet filtering for distributed
DoS attack prevention in power-law internets describes a
generalization of ingress filtering in which routers filter packets
with source addresses that are inconsistent with routing data. The
generalization involves replacing the word "obvious" above with much
more sophisticated reasoning based on global routing data. We note
that it's not clear yet how to obtain sufficient routing data for the
internet. SAVE:
Source address validity enforcement protocol describes a protocol
by which such data could be collected, but we again have the usual
problem of how well partial compliance (in this case using the
protocol to generate the needed data) solves the problem.
To the extent that route based filtering is practical, it could be
combined with other link identifying methods to allow a router to filter
based not only on the interface at which a packet actually arrives, but
also on upstream link data, in effect simulating a filtering router further upstream.
The first reference (from Purdue) presents many results on how
effective filtering would be if it were done by, e.g., a randomly
chosen half of all routers in the internet, or a set of routers chosen
by a greedy algorithm to cover all links. All of the analysis is done
on the basis of AS's. Of course AS's can be large, so determining the
AS of origin might not be very satisfactory. It's not clear whether
finer grained routing data can be obtained outside your own AS. Park
reports that cooperation from the "right" 20% of AS's (that is, we get
to choose the 20%) localizes every IP address to 5 AS's. Since we don't
generally expect to be able to choose which places will cooperate, this
doesn't seem like a very practical result.
Classification
- single packet: yes
- existing routers: no
In some cases the set of filters will be simple enough to implement with
normal filtering methods.
- no advance notice: yes
- no additional communication: yes
- no special problems: no
The main problem is obtaining data. The next problem will be maintaining
the data in real time under routing updates. There may also be as yet
unknown implementation difficulties in performing the filtering.
And, like ingress filtering, it may be difficult to determine what
filters are actually in place.
Attacks:
Since it is not yet clear what problems will arise in trying to implement
this method, it's also not yet clear what an attacker might do to magnify
those problems. One possibility is causing routing updates and preventing
the filtering routers from learning how these affect what should be filtered.
The routers might then either filter legal packets or, in an effort to
avoid that, forward many illegal packets.
4.10 summary table
Note: + is good, - is bad, * means different answers for different
cases. Techniques that we consider to be closely related are grouped
into sections.
method                              | single packet | existing routers | no advance notice | no later communication | no other problems
------------------------------------+---------------+------------------+-------------------+------------------------+------------------
link testing                        |       -       |        +         |         -         |           *            | -
------------------------------------+---------------+------------------+-------------------+------------------------+------------------
Itrace                              |       -       |      -[1]        |         +         |           +            | +
probabilistic packet marking (PPM)  |       -       |      -[1]        |         +         |           +            | +
------------------------------------+---------------+------------------+-------------------+------------------------+------------------
SPIE                                |       +       |      -[2]        |         +         |           -            | - (memory vs time)
monitor                             |       +       |        +         |         -         |           -            | +
------------------------------------+---------------+------------------+-------------------+------------------------+------------------
deterministic packet marking        |       +       |      -[1]        |         +         |           +            | + (for tunnels see below)
tunneling                           |       +       | + (high speed?)  |         +         |           +            | - (expense limits speed, cooperation)
------------------------------------+---------------+------------------+-------------------+------------------------+------------------
ingress filtering                   |       +       |        +         |         +         |           +            | - (poor effectiveness to cooperation ratio)
route based filtering               |       +       |      -[3]        |         +         |           +            | - (obtaining routing data)
Table 1: Summary of methods and characteristics
Notes:
[1] This may be possible using (possibly nonstandard) router features to
route a subset of packets to another cooperating machine.
[2] Current work attempts to monitor links with
a separate machine. Very high speed links would still require new special
purpose hardware.
[3] Current routers are able to filter, and in some cases there will be
few enough filtering rules to use this mechanism without unacceptable
cost. However, we expect that in other cases this will not be the case.
In the general case filtering is similar in complexity to routing.
5 Combining the two classes into a solution
We remarked earlier that neither filtering nor link identification methods
directly solves the level 1 attribution problem as we have defined it.
Note: That's admittedly a slight exaggeration. Route based
filtering would solve the problem by providing perfect enforcement if
it were deployed universally, i.e., between every machine and the
network. This may even be practical in some networks.
However, the filtering approach does contain the information needed,
and in fact that information can be combined with the information
provided by the link identification methods to greatly improve the
solution. In order to describe this solution we need to describe
route based filtering at a high level.
5.1 Route Based Filtering overview
The routing in a network may be viewed as a set of permissible paths.
Route based filtering tests whether the link from which the router
receives a packet is on any of the permissible paths from the source
address of the packet to its destination address. If not, it must
have a falsified source address, so the router drops it.
Note: Routers may also use tests that are easier to perform
but allow more false source addresses.
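For illustration, the test can be sketched as follows, assuming the
routing data is available as an explicit set of permissible paths
(source prefix, destination prefix, and the links the path traverses).
A real router would compile this into per-interface filter state rather
than scan paths per packet.

    import ipaddress

    def permitted(incoming_link, src_addr, dst_addr, permissible_paths):
        """permissible_paths: iterable of (source_net, dest_net, links) where
        the nets are ipaddress networks and links is the set of links the
        path traverses. The packet passes the filter only if some permissible
        path from its claimed source to its destination uses the link on
        which it actually arrived."""
        src = ipaddress.ip_address(src_addr)
        dst = ipaddress.ip_address(dst_addr)
        for source_net, dest_net, links in permissible_paths:
            if src in source_net and dst in dest_net and incoming_link in links:
                return True
        return False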
5.2 Relation of Route Based Filtering to the solution
Filtering does not actually solve the level 1 attribution problem, in
the sense of giving the receiver of a packet the set of possible
origins for the packet. However, the information used by a router in
order to do that filtering is the same information that is needed for
level 1 attribution. The router needs to test whether a source
address is in the set of possible origins for packets received from
the link where it arrived, while level 1 attribution needs to generate
that same set of origin addresses. One could imagine changing a
filtering router to report the set of possible origins for each packet
it forwards. This would provide a direct level 1 attribution
solution.
Above we point out that filtering and attribution use the same data in
different ways. This would almost certainly lead to different data
representations. Other differences worth mentioning are that
filtering needs immediate access but only to current data, whereas
attribution need not be done very quickly (certainly not in the time
it takes to forward a packet), and one might wish to do it for packets
that arrived long ago. This suggests that while filtering would need
to track routing updates in real time, attribution might need access
to historical routing data. Also, certain approximations might be made
in filtering due to time or space constraints, whereas for attribution
it might be worth spending more effort in order to get a better
answer.
5.3 Extending the solution to link identifying methods
In order to use link identifying methods to solve the level 1
attribution problem, we can simulate the imaginary reporting router
described above, placed at the receiving side of the identified
link. More generally, given a
set of "positive" links, that a given packet is known to have
traversed (those identified by the link identifying methods), and a
set of "negative" links, that it is known not to have traversed (the
set of links that our link identifying methods did not report, but
would have reported if the packet had traversed them), we can conclude
that the possible origins are those from which at least one path to
the destination traverses all of the observed links and none of the
unobserved links.
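That rule can be written down directly. The sketch below is illustrative
only; it assumes we can enumerate, for each candidate source, the
permissible paths to the destination as sets of links, and obtaining
that routing data is itself the hard part, as discussed later.

    def possible_origins(candidate_sources, paths_to_destination,
                         positive_links, negative_links):
        """positive_links and negative_links are sets of links.
        paths_to_destination(source) is assumed to yield each permissible
        path from that source to the destination as a set of links. A source
        remains possible if at least one of its paths contains every positive
        link and none of the negative links."""
        origins = set()
        for source in candidate_sources:
            for path in paths_to_destination(source):
                if positive_links <= path and not (negative_links & path):
                    origins.add(source)
                    break
        return origins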
It is worth mentioning a few minor complications in this scheme.
One is that different methods are susceptible to different errors.
In some cases we might believe that a packet has traversed a link
when it has not, or that it has not when it really has. In these
cases, of course, the certainty of the conclusions must be adjusted
accordingly. In particular, some methods (such as probabilistic
marking) provide strong evidence that a link was traversed but very
weak evidence that it was not. Others (such as SPIE) provide strong
evidence that a link was not traversed and slightly weaker evidence
that it was. Also, some methods provide evidence for only some
packets, and we have to know which packets those are. For instance,
remote sensors may be looking for only certain types of packets.
The most interesting case is route based filtering itself, which
tells us that certain packets could not have traversed certain links.
Note: Actually it tells us that those that did traverse
those links were dropped immediately after they did, so the fact
that a packet arrived at some later place shows it did not traverse
that link.
However, in order to know which packets could not have traversed which
links we need the same data that the router uses to filter.
Despite the complications, we now see that there is a very
straightforward way to combine the evidence from all of the methods we have
described, both link identifying and filtering methods. There may
still be other ways of combining lower level conclusions of different
methods, i.e., data used to justify the conclusion that a link was or
was not traversed. We have not attempted to identify these lower
level conclusions or how they might be combined.
5.4 Remaining problems in Route Based filtering and Level 1 attribution
As described above, the ability to translate what can be observed from
cooperating machines into the desired set of IP addresses relies on
routing information. While the initial focus of this work was the
cooperation of the machines needed for identifying links, we now see
that another kind of cooperation is needed in order to obtain the
necessary routing data! Routing protocols are designed to tell each
router what it needs to know in order to forward packets. They do not
attempt to compute what sources could have sent packets that would
traverse a given link with a given destination address.
In a private network it may be possible to obtain complete routing
data. In the current internet this is apparently not possible at
present. There are sources of partial routing data in the internet.
What data can be obtained, how it can be combined and how well the
result can be used to determine the set of possible origins of packets
traversing a given link are all subjects of active research.
(*** citations? Kihong Park, Rajesh Talpade, me, who else?)
5.5 Quality of attribution as a function of cooperation
The other general question we hoped to explore was how the set of
cooperating machines was related to the quality of the attribution
result for various methods. Here we refer to the machines that
cooperate in identifying links or filtering packets, leaving aside the
new issue of what cooperation might be needed to obtain routing data.
It would be nice to be able to characterize the cooperation and the
quality of the solution in simple terms, such as "with n percent of
the routers cooperating we can expect to limit the set of possible
origins to size m". Unfortunately this seems to be asking too much.
The size of the sets of possible origins are generally not uniform,
and the results depend strongly on such factors as routing and which
routers are cooperating.
We instead present some general facts and examples that we hope will
serve to help the reader understand how the placement of
cooperating machines affects the attribution quality. We mention to
begin with that the mechanisms for identifying links and filtering
require the cooperation of remote machines, either the routers that
actually do the forwarding or other machines in position to see the
traffic in transit. In all of these cases the cooperating machine
must be connected to the link to be filtered or identified. The rest
of our explanations concentrate on links that are known to be positive
or negative. It is to be understood that this maps in the obvious way
to the locations of cooperating machines.
Actual routing algorithms are much more constrained than the notion of
a set of paths would suggest. Each router makes a local decision
based on the content of the packet (generally only the destination
address). That is, the router normally doesn't even know where the
packet has been before so the next hop it chooses cannot depend on
that. Given two links, l1 and l2, with the property that some path,
p, traverses first l1 and later l2 on the way to destination d, we
will say that l1 is "upstream" of l2 (or l2 is "downstream" of l1).
Note: There are actually two different conditions of
interest, which turn out to be the same in the common case where there
is only one next hop from a given router to a given destination. For
the rest of this paper we assume that is the case, but for
completeness we mention here that for positive links the interesting
property is the one above, meaning that the set of possible sources
for a packet traversing the link is the set of machines at the sending
side of the upstream links. For negative links, the corresponding
property is that ALL paths through l1 on the way to d later go through
l2. This means that if the packet did NOT traverse l2 then it could
not have traversed l1, or more to the point, it could not have
originated at any point upstream of l2 in this (negative) sense.
The fact that routers choose next hops independent of previous hops
means that the set of possible origins for the upstream link is a
subset of the set of possible origins for the downstream link. In
general, positive results from upstream links provide more information
than from downstream links, while negative results from downstream
links provide more information than negative results from upstream
links. Since the positive links must all be on a common path, all of
the positive information is contained in the most upstream positive
link. On the other hand, all of the negative information is contained
in the set of negative links that are farthest downstream (meaning
that no other negative link is downstream of them).
Note: Some link identifying methods identify multiple
links, or even complete paths. We see here that only the most
upstream link is important. Other links may be useful as consistency
checks. There are two main reasons that other links are even
collected. One is that the methods sometimes can't tell which links
are upstream of which. The other is that the methods were invented
for goals other than attribution in our sense. Mostly they are meant
as components in defense against flooding attacks, and attributing the
packets to ip addresses is not necessary for this task.
 Figure 1
Figure 1 attempts to summarize the possible situations. The boxes
represent routers and the lines connecting them represent links. In
order to remove any doubts about routing the sample graph shows no
loops. D is the destination of the packet in question. An arc is
meant to represent the subset of the IP addresses or machines upstream
of the link crossed by the arc. Plus signs represent positive links
(known to have been traversed by the packet), while minus signs
represent negative links (known not to have been traversed).
Cooperating routers either justify a plus sign on one link other than
the one leading toward D (the one where the packet arrived) or, by
failing to do that, justify a minus sign on the link from the router
toward D. In this case router A has not seen the packet so we put a
minus sign on link D-A. Of course, this would also justify minus
signs on all the other links connected to A, but we can omit these
since they add no more information. Router C, on the other hand,
reports that the packet arrived on link C-E which justifies a plus
sign on that link. This similarly implies a plus sign on link D-C but
again that adds no information. It is also possible for a machine
other than a router to monitor a set of links, which might justify
such marks as the plus sign on E-F and the minus sign on G-H.
As mentioned earlier, all of the plus signs must be on a common path,
specifically on the path from the true source to D. If there are any
positive links we can ignore all but the one farthest upstream, in
this case link E-F. Also if there are any positive links then we can
ignore any negative links other than those upstream of the farthest
upstream positive link. In this case the minus sign on link D-A is
not upstream of E-F so it adds no new information. We also mentioned
earlier that we can ignore any minus sign that is upstream of another
minus sign. In this case, if there were a minus sign on link F-G then
we could ignore the minus sign on link G-H. The set of possible
origins is then the set of machines upstream of the farthest upstream
positive link minus (that is, set difference) the set of machines
upstream of the remaining negative links. In this example there is
only one negative link upstream of the most upstream positive link, so
the result is the set of machines upstream of E-F minus the set of
those upstream of G-H.
If there are no positive links then we must use the entire address
space as the positive set and subtract from that the addresses
upstream of the negative links. That case can be visualized by
erasing the two plus signs. In that case the answer is the entire
address space minus the addresses upstream of D-A and G-H. This
illustrates a problem with negative links. They typically provide
much less information than positive links. In order to provide the
same amount of information as one positive link it would take negative
links that cover the entire address space outside that positive link.
In particular, negative links far upstream provide very little
information. Ingress filtering is a case in point. An ingress filter
amounts to a minus sign at the outside of the diagram (very far
upstream). The common complaint is that ingress filtering does little
good unless it's deployed universally. For instance, if half the
networks in the internet did ingress filtering, this would allow you
to attribute an arbitrary packet to either the network containing its
source address or the half of the networks that don't filter. That
seems a very weak result considering that half of the networks in the
internet are cooperating.
Note: Despite the complaints, ingress filtering is
currently the most widely deployed technology relevant to level 1
attribution. Even at low levels of deployment it's worth using
where available. The greatest cost in doing so may be simply
finding out where it is deployed. If others start to deploy more
general filtering technology, there will also be a cost in
determining what exactly is being filtered.
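To make the computation concrete: under the single-next-hop assumption
the answer is a set difference, sketched below. The helper
upstream_of(link) is assumed to return, from routing data, the set of
addresses upstream of that link with respect to the destination; with no
positive links the whole address space is the starting point.

    def origin_set(positive_links, negative_links, upstream_of, all_addresses):
        """Possible origins under single-next-hop routing, as described above."""
        if positive_links:
            # All positive links lie on one path, so the farthest upstream one
            # (the one with the smallest upstream set) carries all the
            # positive information.
            most_upstream = min(positive_links,
                                key=lambda link: len(upstream_of(link)))
            candidates = set(upstream_of(most_upstream))
        else:
            candidates = set(all_addresses)
        # Each negative link removes everything upstream of it; negative links
        # outside the candidate set simply remove nothing.
        for link in negative_links:
            candidates -= set(upstream_of(link))
        return candidates

    # In figure 1: origin_set({"C-E", "E-F"}, {"D-A", "G-H"}, upstream_of, all_addrs)
    # yields the machines upstream of E-F minus those upstream of G-H.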
We now present a variety of what we hope are instructive examples.
-
As an example where negative links are very useful, consider a
perimeter of sensors around the destination, all reporting negative
results. This shows that the packet originated from inside the
perimeter.
-
Even a single negative link can be very useful by eliminating a large
part of possible address space. In figure 1 this would be the case
if most of the machines upstream of link E-F were also upstream of
link G-H. Alternatively, in figure 1, if most of the address space
were upstream of link D-A then that single negative link would be
very useful even in the absence of any other data.
-
A cooperating firewall allows you to determine whether a packet
originated inside or outside the firewall. The set of possible
sources is generally much larger outside, but you might be much more
interested in knowing which packets originated inside.
-
We next consider a filtering firewall which does not identify packets
that it forwards. It simply drops inbound packets with inside source
addresses and outbound packets with outside addresses. The fact that
negative results are most useful close to the destination might
suggest that filters, which provide only negative results, would be
particularly useful close to the destination. The problem is that
filters provide negative information only for those addresses that
would be filtered. A packet that arrives on the inside with an
outside source address would not have been filtered, so we gain no
information about its origin (it could have come from either the
inside or outside). This does allow us to attribute to the inside any
packets with inside source addresses observed on the inside. For
purposes of attribution filters would obviously be more useful if they
also reported links for packets they forwarded.
-
Suppose you want to be able to determine whether an arbitrary packet
that arrives in your network originated in Brazil.
Note: We restrict this to packets arriving at some
particular place because in order to even see every packet originating
in Brazil (without even trying to determine its origin) it would be
necessary to have a sensor on every link between machines in Brazil.
The first task is to determine what IP addresses are in Brazil. We do
not address the problem of how to do that. In fact that problem may
already be unsolvable. The next task is to determine the routing from
those addresses to you. So far we don't have a good solution to that
problem either. Of course, routing is subject to change. It is
clearly useful for both the party interested in attribution and his
adversary to control routing. The next step is to find the links on
these routes between machines inside Brazil and machines outside. We
need to monitor these links. (Just in case the previous problems
didn't seem difficult enough!). Note that there may be some links
between inside Brazil and outside that we don't need to monitor.
These are the links that only carry traffic that could never get to your
network. We conclude that a packet originated in Brazil if the
farthest upstream link of those above originates in Brazil.
Referring again to figure 1, we can imagine that links E-F and
G-H are the only ones connecting the inside of Brazil to the outside.
G-H carries traffic from Argentina which reaches you through Brazil.
E-F carries traffic from Brazil to Florida. In this case a packet
originated in Brazil if it was first marked on link E-F (which implies
that it was not marked on link G-H).
5.6 Summary of the attribution process
The party that wants to do attribution initially arranges to gain the
cooperation of some set of machines in various places around the
network. Generally he has some control, but not complete control over
where such cooperating machines can be placed. Where he would like to
place cooperative machines depends on what sort of attribution
problems he wants to solve. The first question is what class of
packets he would like to attribute. The answer is typically those
with a particular set of destinations, but sometimes those with a
particular set of origins, and sometimes only a subset of such
packets. The second question is what sort of answer he needs,
typically expressed as a set of equivalence classes, e.g., he wants to
be able to determine whether the packet originated in his own internal
network or elsewhere and within his own network he wants to be able to
attribute each packet to a single machine. Finally he needs some idea
of the network routing to be expected. This determines whether a set
of cooperating locations is adequate to satisfy the requirements, and
so drives the placement.
When a packet arrives the cooperating machines report that the packet
traversed certain links and did not traverse certain others. This,
along with detailed routing information (obtained we do not yet know
how) implies that it must have originated from one of the places in a
set that is computed.
6 Summary
The first contribution of this paper is the definition of the level 1
attribution problem in terms of identifying sets of possible IP
addresses from partial link traversal data. In the past traceback has
been viewed as identifying the entire forwarding path, which requires
levels of cooperation that can be expected only in theory.
We also point out the relation between this problem and route based
filtering. We further point out that many different methods that have
been proposed all produce similar data, namely links traversed (or not)
by the packets to be attributed. We show how partial results for all
such methods can be combined to produce the best result and characterize
the quality of that result. Finally, we survey and characterize a wide
variety of techniques, some of which appear not to have been discussed
in the literature, at least in this context.