Finally we turn to some important general questions. The first is:
How can these methods be used to solve the level 1 attribution problem?
Note that neither class of related work directly solves the problem as we define it.
All of the methods in the related work require cooperation from machines in different locations. We expect in most real networks it will not be possible to obtain universal cooperation. This leads to the question:
How does the performance of various solutions depend on the number and placement of cooperating machines?
Finally, since there are different approaches, it is likely that some machines will be willing and able to cooperate with one approach while other machines will be willing and able to cooperate with another. This leads to the question:
How can evidence from different solutions be combined?
We present answers to these questions and point out some remaining difficult problems that must be overcome.
The goal of "level 1 attribution" is, given P, to identify G. This identification might in general be requested by the recipient, any of the forwarding routers, or any other machine to which they communicate information about P.
Note that every IP packet is supposed to contain the IP address of the machine that generated it in the "source address" field. If this requirement were enforced then there would be no level 1 attribution problem.
Some further explanation is in order on a number of points.
In the general case, lower levels of cooperation will generate less complete attribution data, and this restricts the source to a set of possible IP addresses. The quality of the result is then related to the size of that set, with smaller sets considered higher quality.
Some solutions apply only to large sets of packets presumed to come
from the same place. These are generally designed to attribute
floods. "Yes" means that after you receive a packet (though possibly
only well after deciding that some class of packets is to be
attributed; see "advance notice" below)
you can find data for its attribution.
Note: For purposes of the intelligence community we
believe that single packet attribution is required.
Some solutions require changing the way routers work. These face an obvious economic obstacle in the cost of replacing existing routers. If the solution is to be used in high speed routers then an even more severe problem arises. Most organizations are not in a position to build high speed routers and it's very difficult to convince those that are to incorporate the desired changes. It is relatively easy to build custom routers starting from open source software running on cheap standard hardware, but this currently limits the bandwidth to about 1 Gbit/s.
An interesting compromise seems practical for some cases. If a high speed router can identify a relatively small subset of the arriving packets that must be processed by the attribution mechanism, it could route these packets to a separate lower speed but more programmable machine, which could do the specialized processing, and then return the (possibly modified) packets to be forwarded on. We will mention this alternative below as applicable.
One other approach to this problem is worth mentioning. Our general
complaint with commercial routers is that the end user has limited
control over what they do, i.e., they can be "configured", but not
"reprogrammed". We propose a plausible design for arbitrarily high
speed programmable routers. This design requires a very small amount
of special purpose high speed hardware to handle the high speed
network interfaces, and an arbitrarily large number of slower
general purpose programmable machines, probably running something like
FreeBSD or Linux. The special purpose hardware has to divide the
incoming high speed traffic into approximately equal packet streams,
which are then distributed to the general purpose machines. These act
as low speed but programmable routers, forwarding packets to the
special purpose high speed hardware connected to the desired output.
This hardware then recombines the outgoing traffic from all of the low
speed routers into a single high speed outgoing link. We look forward
to the appearance of the proposed special purpose hardware.
Note: Anyone in a position to help make this happen is
invited (encouraged, begged) to contact the author!
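The split-and-recombine design above can be sketched in software. This is a minimal sketch, assuming the special purpose hardware hashes the flow 5-tuple so that all packets of one flow land on the same low speed router (which preserves per-flow ordering); the worker count and field names are hypothetical:

```python
import hashlib

NUM_WORKERS = 8  # number of slower general-purpose routers (assumed)

def worker_for(packet):
    """Pick a worker by hashing the flow 5-tuple so that all packets of
    one flow go to the same low-speed router (preserving their order)."""
    key = "|".join(str(packet[f]) for f in
                   ("src", "dst", "proto", "sport", "dport"))
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_WORKERS

pkt = {"src": "10.0.0.1", "dst": "10.0.0.2",
       "proto": 6, "sport": 1234, "dport": 80}
# The same flow always maps to the same worker:
assert worker_for(pkt) == worker_for(dict(pkt))
```

Hashing on the 5-tuple rather than round-robin is a design choice: it avoids reordering within a flow and lets any per-flow state (such as attribution data) stay local to one machine.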
Some solutions are too expensive to be used for all packets all the time. One example is that the solution might slow down or even disrupt normal traffic while in operation. As another, a monitor that remembers something about what it has seen must limit the set of packets it remembers. One solution is to remember only recent packets. This, of course, limits attribution to recent packets. Another is to remember only those determined in advance to be interesting. In the latter case we have to tell it when we decide that certain kinds of packets are interesting. And we have to do that before these packets can be recorded. This is a fatal flaw for cases where we don't know until a packet appears that we want to find its origin.
Typically solutions that rely on memory at cooperating machines require communication with those machines in order to ask what they remember and receive an answer. This is a disadvantage in several ways. First, communication has its own cost in network bandwidth and in time before the attribution is completed. Also that communication can be attacked. This is probably sufficient reason for the owner of the cooperating machine to restrict access to his own network.
This is the place for a collection of miscellaneous disadvantages relative to other approaches.
The person doing the trace presumably has administrative control over a router known to be on the forwarding path. He reconfigures that router to help him determine which incoming link(s) carry the traffic. In the case of a flood, he might simply ask for a count of packets addressed to the victim arriving on each interface. For other traffic he would need to provide a description that the router can use to identify the traffic of interest. Unless the traffic is being generated by the router it must be arriving at one or more inputs. This leads to the next upstream router(s) which can then be traced in turn.
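The hop-by-hop procedure just described can be sketched as follows, assuming each reconfigured router reports per-interface counts of packets matching the traffic description. The topology, the counter values, and single-path routing are all assumptions of the sketch:

```python
# topology implied by the counters below; all names are hypothetical
# counters[router][upstream_neighbor] = packets matching the traffic
# description that arrived from that neighbor (what a reconfigured
# router would report)
counters = {
    "victim_gw": {"r1": 900, "r2": 0},
    "r1": {"r3": 900},
    "r3": {},
}

def trace_upstream(start):
    """Follow the interface carrying the matching traffic hop by hop
    until a router with no reporting upstream neighbor is reached."""
    path = [start]
    current = start
    while True:
        incoming = counters.get(current, {})
        carriers = [n for n, c in incoming.items() if c > 0]
        if not carriers:
            return path           # traffic originates at or behind `current`
        current = carriers[0]     # single-path assumption for this sketch
        path.append(current)

assert trace_upstream("victim_gw") == ["victim_gw", "r1", "r3"]
```

In practice each step requires manual reconfiguration of the next upstream router, which is exactly why the text below considers moving upstream more than one link at a time.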
Similar techniques can be used to move upstream more than one link at
a time. One might, with automated support, simultaneously reconfigure
all the routers on some perimeter around the victim, such as all the
border routers of the ISP. One interesting variant of this is
described in
CenterTrack: An IP overlay network for tracking DoS floods.
This reroutes specified packets over tunnels from edge routers to
special tracking machines, which then can easily identify the edge
routers initially receiving the traffic.
Note: This idea will appear again in the section on
deterministic marking.
Another variant is described in BlackHole Route Server and Tracking Traffic on an IP Network. This uses routing mechanisms to first generate ICMP replies from any router receiving traffic for the victim and then to collect some of these at a central point in order to identify the routers that did receive them. Unfortunately, this technique relies on random (or otherwise stupid forms of) source spoofing.
It is also possible to do a hostile form of link testing without
cooperation from the owners of the forwarding routers, as described in
Tracing Anonymous Packets to Their Approximate Source.
Note: This paper seems to contain the original idea for most
of the methods we describe here!
This involves attacking (typically flooding) various links. This has
the significant disadvantage of attacking other (presumably innocent)
traffic. The expectation is that an attack on a link that carries the
traffic to be traced will cause a reduction in the rate at which that
traffic flows through that link, and therefore a reduction in the rate
at which it arrives downstream. This requires cooperation from
machines in a position to attack the targeted links. In this paper
that cooperation was probably unintentional. It also requires
cooperation from machines in a position to detect the loss of traffic,
but typically the party that wants to trace the packets is already in
a position to detect the loss of traffic, i.e., at the destination
address. This variant also requires more knowledge of the routing
than other link identifying methods.
The attacking variant is the only one that does not require
administrative access to routers. In fact, this was the motivation for its
development. Normal link testing could actually be done without
cooperation of routers if cooperating machines could be placed at the
links to be tested, and if these were fast enough to process the
traffic at those links. However we regard this as a different
solution discussed below.
Classification
Since this technique requires administrative access to routers (other than the attack variant), cooperation can be expected to be restricted to the administrative domain owning the routers. To the extent that further cooperation is available it will be manual, hence inconvenient and slow.
The attack variant does not require access to routers but faces a whole set of problems of its own. It still needs cooperating machines in a position to affect the links to be tested and sensors (generally at the victim, but more generally along the attack path) to detect resulting changes in traffic. It also requires knowledge of network topology and routing. The ability to identify a link further depends on such things as how much bandwidth the cooperating machines have available and details of how routers are configured. For instance the ability to attack might depend on what kind of traffic is used to attack and what kind is being attacked.
Classification
Classification
The supposed benefit of probabilistic marking over deterministic marking is that it does not require additional space in the packet. The extra space is considered a problem not because of the loss of payload bandwidth but because the space must grow with each hop, which is thought to be expensive for routers. That expense can be avoided by allocating all of the space at once at the first marking router. The marking itself seems trivial, comparable in complexity to decrementing the TTL, and even the work to allocate the extra space at the first router is relatively small. A more realistic objection is that the extra space is likely to cause the packet to exceed the MTU on the next link. This should not be a problem, since TCP/IP was designed to deal with such situations; in the current internet it actually is something of a problem because of widespread misconfiguration. See rfc2923 and mss.phildev.net for details. Note that the same problems occur with other protocols that need to steal a little bandwidth, e.g., tunneling. There are commonly used "work-arounds" (perhaps more accurately described as "I have to break my network in order to work with your broken network") for such problems.
Classification
Classification
The first problem with the IP option as an attribution method is that it's the choice of the sender to include the option. In the current internet this allows attribution of senders who wish to be attributed, but that's not very useful. This problem could be solved by cooperating routers inserting the option into packets that do not already have it.
The next problem is that routers are supposed to leave the record route option unchanged if the space allocated for it is already filled with data. An attacker could therefore spoof any data he wanted to. The solution to this problem is that cooperating routers should be configured to know which of their neighbors is "trustworthy" and accept trace data only from such neighbors. Data from other neighbors should be overwritten. This gives us the address of the last router on the path that was preceded by an untrusted neighbor.
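The trusted-neighbor rule can be sketched as follows; the trusted address list and the packet representation are hypothetical:

```python
TRUSTED = {"192.0.2.1", "192.0.2.2"}  # hypothetical trusted neighbors

def mark(packet, arriving_neighbor, my_addr):
    """Accept existing trace data only from trusted neighbors; otherwise
    discard it, so an attacker cannot pre-fill the option with forgeries."""
    trace = packet.get("trace", [])
    if arriving_neighbor not in TRUSTED:
        trace = []                 # overwrite untrusted data
    packet["trace"] = trace + [my_addr]
    return packet

# A packet arrives from an untrusted neighbor with a spoofed trace entry;
# the forged data is discarded:
p = {"trace": ["203.0.113.99"]}
p = mark(p, "198.51.100.7", "192.0.2.10")
assert p["trace"] == ["192.0.2.10"]

# Marks arriving from trusted neighbors accumulate normally:
p = mark(p, "192.0.2.1", "192.0.2.11")
assert p["trace"] == ["192.0.2.10", "192.0.2.11"]
```

The surviving first entry is exactly what the text describes: the address of the last router on the path that was preceded by an untrusted neighbor.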
In the case where the first cooperating router is not connected to the last in a continuous path of cooperating routers this loses some data, namely the links traversed before the packet reached the last untrusted router. Unfortunately, as will be shown later, this is actually the most valuable data! The best solution to this problem that we have found so far is tunnels between non-adjacent cooperating routers (a new result, we believe). Similar techniques were mentioned under link testing. In effect, a cooperating router must determine which cooperating router will be the next to receive a given packet, and encapsulate that packet in a tunnel to that next router. In the most general case this leads to an overlay network of cooperating routers. The design of such a network reflects trade-offs between the efficiency of routing and what attribution data is available at what places.
It should be mentioned that cooperation in a given method is not always a binary property of a machine. In this case an ISP might be willing to route packets addressed to your network via a tunnel to another router cooperating with you, but still not be willing to route other packets that way.
Another problem is that transitive trust as described above allows a single compromised router to generate any attribution data. For this reason it is better to gather at least some additional downstream data for purposes of validation. We show below how that data can be used.
Providing for Responsibility in a Global Information
Infrastructure describes a form of packet marking that gathers
complete path data in a very compressed form, encoding each link in
just the number of bits to specify one of the neighbors of the machine
that received the packet on that link.
Note: This is the earliest reference we have seen for this
scheme. It was reinvented and implemented as part of the flooding
defense described in A Fair
Service Approach to Defenses Against Packet Flooding Attacks. We
now see some ways to do a little better, but as a practical matter
this is probably not necessary.
As will be seen below, the data needed for attribution is the IP address
at the sending side of the first identified link. The compressed path
does not make extraction of that data convenient. In small networks
with static routing it is sufficient to simply store the association
between paths and their origins. In networks with single path routing
this can be easily generated by sending a ping to each address and
storing the path recorded on the reply packet.
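The compressed encoding can be sketched concretely. Assuming every router publishes an agreed ordering of its neighbors, each hop costs only ceil(log2(degree)) bits, and a machine with the topology can decode the path by walking it backwards from the receiver; the topology here is hypothetical:

```python
import math

# neighbors[router] = agreed ordered list of that router's neighbors
# (hypothetical topology)
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

def hop_width(router):
    """Bits needed to name one neighbor of `router`."""
    return max(1, math.ceil(math.log2(len(neighbors[router]))))

def encode_path(path):
    """For each hop, append the index of the sending neighbor in the
    receiver's neighbor list."""
    bits = ""
    for sender, receiver in zip(path, path[1:]):
        bits += format(neighbors[receiver].index(sender),
                       f"0{hop_width(receiver)}b")
    return bits

def decode_path(bits, last_hop):
    """Reconstruct the path by consuming bits from the end, starting at
    the machine that received the packet."""
    path = [last_hop]
    while bits:
        width = hop_width(path[0])
        bits, idx = bits[:-width], int(bits[-width:], 2)
        path.insert(0, neighbors[path[0]][idx])
    return path

code = encode_path(["D", "B", "C", "A"])   # packet travels D -> B -> C -> A
assert decode_path(code, "A") == ["D", "B", "C", "A"]
```

Decoding requires the topology, which is the inconvenience noted in the text; the ping trick above amounts to letting the network perform this decoding for you.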
Changing IP to Eliminate Source Forgery suggests a modification that is more suitable in the general case, where the sending IP address of the first recorded link is added to the path data. The compressed path data is sufficient for reconstructing the initial address (in fact, the entire path) but this requires either information about network topology or additional communication to provide such information. An easy test to see that the initial IP address does actually correspond to the recorded path is to send a ping packet to the IP address and see whether the reply comes back with the same path. This, of course, still assumes a single path and a stable route. Note that a router can still forge paths that pass through itself. Advanced and Authenticated Marking Schemes for IP Traceback describes some techniques that are actually meant to address problems in nondeterministic packet marking, but seem applicable to this problem as well.
Pi: A new defense mechanism against IP spoofing and DDoS attacks describes a deterministic marking method that is in some ways a compromise between deterministic and nondeterministic marking. It is deterministic, but like the nondeterministic methods, avoids expanding the packet and generally provides less complete attribution data.
Some commercial routers can be configured to do a very limited form of deterministic path marking. Cisco IOS allows routers to alter the TOS field, which could be used by a small neighborhood of routers to identify up to 64 places where a packet enters the neighborhood. Possibly other commercial routers can be programmed in similar ways. This could be used in conjunction with routers that do "real" path marking to allow small clusters of legacy routers to cooperate in a larger neighborhood. That is, the old routers would use TOS marking at entry into the neighborhood, and at exit, a more capable router would incorporate that mark into the "real" path data.
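A sketch of such TOS-based entry marking, assuming the neighborhood agrees to use six of the eight TOS bits for a tag naming one of up to 64 entry points (the bit layout and router names are assumptions, not anything Cisco IOS prescribes):

```python
# Map each entry router in the neighborhood to a 6-bit tag (at most 64).
ENTRY_TAGS = {"edge-1": 1, "edge-2": 2, "edge-3": 3}  # hypothetical

def tos_mark(tos_byte, entry_router):
    """Overwrite the 6 high-order TOS bits with the entry tag, keeping
    the 2 low-order bits untouched."""
    tag = ENTRY_TAGS[entry_router]
    return (tag << 2) | (tos_byte & 0b11)

def tos_entry(tos_byte):
    """Recover the entry router from a marked TOS byte."""
    inverse = {v: k for k, v in ENTRY_TAGS.items()}
    return inverse.get(tos_byte >> 2)

marked = tos_mark(0x00, "edge-2")
assert tos_entry(marked) == "edge-2"
```

At exit from the neighborhood, a more capable router would read this tag and fold it into the "real" path data, as described above.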
As in other methods above, it is also possible to get away without changing routers in the case where a small portion of the traffic coming into a router can be identified as worthy of attribution. In this case the router could simply route that traffic to a separate lower speed machine that does the marking. This machine could also do the tunneling mentioned above.
Classification
The tunneled traffic might carry authentication data but still be
unencrypted so as to allow other machines to attribute the
encapsulated packets. However, at the end of the tunnel, the original
packets emerge without the tunnel encapsulation which is now carrying
the attribution data. If attribution is to be done after that then
something must be done beforehand to save the attribution data. One
possibility is marking in the normal sense. Another is to summarize
the attribution data and store the result for later use.
Note: In either case the receiving machine will probably
have to be more programmable than a commercial router. Even if the
end of the tunnel is the host to which the original packet was
addressed, some additional processing must be done in the operating
system to make use of the attribution data, since it will not be
visible at the application level. How the attribution data is
stored and accessed after it arrives, however, is outside the scope
of this paper.
If no forwarding paths pass through more than two cooperating
machines then this method is the same as the overlay network. We should
therefore consider the difference in the case of a packet that passes
through more than two cooperating machines. In the overlay network, the
first machine sends the packet in a tunnel to the second machine, which
then adds a mark and sends it through another tunnel. Here the first
machine sends the packet in a tunnel directly to the final cooperating
machine. An intermediate cooperating machine receiving a packet already
encapsulated could try to verify that the encapsulation is not forged,
and if satisfied, simply forward the packet unchanged. A more
straightforward solution is to reencapsulate the packet. The receiving machine
would then have additional work to do, but would find that the packet
had traversed both machines. In effect, we use encapsulation as a form
of marking. It is, however, an inefficient method, especially if
performed at many intermediate points. This method therefore seems
best suited to the case of sparse cooperation, which we expect to be a
common case.
Note: This use of tunnels is similar to that proposed in
DecIdUoiS: Decentralized Source Identification for Network-Based Intrusions.
They seem to have had in mind a somewhat different use from ours.
They propose to build tunnels after an attack is recognized, which would
only make sense for ongoing attacks. Also they seem to imagine a situation
where it is possible to build a tunnel from anywhere.
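Encapsulation-as-marking can be sketched as nested tunnel headers, each carrying the address of the cooperating router that added it; the packet and header representation here are hypothetical:

```python
def encapsulate(packet, router_addr):
    """Wrap the packet in one more tunnel header carrying this router's
    address; the nesting records each cooperating hop."""
    return {"tunnel_src": router_addr, "inner": packet}

def unwrap(packet):
    """At the final cooperating machine, peel all tunnel layers and
    return the original packet plus the cooperating routers traversed
    (outermost, i.e. most recent, first)."""
    hops = []
    while "tunnel_src" in packet:
        hops.append(packet["tunnel_src"])
        packet = packet["inner"]
    return packet, hops

p = {"src": "203.0.113.5", "dst": "198.51.100.9", "payload": b"x"}
p = encapsulate(p, "192.0.2.1")    # first cooperating router
p = encapsulate(p, "192.0.2.2")    # intermediate router reencapsulates
inner, hops = unwrap(p)
assert hops == ["192.0.2.2", "192.0.2.1"]
assert inner["src"] == "203.0.113.5"
```

The per-hop header overhead illustrates why the text calls this an inefficient form of marking when performed at many intermediate points.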
Classification
Attacks: none
Classification
Attacks: none
To the extent that route based filtering is practical, it could be combined with other link identifying methods to allow a router to filter on the basis of not only the fact that this packet is arriving at one of its own interfaces, but to simulate a filtering router further upstream.
The first reference (from Purdue) presents many results on how effective filtering would be if it were done by, e.g., a randomly chosen half of all routers in the internet, or a set of routers chosen by a greedy algorithm to cover all links. All of the analysis is done on the basis of AS's. Of course AS's can be large, so determining the AS of origin might not be very satisfactory. It's not clear whether finer grained routing data can be obtained outside your own AS. Park reports that cooperation from the "right" 20% of AS's (that is, we get to choose the 20%) localizes every IP address to 5 AS's. Since we don't generally expect to be able to choose which places will cooperate, this doesn't seem like a very practical result.
Classification
method | single packet | existing routers | no advance notice | no later communication | no other problems |
---|---|---|---|---|---|
link testing | | | | | |
Itrace | | | | | |
probabilistic packet marking (PPM) | | | | | |
SPIE | memory vs time | | | | |
monitor | | | | | |
deterministic packet marking | for tunnels see below | | | | |
tunneling | high speed? | expense limits speed, cooperation | | | |
ingress filtering | poor effectiveness to cooperation ratio | | | | |
route based filtering | obtaining routing data | | | | |
Notes:
[1] This may be possible using (possibly nonstandard) router features to
route a subset of packets to another cooperating machine.
[2] Current work attempts to monitor links with
a separate machine. Very high speed links would still require new special
purpose hardware.
[3] Current routers are able to filter, and in some cases there will be
few enough filtering rules to use this mechanism without unacceptable
cost. In other cases, however, we expect the cost to be unacceptable.
In the general case filtering is similar in complexity to routing.
Above we point out that filtering and attribution use the same data in different ways. This would almost certainly lead to different data representations. Other differences worth mentioning are that filtering needs immediate access but only to current data, whereas attribution need not be done very quickly (certainly not in the time it takes to forward a packet), and one might wish to do it for packets that arrived long ago. This suggests that while filtering would need to track routing updates in real time, attribution might need access to historical routing data. Also, certain approximations might be made in filtering due to time or space constraints, whereas for attribution it might be worth spending more effort in order to get a better answer.
It is worth mentioning a few minor complications in this scheme.
One is that different methods are susceptible to different errors.
In some cases we might believe that a packet has traversed a link
when it has not, or that it has not when it really has. In these
cases, of course, the certainty of the conclusions must be adjusted
accordingly. In particular, some methods (such as probabilistic
marking) provide strong evidence that a link was traversed but very
weak evidence that it was not. Others (such as SPIE) provide strong
evidence that a link was not traversed and slightly weaker evidence
that it was. Also, some methods provide evidence for only some
packets, and we have to know which packets those are. For instance,
remote sensors may be looking for only certain types of packets.
The most interesting case is route based filtering itself, which
tells us that certain packets could not have traversed certain links.
Note: Actually it tells us that those that did traverse
those links were dropped immediately after they did, so the fact
that a packet arrived at some later place shows it did not traverse
that link.
However, in order to know which packets could not have traversed which
links we need the same data that the router uses to filter.
Despite the complications, we now see that there is a very straightforward way to combine the evidence from all of the methods we have described, both link identifying and filtering methods. There may still be other ways of combining lower level conclusions of different methods, i.e., data used to justify the conclusion that a link was or was not traversed. We have not attempted to identify these lower level conclusions or how they might be combined.
In a private network it may be possible to obtain complete routing data. In the current internet this is apparently not possible at present. There are sources of partial routing data in the internet. What data can be obtained, how it can be combined and how well the result can be used to determine the set of possible origins of packets traversing a given link are all subjects of active research. (*** citations? Kihong Park, Rajesh Talpade, me, who else?)
Actual routing algorithms are much more constrained than the notion of
a set of paths would suggest. Each router makes a local decision
based on the content of the packet (generally only the destination
address). That is, the router normally doesn't even know where the
packet has been before so the next hop it chooses cannot depend on
that. Given two links, l1 and l2, with the property that some path,
p, traverses first l1 and later l2 on the way to destination d, we
will say that l1 is "upstream" of l2 (or l2 is "downstream" of l1).
Note: There are actually two different conditions of
interest, which turn out to be the same in the common case where there
is only one next hop from a given router to a given destination. For
the rest of this paper we assume that is the case, but for
completeness we mention here that for positive links the interesting
property is the one above, meaning that the set of possible sources
for a packet traversing the link is the set of machines at the sending
side of the upstream links. For negative links, the corresponding
property is that ALL paths through l1 on the way to d later go through
l2. This means that if the packet did NOT traverse l2 then it could
not have traversed l1, or more to the point, it could not have
originated at any point upstream of l2 in this (negative) sense.
The fact that routers choose next hops independent of previous hops
means that the set of possible origins for the upstream link is a
subset of the set of possible origins for the downstream link. In
general, positive results from upstream links provide more information
than those from downstream links, while negative results from downstream
links provide more information than negative results from upstream
links. Since the positive links must all be on a common path, all of
the positive information is contained in the most upstream positive
link. On the other hand, all of the negative information is contained
in the set of negative links that are farthest downstream (meaning
that no other negative link is downstream of them).
Note: Some link identifying methods identify multiple
links, or even complete paths. We see here that only the most
upstream link is important. Other links may be useful as consistency
checks. There are two main reasons that other links are even
collected. One is that the methods sometimes can't tell which links
are upstream of which. The other is that the methods were invented
for goals other than attribution in our sense. Mostly they are meant
as components in defense against flooding attacks, and attributing the
packets to ip addresses is not necessary for this task.
Figure 1 attempts to summarize the possible situations. The boxes represent routers and the lines connecting them represent links. In order to remove any doubts about routing the sample graph shows no loops. D is the destination of the packet in question. An arc is meant to represent the subset of the IP addresses or machines upstream of the link crossed by the arc. Plus signs represent positive links (known to have been traversed by the packet), while minus signs represent negative links (known not to have been traversed). Cooperating routers either justify a plus sign on one link other than the one leading toward D (the one where the packet arrived) or, by failing to do that, justify a minus sign on the link from the router toward D. In this case router A has not seen the packet so we put a minus sign on link D-A. Of course, this would also justify minus signs on all the other links connected to A, but we can omit these since they add no more information. Router C, on the other hand, reports that the packet arrived on link C-E which justifies a plus sign on that link. This similarly implies a plus sign on link D-C but again that adds no information. It is also possible for a machine other than a router to monitor a set of links, which might justify such marks as the plus sign on E-F and the minus sign on G-H.
As mentioned earlier, all of the plus signs must be on a common path, specifically on the path from the true source to D. If there are any positive links we can ignore all but the one farthest upstream, in this case link E-F. Also if there are any positive links then we can ignore any negative links other than those upstream of the farthest upstream positive link. In this case the minus sign on link D-A is not upstream of E-F so it adds no new information. We also mentioned earlier that we can ignore any minus sign that is upstream of another minus sign. In this case, if there were a minus sign on link F-G then we could ignore the minus sign on link G-H. The set of possible origins is then the set of machines upstream of the farthest upstream positive link minus (that is, set difference) the set of machines upstream of the remaining negative links. In this example there is only one negative link upstream of the most upstream positive link, so the result is the set of machines upstream of E-F minus the set of those upstream of G-H.
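The combination rule in this example can be sketched as set arithmetic. Assuming we already know, for each marked link, the set of addresses upstream of it (in practice this requires the routing data discussed elsewhere; the sets below are made up to match the figure), the possible origins are computed as follows. Since all positive links lie on one path, the farthest-upstream one has the smallest upstream set:

```python
# upstream[link] = set of addresses upstream of that link (hypothetical
# data; in practice derived from routing information)
upstream = {
    "E-F": {"F", "G", "H", "s1", "s2", "s3"},
    "G-H": {"H", "s3"},
    "D-A": {"A", "s4"},
}

def possible_origins(positive, negative, address_space):
    """Start from the farthest-upstream positive link (smallest upstream
    set) or, lacking any, the whole address space; then subtract the
    upstream sets of the negative links."""
    if positive:
        base = min((upstream[l] for l in positive), key=len)
    else:
        base = set(address_space)
    for l in negative:
        base -= upstream[l]
    return base

everything = {"A", "F", "G", "H", "s1", "s2", "s3", "s4"}
# Figure 1 example: plus on E-F, minus on G-H and D-A.
assert possible_origins({"E-F"}, {"G-H", "D-A"}, everything) == \
       {"F", "G", "s1", "s2"}
```

Negative links that are not upstream of the chosen positive link (here D-A) subtract nothing, since their upstream sets are disjoint from the base set, so they need not be filtered out explicitly.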
If there are no positive links then we must use the entire address
space as the positive set and subtract from that the addresses
upstream of the negative links. That case can be visualized by
erasing the two plus signs. In that case the answer is the entire
address space minus the addresses upstream of D-A and G-H. This
illustrates a problem with negative links. They typically provide
much less information than positive links. In order to provide the
same amount of information as one positive link it would take negative
links that cover the entire address space outside that positive link.
In particular, negative links far upstream provide very little
information. Ingress filtering is a case in point. An ingress filter
amounts to a minus sign at the outside of the diagram (very far
upstream). The common complaint is that ingress filtering does little
good unless it's deployed universally. For instance, if half the
networks in the internet did ingress filtering, this would allow you
to attribute an arbitrary packet to either the network containing its
source address or the half of the networks that don't filter. That
seems a very weak result considering that half of the networks in the
internet are cooperating.
Note: Despite the complaints, ingress filtering is
currently the most widely deployed technology relevant to level 1
attribution. Even at low levels of deployment it's worth using
where available. The greatest cost in doing so may be simply
finding out where it is deployed. If others start to deploy more
general filtering technology, there will also be a cost in
determining what exactly is being filtered.
We now present a variety of what we hope are instructive examples.
When a packet arrives the cooperating machines report that the packet traversed certain links and did not traverse certain others. This, along with detailed routing information (obtained we do not yet know how) implies that it must have originated from one of the places in a set that is computed.