EECS 325/425
Project 2
Due December 2, 2019 before 11:59pm.
EECS325: 80 points + 10 bonus points; EECS425: 100 points
In this project, you will investigate the correlation between two common inter-host distance metrics: the hop count and the RTT. Before you do your measurements, you need to implement a program that measures hop distance and RTT to a remote host. As you know, the traceroute tool gives you not just the number of hops between the two hosts, but also the IP addresses of all the hops and the RTT between the sender and each hop. This is more than what we require, and therefore traceroute is too heavy weight for our purpose: traceroute sends out a lot of probes, while your tool should need many fewer probes. In fact, your tool should be able to measure the number of hops between the sender and destination using only one probe. To do this, you can exploit the fact that ICMP error messages include an initial portion of the datagram that triggered the error. So, if you send out a datagram to a destination to an unreachable UDP port and set the TTL x to this datagram, the residual TTL y that the datagram has when it reaches the destination will be returned to you in the ICMP payload. Then you know that the difference (x-y) represents the number of hops between the sender and the destination.
To implement your hops and RTT measurement tool, you need so-called raw sockets as they allow you to control the values of the IP header fields (which as you know from Project 1 are filled out by the kernel for you if you use regular UDP or TCP sockets) as well as receive ICMP messages. You can learn about raw sockets in detail at http://sock -raw.org/papers/sock_ raw but necessary simple details are provided below. Note: When we send measurement traffic to unsuspecting sites, we try to mark our traffic as explicitly as we can to distinguish it from attack traffic. So, in your payload, please include text measurement for class project.
Internet measurement experiments are often done at large scale (unlike our toy experiments with only 10 destinations) and take a very long time to complete. Therefore, we often try to combine measurements, think ahead, and extract as much information from a single experiment as we can. In this experiment, for the 425 students, I would like you to combine the above measurement with answering another question: while ICMP stipulates that the ICMP error messages include only 28 initial bytes of the datagram that triggered the error, rumor has it that in practice these error messages include much larger part of the datagram. I would like you to construct your probes to include some payload to make the probe datagrams have the maximum length of 1500 bytes (or 1472 of payload plus 8 bytes of UDP header plus 20 bytes of IP header). Then when you receive the ICMP response, see how much of the original datagram the response contains. Produce the result in your report as a table. In other words, pad the payload in your probing datagram (which, again, identifies your traffic) with, e.g., character a so that the datagram reaches the maximum length above.
Thus, the output of your tool must include, for each destination, (a) the number of router hops between you and the destination, (b) the RTT between you and the destination, (c) the number of probe/response matching criteria that matched for this destination (see explanation below) and for the 425 students (d) the number of bytes of the original datagram included in the ICMP error message.
Notes:
You will need to use python to implement this program. If you do not know it, just find some intro tutorial with lots of examples. You only need rudimentary understanding of the language, just enough to implement the task at hand. Note: you can find on the Internet
code that implements somewhat similar functions, such as ping and traceroute. You are welcome to read this code or even reuse parts of it as long as you document carefully what code is yours and what you adopted from elsewhere. Your machine has python 3 installed; you need to access it using python3 command (instead of usual python command, which does not exist).
You will need to have root privilege to work with raw sockets. You each have received your own (virtual) machine at your disposal (sent to you individually). Extremely important: do
NOT change the passwords associated with your VM. Doing so will invalidate your credentials and you will lose access! There is no IDE on this machine you would have to use an old-fashioned text editor and command-line terminal window to write and debug your code. You can of course try to develop/debug on your own machine but it will work well only if your machine runs linux: I am told Windows support for raw sockets is spotty so your mileage may vary. Rami (my former PhD student) tried his implementation of this project on Windows 10 and it worked fine. Note: if you do use your own machine, I believe CaseGuest wi-fi blocks ICMP messages, so make sure you are on Case Ethernet or Case Wireless network, or log into the VPN.
You will need to create a datagram with custom values of some header fields for your probe. The easiest way to do it is to create a socket of type socket.SOCK_DGRAM and then use setsockopt() to change just the fields of your socket that you need to change. Then you can send it with socket.sendto() without going through the trouble of building the rest of the headers yourself.
You will also need to create a raw socket to receive ICMP messages. To create a raw socket for receiving an ICMP datagram, you can use the following function:
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
after which you can get a packet from this socket by using the following:
icmp_packet = recv_sock.recv(max_length_of_expected_packet)
Note if you try to debug this on your local machine that runs Windows (see note about Windows support for raw sockets): Windows apparently requires you to bind the socket to a dummy port and local IP address after you create it and before reading, by using the command below. I put bind in quotes because ICMP has no notion of a port so it makes no sense to bind this socket to a port, but you should just view this as an idiom in Windows (and this command does not seem to do any harm in Linux either, so you could leave it in place when you move your tool to your remote machine):
recv.socket.bind(,0);
A few other useful functions:
To extract a field from a packet, you can use struct module, in particular struct.unpack function. Keep in mind that whenever you extract a multi-byte integer (e.g., port number) you need to make make sure the order in which the bytes appear in the packet (the most significant byte first) matches the order in which these integers are represented in the computer you happen to use. A portable way to handle this is to use so-called network order when extracting the bytes from the packet. E.g., if packet[x:x+1] represents the two-bytes port number in a packet, your can convert it to int by the following:
port_from_packet = struct.unpack(!H, packet[x:x+2])[0]
Here, the first argument !H speficies format of the packet fragment being extracted, in this case signifying that the two bytes from packet appear in the network order (!) and they represent an unsigned short (H).
However, when you need to extract only a single byte, you can just convert it to integer using ord(byte) function.
Code to include disclaimer in the probe seems to have to be a bit different in python3 from what would have been in regular python. This snippet seems to work:
msg = measurement for class project. questions to student [email protected] or professor [email protected]
payload = bytes(msg + a*(1472 len(msg)),ascii)
send_sock.sendto(payload, (dest_ip, dest_port))
You will need to think how you will match ICMP responses with the probes you are sending out. Note that your socket may receive unrelated packets since there is no port number to distinguish your packets from someone elses (i.e., another process running on your host). In principle, there are several possibilities: one technique would compare the IP address of your measurement target to the source IP address of the ICMP response and match probes with ICMP responses when these IP addresses match. This technique is going to work most of the time but not all the time: sometime you may receive the ICMP response from an IP address that is different from the address you are probing. (We have actually seen this behavior. I speculate its due to the use of a load balancer at the destination. In this setup, when you resolve the destinations hostname, you get the IP address of the load balancer, which receives your packets and forwards them to one of the servers in a server cluster. However, the ICMP messages that the selected server generates, carry the servers own IP address, which is different from that of the load balance which your probe used.) Another technique matches the datagram ID (the IPID field from the IP header) with the datagram ID you find in the ICMP payload (which, as a reminder, contains full IP header plus 8 bytes of the transport-level header of the original datagram in those first 28 bytes of payload). Again, this will work most of the time but does not guarantee to work all the time because a NAT box, a firewall, or another middlebox between you and the destination may change the IPID of the datagram as it forwards it along. Yet another technique involves matching the destination port number from your probe datagram with the corresponding information from the 28-byte ICMP payload. This is unlikely to be changed by any middlebox. Thus, I would like you to implement all three techniques above and declare a match when any of the three techniques produce a match. For each destination, please report how many of the three techniques above produced a match.
You need to allow for a possibility that you will get no answer because the target host does not send ICMP error messages, or because the destinations firewall blocks outgoing ICMP messages, or because the destinations firewall blocks unexpected UDP messages. Hence your program should not be stuck on reading from a socket forever. You will need to use either a select or poll call on the socket (read about them google for socket select) before reading from the socket. WARNING: do not process this error by changing the destination port number stick with 33434 or you may get a nasty call from network admins! This will be interpreted as port scanning. I suggest you just try couple times and if you still get no answer, give up on this destination and produce some error message and move on to measuring the next site. You can check manually if it is your programs fault by trying to reach this destination using a standard ping measurement (there is a small chance that ping would be treated differently and experience a different outcome, but if you see that
your tool produces no answer while ping succeeds on many destinations, this should increase your doubt in the correctness of your tool).
While in theory you can address your probe to any unreachable port, how would you know which port on the target host is unused by other applications? There is a specially allocated port number for this type of probing by traceroute, and please use that port number, which is 33434.
Your script should read the file targets.txt (collocated in the same directory as your script), which includes the set of websites you are exploring and accept no arguments, and obtain the IP address of each target using socket.gethostbyname method. The script should print out the result on standard output (in addition to any file that you find convenient to produce the plot below) that is understandable to the grader. Please name the script distMeasurement.py.
Once you have your tool, measure the hop count as well as RTT to each of the websites you worked with in earlier homework assignments (or read instructions for the eecs425 portion of HW 2 if you are a 325 student). From the output of your tool, produce a scatter graph to visualize the correlation (use any tool you want for this, e.g., Excel): have hops on X axis, RTT on Y- axis, and for each remote host, place a dot with corresponding coordinates on the graph. This is a typical technique to visualize correlation. Note, as mentioned, you may not get responses from some of the servers. In order to produce a meaningful scatterplot, pick some other sites that were not included in your original list of ten. Keep probing until you have around ten. You can say in your report that you are substituting these sites because your original sites did not respond.
Deliverables:
Submit to canvas: A single zip file with (a) all programs (make sure they are well commented and include instructions how to run them arguments etc. Under-documented programs will be penalized.); (b) Project report that includes all measurement results, graphs, correlation coefficients, and conclusions that you draw from your measurements.
Place your project into directory
Tips:
The machines may take a long time to execute your sudo commands. To save time, the best thing to do is to start by executing a sudo bash command, which will get you a shall prompt as a root. Then you can just run any commands you want there as a root.
To debug your tool, you might find it useful to capture packets that your machine sends out and receives back. You can use tcpdump from the command line on your VM to capture the trace, then copy it to your local machine, and then use wireshark on your machine to read the trace from the file. Alternatively, here is a tip from Yu Mi (a networking PhD student here) on how to run wireshark on your remote Virtual Machine with the wireshark GUI on your own computers screen (I did not try this):
If you are using Windows:
Download and install Xming server and Xming font package.
Launch Xming server using default setting.
Start puTTY. Under /Connection/SSH/X11, check box Enable X11 forwarding. Then fill the host IP address (in this case, your VMs IP) at the session page and open connection.
At this point, we should be able to launch wireshark using sudo wireshark-gtk & from VMs command line.
If you are using Mac:
If your Mac has not yet installed XQuartz, please download and install.
Login to the VM with ssh -X -Y eecs-user@
In the VM console, execute sudo wireshark &.
NOTE: Here I assume the Mac users are using the default settings of ssh. If you made any modifications to the ssh configuration, please restore the changes around X11 forwarding.
Currently known problems (quoting Yu Mi):
The wireshark does not support reconfiguration to allow the user to launch wireshark with root authority as default, or at least I havent found a way to reconfigure it. If we do not launch it with sudo command, it will not return any network interface. But if we launch it with sudo, it will show some warning. This does not seem to cause any problems so just ignore.
Grading rubrics:
Code for measuring hop distance and RTT to a given destination: 40 points;
Code for reading the set of destinations from a file and obtaining the above metrics to all the destinations: 20 points;
(EECS425 only, 10 bonus points for 325 students): Code for obtaining the number of initial
bytes from the original datagram included into the ICMP error message: 20 points;
The lab report including the tool output data, the graphs, and the conclusions: 20 points;
Reviews
There are no reviews yet.