CS241 Coursework 2019/2020
The Task
For this coursework you will implement a basic intrusion detection system. This will test your understanding of TCP/IP protocols (networks) and threading (OS), as well as your ability to develop a non-trivial program in C. The coursework contributes 20% of the total marks towards the module.
You have been provided with an application skeleton that is able to intercept (sniff) incoming packets on a specific interface and print them to the screen. The code uses the libpcap library to receive packets and strips the outermost layer of the packet. The goal of this coursework is to extend the skeleton to detect potentially malicious traffic in high-throughput networks. The key deliverables of this coursework and their associated weightings are as follows.
Extend the skeleton to efficiently intercept and correctly parse the IP and TCP protocol layers. (20%)
Produce a report containing a breakdown of the malicious activity detected, to be printed when the program exits. (25%)
Relative weights: SYN attack detection 50%, ARP poisoning attack detection 25%, blacklisted URL detection 25%.
Implement a threading strategy to allow your code to deal with high packet throughput. (25%)
Write a report (no more than 1000 words in length, excluding references) explaining the design, implementation and testing of your solution. (20%)
The final 10% is awarded for code quality and adherence to relevant software engineering principles.
You must base your solution on the skeleton provided and it must be written entirely in the C programming language. You should only consider IPv4; there are no additional marks available for IPv6 functionality. You may choose to use appropriate academic or industrial literature, which should be referenced appropriately. When writing an academic report, you should not write in the first person, i.e. do not write "I did this", "I did that", etc.
Developing your code
Your code needs to work in a Linux environment. You can choose one of the following two options to develop your code (Option 1 being the primary option).
Option 1: You can develop your code in a virtual machine (VM) environment on a DCS machine. The VM is necessary since libpcap requires root permissions to access the network interfaces. To use the VM environment, you can type the following in the terminal on a DCS lab machine (please do not use joshua):
This will create a copy-on-write (COW) file to store any changes you make to the virtual machine and then boot the system.
It will also give you the details (including a password) for logging into a VNC session and a port number for connecting to the VM via SSH. The output may look something like:
SSH server will start on port portno, to connect use:
ssh -p portno root@localhost
VNC server will start on port N, to connect use (password sessionpassword):
vncviewer localhost:N
You need to log in using vncviewer to connect to the VNC session. The following command will open a connection dialogue; make sure the details are correct, then press connect and enter the session password.
Once you are connected you will be presented with a login prompt. You should log in to the root account with no password.
If you prefer to log into the VM via SSH you will need to set a root password for the VM. You can do this by first logging in through vncviewer as described above and then issuing the passwd command. Once you have done this, you can connect to the virtual machine via SSH on the port shown in the output of the coursescs241courseworkmulti command. Enter the following command (with portno replaced by the actual port number) and then use the root password for the VM which you have just set up.
While testing your sniffer, you will often need to run it and some other commands simultaneously. This can be achieved either by starting multiple SSH sessions or by using a terminal multiplexing program. GNU screen and tmux are two popular choices. If you wish to edit your code on the DCS machines before compiling it on the virtual machine, you can mount your DCS home space in the VM with sshfs:
bash-4.1$ vncviewer
bash-4.2$ ssh -p portno root@localhost
bash-4.1$ sshfs username@login2.dcs.warwick.ac.uk: local
bash-4.1$ coursescs241courseworkmulti
Creating COW file for courseworkFormatting
After this, your DCS user area (home directory) will be available inside the local directory. To shut down your VM after use you can use the halt command.
The full list of utilities which have been preinstalled for your convenience is as follows:
vi/vim
nano
screen
gdb
sshfs
python
hping3
If you would like to use any other applications to help you with this coursework, you are welcome to install them. Many common Debian applications can be installed by using apt-get as follows (where tmux should be replaced by the application to be installed):
Please note your solution should not require the installation of any additional packages or libraries to function.
If you need to recreate your virtual machine for whatever reason, you can do this by deleting your COW file as follows:
Option 2: To develop the code on your own computer/laptop you need a Linux environment, which can be created by setting up a virtual machine within your own computer. Download the compressed file containing the virtual disk image from here and follow the instructions given in this PDF to set up your virtual machine. If you are using this option, you may need root permissions to execute many of the commands given below, in which case you can use:
with command replaced by the command you want to execute. Use the root password described in the PDF document.
Important: Your code must compile and run on one of the two environments described above. Marks will be lost if this is not the case.
Code Skeleton
bash-4.1$ apt-get install tmux
bash-4.1$ rm -rf cs241qemu
sudo command
The coursework skeleton is available here and consists of several files, each with a specific purpose:
Makefile
As this project spans multiple files we have provided a makefile to automate the build process for you. To compile your solution you should change into the src directory and then run the make command. This will build the application binary ../build/idsniff. Your solution should not require changes to this file. To run the skeleton after building the binary file, execute the following command (assuming that your current directory is src):
Replace interface with the name of the interface on which you wish to capture packets. If no interface name is specified, the program will assume the default name eth0 for the interface. Note that if you are using Option 2 above, the name of the external interface is enp0s3. You can use the command ifconfig to see the details of the network interfaces used by a machine.
main.c
This file hosts the application entry point. It also contains logic to parse command line arguments, allowing you to set the verbose flag and specify the network interface to sniff. Your solution should not require changes to this file.
sniff.c
This file contains the sniff function, which captures packets from the network interface and passes them to your logic. A utility method called dump is also provided to output the raw packet data when debug mode is enabled (-v 1). You should study this function carefully as it demonstrates how to parse a packet's Ethernet header. You may need to add some of your own code to this file.
analysis.c
This file is where you should put code to analyse packets and identify threats. Your logic should be called from the analyse method, which runs each time a packet is intercepted.
dispatch.c
This file is where you should put code to parallelise your system. At the moment, the dispatch method simply calls the analyse method in analysis.c. This sequential, single-threaded behaviour should be replaced by code to distribute work over multiple threads.
Specification
The project is split into three parts, each of which deals with a specific concept. Please note that these parts are not discrete exercises. You are not expected to complete them in order, and they are not separate courseworks. You should read all three parts before you start the coursework. In particular, when writing code to parse packet headers (required for part 1) you should be mindful of thread safety.
../build/idsniff -i interface
Part 1: Packet Sniffing
You should start this coursework by writing code to parse the Ethernet, TCP, ARP and IP headers in analysis.c. This code will be used in Part 2 of this coursework by your packet analysis routines. Before you begin you should review the network primer page; this covers the Internet protocol stack and the packet structure which you will be expected to parse.
Hint: If you get stuck with parsing headers you should read the dump method in sniff.c as it demonstrates how to parse the Ethernet header (the data link layer in the OSI model). To understand how this method works, remember that the Ethernet header is 14 bytes in size and has the following format (note that one tick mark represents one bit, meaning there are 32 bits/4 bytes per complete line):
0123
01234567890123456789012345678901 Destination MACDestination MAC Cont.Source MACSource MAC Cont.Protocol
Due to C's contiguous memory layout guarantees, the fields of a struct can be arranged to match the fields within a captured packet. We can therefore define a struct which maps to this format, or use the one provided in netinet/if_ether.h:
/* if_ether.h excerpt */
#define ETH_ALEN 6   /* Octets (bytes) in one ethernet addr */
#define ETH_HLEN 14  /* Total octets in header */
struct ether_header {
  u_char  ether_dhost[6];
  u_char  ether_shost[6];
  u_short ether_type;
};
You should be able to see how the struct maps to the Ethernet format: 6 bytes for the destination address, 6 bytes for the source address and 2 bytes for the protocol. If you look up this documentation online, make sure that you refer to the version that applies to your operating system.
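The mapping described above only works if the compiler places the three fields back to back with no padding. As a minimal sketch (assuming a Linux system with glibc headers; the function name ether_layout_matches_wire is made up for illustration), the layout can be checked with offsetof and sizeof:

```c
#define _DEFAULT_SOURCE        /* expose the BSD-style definitions on glibc */
#include <stddef.h>            /* offsetof */
#include <netinet/if_ether.h>  /* struct ether_header, ETH_ALEN, ETH_HLEN */

/* Returns 1 when struct ether_header has exactly the on-wire layout
 * (6 + 6 + 2 bytes with no padding), 0 otherwise. */
int ether_layout_matches_wire(void)
{
    return offsetof(struct ether_header, ether_dhost) == 0
        && offsetof(struct ether_header, ether_shost) == ETH_ALEN
        && offsetof(struct ether_header, ether_type) == 2 * ETH_ALEN
        && sizeof(struct ether_header) == ETH_HLEN;
}
```

A check like this could be run once at start-up while developing, so that a layout mismatch is noticed before any packet is parsed.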
Once we have this struct defined we can use it to read values directly from the packet data:
struct ether_header *eth_header = (struct ether_header *) data;
printf("\nType: %hu\n", eth_header->ether_type);
The code above shows you how to parse the outermost layer of the packet and access members of the Ethernet header. However, while this will allow you to parse each field into a C structure, there is one additional complication to consider: network byte order. In order for different machines to communicate, a standard ordering of bytes for multibyte data types (e.g. short and int) must be observed. This is because some machines place the most significant byte first (big-endian) and others place the least significant byte first (little-endian). With this in mind, when multibyte values are read from a socket they must be converted from network byte order to the order which the current machine uses. Since ether_type is a multibyte value (a short), it will need to be converted before being printed or used in any comparisons. To do this, the function ntohs can be used as follows.
For part 2 you will need to repeat this process to access information added at the various network layers (see TCP/IP Networking Model for a brief introduction, and here for a full overview).
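Repeating the process for the inner layers mostly means working out where each header starts. The following is only a sketch, assuming Linux/glibc headers and a little helper named find_tcp_header that is not part of the skeleton. It uses the captured length to size the payload, which is a simplification (a more careful version would use the IP total length field, since short frames can carry padding).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <arpa/inet.h>         /* ntohs */
#include <netinet/if_ether.h>  /* struct ether_header, ETHERTYPE_IP, ETH_HLEN */
#include <netinet/in.h>        /* IPPROTO_TCP */
#include <netinet/ip.h>        /* struct iphdr */
#include <netinet/tcp.h>       /* struct tcphdr */

/* Hypothetical helper: returns the TCP header inside an Ethernet frame of
 * `len` bytes, or NULL if the frame is not IPv4/TCP or is too short.
 * On success *payload_len receives the number of bytes after the TCP header. */
const struct tcphdr *find_tcp_header(const unsigned char *frame, size_t len,
                                     size_t *payload_len)
{
    if (len < ETH_HLEN + sizeof(struct iphdr))
        return NULL;
    const struct ether_header *eth = (const struct ether_header *) frame;
    if (ntohs(eth->ether_type) != ETHERTYPE_IP)
        return NULL;

    const struct iphdr *ip = (const struct iphdr *) (frame + ETH_HLEN);
    size_t ip_hlen = (size_t) ip->ihl * 4;      /* ihl counts 32-bit words */
    if (ip->version != 4 || ip_hlen < sizeof(struct iphdr)
            || ip->protocol != IPPROTO_TCP)
        return NULL;
    if (len < ETH_HLEN + ip_hlen + sizeof(struct tcphdr))
        return NULL;

    const struct tcphdr *tcp =
        (const struct tcphdr *) (frame + ETH_HLEN + ip_hlen);
    size_t tcp_hlen = (size_t) tcp->doff * 4;   /* doff counts 32-bit words */
    if (tcp_hlen < sizeof(struct tcphdr)
            || len < ETH_HLEN + ip_hlen + tcp_hlen)
        return NULL;

    *payload_len = len - ETH_HLEN - ip_hlen - tcp_hlen;
    return tcp;
}
```

Both header lengths are read from the packet rather than assumed to be 20 bytes, because IP and TCP headers may carry options.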
To score highly in this part your solution must contain code which can successfully parse the relevant packet headers. We do not expect you to parse every type of header imaginable, only those which will be required to complete part 2. You should also take a look at the sniff method in sniff.c to see how packets are captured and passed on to your logic. Hint: The current implementation works but it could be improved by using libpcap's own network loop.
A note on PCAP filters
PCAP exposes a domain-specific language which allows you to specify packet filters. This feature must not be used to complete the coursework. You must implement the logic to access and process packet headers manually. A central aim of this coursework is for you to become familiar with the network stack; marks will be deducted from solutions which rely on external parsing or filtering logic.
Part 2: Intrusion Detection
Now that you are able to parse individual packets, the next step is to analyse them. You should write code to detect the three suspicious scenarios outlined below. Your program should record any malicious activity and print a report on exit (see this link for help). This report should show a clear breakdown of any malicious activity detected. Your output should look something like the following when the code is killed using Ctrl+C.
#include <netinet/in.h>
unsigned short ethernet_type = ntohs(eth_header->ether_type);
SYN Flooding Attack
This attack is achieved when a server listening on a TCP socket is flooded with TCP SYN packets (packets whose SYN bit is set to 1 and all other flags set to 0). For each received SYN packet, the server opens a TCP connection, allocates some resources, replies with a SYN-ACK packet and then waits for an ACK packet from the sender. The malicious sender does not reply. This creates a lot of half-open TCP connections at the server, each of which occupies some resources. As the server gets clogged, it slows down and legitimate connection requests are dropped. This is a form of denial-of-service attack. In most cases the attacker floods the server with SYN packets from spoofed IP addresses that are randomly generated and do not correspond to the attacker's own IP address, which hides the attacker's identity.
Your job is to detect the possibility of a SYN flooding attack. We will say that a SYN flooding attack is detected if the following conditions are satisfied:
1. At least 90% of the SYN packets sniffed are generated from unique source IP addresses (all different).
2. The rate at which SYN packets from unique source IP addresses are received is more than 100 packets per second. More precisely, the rate is defined as the ratio of the number of SYN packets received from unique source IP addresses to the time between the first and last SYN packets received.
If either of the conditions is violated we output "No SYN flooding attack detected". If both conditions are satisfied, the output should state the detailed report like the one shown above.
Hint: To implement the above, you may need to use arrays to store the source IP addresses and arrival times of incoming SYN packets so that they can be processed at the end. Since the number of SYN packets received is not known a priori, these arrays need to grow dynamically. You can use the function gettimeofday to obtain the arrival times of SYN packets. You can use struct tcphdr defined in netinet/tcp.h to parse the TCP header and struct iphdr defined in netinet/ip.h to parse the IP header.
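One way the hint could be followed is sketched below. The struct syn_log type and its functions are hypothetical names, only the first and last arrival times are kept (which is all the rate needs), and "unique" is read as the number of distinct source addresses; that reading is an assumption. The timestamps would come from gettimeofday in the real program.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical growable log of SYN arrivals (not part of the skeleton). */
struct syn_log {
    uint32_t *src;           /* source IPv4 addresses, one per SYN */
    struct timeval first;    /* arrival time of the first SYN */
    struct timeval last;     /* arrival time of the most recent SYN */
    size_t count, capacity;
};

/* Appends one SYN packet; returns 0 on success, -1 if memory runs out. */
int syn_log_add(struct syn_log *log, uint32_t src_ip, struct timeval when)
{
    if (log->count == log->capacity) {
        size_t new_cap = log->capacity ? log->capacity * 2 : 64;
        uint32_t *grown = realloc(log->src, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        log->src = grown;
        log->capacity = new_cap;
    }
    if (log->count == 0)
        log->first = when;
    log->last = when;
    log->src[log->count++] = src_ip;
    return 0;
}

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *) a, y = *(const uint32_t *) b;
    return (x > y) - (x < y);
}

/* Counts distinct source addresses by sorting a private copy. */
size_t syn_log_unique(const struct syn_log *log)
{
    if (log->count == 0)
        return 0;
    uint32_t *copy = malloc(log->count * sizeof *copy);
    if (copy == NULL)
        return 0;
    memcpy(copy, log->src, log->count * sizeof *copy);
    qsort(copy, log->count, sizeof *copy, cmp_u32);
    size_t unique = 1;
    for (size_t i = 1; i < log->count; i++)
        if (copy[i] != copy[i - 1])
            unique++;
    free(copy);
    return unique;
}

/* Applies both conditions: at least 90% unique sources and more than
 * 100 unique-source SYNs per second. Returns 1 if both hold, else 0. */
int syn_flood_detected(const struct syn_log *log)
{
    if (log->count == 0)
        return 0;
    double elapsed = (double) (log->last.tv_sec - log->first.tv_sec)
                   + (double) (log->last.tv_usec - log->first.tv_usec) / 1e6;
    if (elapsed <= 0.0)
        return 0;            /* a single packet has no meaningful rate */
    size_t unique = syn_log_unique(log);
    return unique * 10 >= log->count * 9
        && (double) unique / elapsed > 100.0;
}
```

For the sample report shown earlier, 3204 SYN packets from 3204 addresses in 0.038504 seconds gives a rate of roughly 83,000 packets per second, so both conditions hold.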
lca2@lca2:~/Desktop/share/Labs/Project/src$ sudo ../build/idsniff -i enp0s3
../build/idsniff invoked. Settings:
Interface: enp0s3
Verbose: 0
SUCCESS! Opened enp0s3 for capture
^C
Intrusion Detection Report:
SYN flood attack possible
3204 SYN packets detected from 3204 IP addresses in 0.038504 seconds
4 ARP responses (cache poisoning)
5 URL Blacklist violations
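One possible way to produce a report like the one above when Ctrl+C is pressed is to install a SIGINT handler that only sets a flag, and to print the report from normal code once capturing has stopped. The sketch below assumes POSIX sigaction; stop_requested, install_sigint_handler and stop_was_requested are hypothetical names, not part of the skeleton.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set from the signal handler; polled by the (hypothetical) capture loop. */
static volatile sig_atomic_t stop_requested = 0;

static void handle_sigint(int signo)
{
    (void) signo;
    stop_requested = 1;   /* only async-signal-safe work belongs here */
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Non-zero once Ctrl+C has been pressed. */
int stop_was_requested(void)
{
    return stop_requested != 0;
}
```

Keeping the handler this small avoids calling functions such as printf from inside a signal handler, where they are not guaranteed to be safe.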
Testing your code: To test your code, you can generate a SYN flooding attack on your loopback (lo) interface using hping3. If it is not installed already, install the hping3 package by typing apt-get install hping3. Then you can issue the following command:
This will send 1000 packets (-c 1000) of a size of 120 bytes (-d 120) each, with the SYN flag (-S) enabled, with a TCP window size of 64 (-w 64), to port 80 (-p 80) of localhost at an interval of 100 microseconds between two consecutive packets (-i u100). The source IP addresses are randomly generated (--rand-source). In another terminal window, you should be sniffing on the loopback interface with the following command:
ARP Cache Poisoning
The Address Resolution Protocol (ARP) is used by systems to construct a mapping between network layer (Internet Protocol) and link layer (Media Access Control) addresses. Consider a simple scenario: two systems share a network. dcslaptop has IP address 192.168.1.68 and is trying to communicate with broadbandrouter at 192.168.1.1. To achieve this, dcslaptop broadcasts an ARP request asking for the MAC address of the node at 192.168.1.1. When broadbandrouter sees this message it responds with its MAC address. dcslaptop will cache this address for future use and then use it to establish a connection.
The ARP protocol has a serious flaw in that it performs no validation. An attacker can craft a malicious ARP packet which tricks the router into associating the IP address of dcslaptop with the attacker's own MAC address. This means all traffic bound for dcslaptop will be redirected to the attacker, potentially exposing sensitive data or allowing for man-in-the-middle attacks. To make matters worse, ARP allows unsolicited responses, meaning dcslaptop does not even have to send out a request; an attacker can simply broadcast a message informing all nodes to send dcslaptop traffic to their machine.
Although ARP messages can be legitimate, the use of caching means they should be very rare. A burst of unsolicited ARP responses is a strong indication that an attacker has penetrated a network and is trying to take it over. You should add code which detects any ARP responses.
Hint: there is an ether_arp struct defined in netinet/if_ether.h
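As a rough illustration of how the ether_arp struct might be used, the sketch below checks whether a captured frame is an ARP response. It assumes Linux/glibc headers; is_arp_reply is a made-up helper name.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <arpa/inet.h>         /* ntohs */
#include <netinet/if_ether.h>  /* ether_header, ether_arp, ETHERTYPE_ARP, ARPOP_REPLY */

/* Returns 1 if the Ethernet frame carries an ARP reply, 0 otherwise. */
int is_arp_reply(const unsigned char *frame, size_t len)
{
    if (len < ETH_HLEN + sizeof(struct ether_arp))
        return 0;
    const struct ether_header *eth = (const struct ether_header *) frame;
    if (ntohs(eth->ether_type) != ETHERTYPE_ARP)
        return 0;
    const struct ether_arp *arp =
        (const struct ether_arp *) (frame + ETH_HLEN);
    /* arp_op is a macro for ea_hdr.ar_op; it is stored in network byte order. */
    return ntohs(arp->arp_op) == ARPOP_REPLY;
}
```

Each frame for which this returns 1 would add one to the ARP response count in the final report.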
Testing your code: You have been provided with a Python script which you may find useful when testing your code. The arppoison.py script can be found in the test directory. If you intend to run this script in the virtual machine you will need to install the scapy package by running apt-get install python-scapy in the terminal. The sniffer should once again be configured to listen on the loopback interface (-i lo). The script may issue warnings about missing libraries; you can safely ignore these. As long as you see 1 packet sent, the script is functioning correctly.
hping3 -c 1000 -d 120 -S -w 64 -p 80 -i u100 --rand-source localhost
../build/idsniff -i lo
Blacklisted URLs
Intrusion detection systems typically watch traffic originating from the network they protect in addition to attacks coming from outside. This can allow them to detect the presence of a virus trying to connect back to a control server, for example, or to monitor any attempts to smuggle sensitive information to the outside world. For this exercise we have identified www.telegraph.co.uk as a suspicious domain which we wish to monitor. Specifically, we wish to be alerted when we see HTTP traffic being sent to that domain.
You should add code to process TCP packets that are sent to and received from port 80 (i.e. the HTTP port). This code should parse a subset of the HTTP application-layer headers in order to identify the host web address. If any requests to www.telegraph.co.uk are detected these should be flagged up as malicious.
A malicious HTTP request will look something like the following:
GET /news HTTP/1.1
Host: www.telegraph.co.uk
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
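A request like the one above could be checked by scanning the TCP payload for the Host header. The following is a minimal sketch; http_host_matches is a hypothetical helper, and it deliberately ignores details such as a port suffix or trailing spaces in the header value.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

/* Returns 1 if the TCP payload contains a "Host:" header naming `domain`.
 * The payload is raw packet data, so it is NOT NUL-terminated and every
 * access must stay within `len` bytes. */
int http_host_matches(const char *payload, size_t len, const char *domain)
{
    static const char key[] = "Host:";
    size_t key_len = sizeof key - 1, dom_len = strlen(domain);

    for (size_t i = 0; i + key_len <= len; i++) {
        /* A header name starts at the beginning of a line. */
        if (i != 0 && payload[i - 1] != '\n')
            continue;
        if (strncasecmp(payload + i, key, key_len) != 0)
            continue;
        size_t v = i + key_len;
        while (v < len && (payload[v] == ' ' || payload[v] == '\t'))
            v++;                        /* skip optional whitespace */
        size_t end = v;
        while (end < len && payload[end] != '\r' && payload[end] != '\n')
            end++;                      /* the value runs to the end of the line */
        return end - v == dom_len
            && strncasecmp(payload + v, domain, dom_len) == 0;
    }
    return 0;
}
```

The comparison is case-insensitive because both header names and host names are case-insensitive in HTTP.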
Testing your code: One way to test your code is to use the wget command on your virtual machine to retrieve a webpage like so: wget www.telegraph.co.uk. Whilst testing this code you should configure the sniffer to listen on the external (eth0 or enp0s3) interface.
Part 3: Multithreading
Intrusion detection systems often monitor the traffic between the global internet and large corporate or government networks. As such they typically have to deal with massive traffic volumes. In order to allow your system to handle high data rates you should make your code multithreaded. There are several strategies you could choose to adopt to achieve this. Two common approaches are outlined below. Whatever approach you choose to implement, you must remember to justify your decision in your report. For this work we will focus on POSIX threads, which you were introduced to in lab 3. In order to use POSIX threads, the -lpthread linker flag must be added to the project makefile like so (this should be done already):
LDFLAGS := -lpthread -lpcap
One Thread per X Model
This approach to threading creates a new thread for each unit of work to be done (in our case, X is each packet to process) and is probably still the most common approach to threading. This model is sometimes called the Apache model after the Apache web server, which by default gives each client connection a dedicated thread. The strength of this model lies in its simplicity and low overhead when dealing with constant light loads, as no threads are kept idle. The downside to this approach is that it scales poorly under heavy or bursty load.
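A minimal sketch of this model with POSIX threads is shown below. The names spawn_packet_thread, analyse_stub and packets_analysed_so_far are invented for illustration, and analyse_stub merely counts packets in place of real analysis. The packet is copied before the thread starts, on the assumption that the capture buffer may be reused once the dispatching function returns.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long packets_analysed = 0;

/* Hypothetical unit of work: a private copy of one captured packet. */
struct packet_job {
    unsigned char *data;
    size_t length;
};

/* Stand-in for the real per-packet analysis; it only counts packets. */
static void analyse_stub(const unsigned char *data, size_t length)
{
    (void) data;
    (void) length;
    pthread_mutex_lock(&stats_lock);
    packets_analysed++;
    pthread_mutex_unlock(&stats_lock);
}

static void *run_job(void *arg)
{
    struct packet_job *job = arg;
    analyse_stub(job->data, job->length);
    free(job->data);
    free(job);
    return NULL;
}

/* Copies the packet and starts one thread for it. Returns 0 on success,
 * -1 on failure. The new thread is stored in *out so it can be joined. */
int spawn_packet_thread(const unsigned char *packet, size_t length,
                        pthread_t *out)
{
    struct packet_job *job = malloc(sizeof *job);
    if (job == NULL)
        return -1;
    job->data = malloc(length ? length : 1);
    if (job->data == NULL) {
        free(job);
        return -1;
    }
    memcpy(job->data, packet, length);
    job->length = length;
    if (pthread_create(out, NULL, run_job, job) != 0) {
        free(job->data);
        free(job);
        return -1;
    }
    return 0;
}

/* Thread-safe read of the counter. */
unsigned long packets_analysed_so_far(void)
{
    pthread_mutex_lock(&stats_lock);
    unsigned long n = packets_analysed;
    pthread_mutex_unlock(&stats_lock);
    return n;
}
```

Every packet pays the cost of a thread creation here, which is the scaling problem under heavy load that the paragraph above describes.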
Threadpool Model
This approach creates a fixed number of threads on startup (typically one or two per processor core). When a packet arrives it is added to a work queue. The threads then try to take work from this queue, blocking when it becomes empty. The strength of this approach is that it copes better with bursty or heavy traffic, as it removes the need to create threads dynamically and limits the number of threads active at any given time, avoiding thrashing. Its weakness stems from the added implementation complexity.
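The queue-and-workers arrangement described above might be sketched as follows, using a mutex and a condition variable. All names (pool_start, pool_submit, pool_shutdown, POOL_THREADS) are hypothetical, a work item is reduced to an integer id, and "processing" is just a counter; a real version would carry a copy of the packet and call the analysis code.

```c
#include <pthread.h>
#include <stdlib.h>

#define POOL_THREADS 4   /* assumption: tune to the number of cores */

struct work_item {
    int packet_id;
    struct work_item *next;
};

static struct work_item *queue_head = NULL, *queue_tail = NULL;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_ready = PTHREAD_COND_INITIALIZER;
static int shutting_down = 0;
static unsigned long processed = 0;
static pthread_t workers[POOL_THREADS];

static void *worker(void *arg)
{
    (void) arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL && !shutting_down)
            pthread_cond_wait(&queue_ready, &queue_lock);  /* block while empty */
        if (queue_head == NULL) {          /* empty and shutting down */
            pthread_mutex_unlock(&queue_lock);
            return NULL;
        }
        struct work_item *item = queue_head;
        queue_head = item->next;
        if (queue_head == NULL)
            queue_tail = NULL;
        processed++;                       /* bookkeeping under the lock */
        pthread_mutex_unlock(&queue_lock);
        /* Packet analysis would happen here, outside the lock. */
        free(item);
    }
}

/* Starts the fixed set of workers; returns 0 on success, -1 on failure. */
int pool_start(void)
{
    for (int i = 0; i < POOL_THREADS; i++)
        if (pthread_create(&workers[i], NULL, worker, NULL) != 0)
            return -1;
    return 0;
}

/* Adds one item to the tail of the queue and wakes a waiting worker. */
int pool_submit(int packet_id)
{
    struct work_item *item = malloc(sizeof *item);
    if (item == NULL)
        return -1;
    item->packet_id = packet_id;
    item->next = NULL;
    pthread_mutex_lock(&queue_lock);
    if (queue_tail != NULL)
        queue_tail->next = item;
    else
        queue_head = item;
    queue_tail = item;
    pthread_cond_signal(&queue_ready);
    pthread_mutex_unlock(&queue_lock);
    return 0;
}

/* Lets the workers drain the queue, joins them, and returns the number
 * of items that were processed. */
unsigned long pool_shutdown(void)
{
    pthread_mutex_lock(&queue_lock);
    shutting_down = 1;
    pthread_cond_broadcast(&queue_ready);
    pthread_mutex_unlock(&queue_lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(workers[i], NULL);
    return processed;
}
```

The while loop around pthread_cond_wait matters: a worker re-checks the queue after every wake-up, so spurious wake-ups and races between workers are harmless.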
The bulk of your threading code should be placed in dispatch.c. Whichever model you choose, you should deal with thread creation and work allocation here. You may find that you also have to make minor modifications to analysis.c to make your code thread-safe. In particular, you should be careful when storing intrusion records or dealing with dynamic arrays to avoid any lost updates or race conditions.
Submission
The deadline for submission is Thursday 25 November, 12pm (noon). Normal late submission penalties will apply. You are expected to submit your solution via Tabula.
Please submit a single zip file containing:
The source files which make up your application (please include all source code, both .c and .h, even though some will be unchanged, so we can easily compile your code).
A PDF file called Report.pdf that details the design, implementation and testing of your solution.
If you have unsuccessfully attempted a multithreaded version, you may submit your non-working version in a clearly labelled subdirectory of the zip file.
Useful References
RFC 791: Internet Protocol
RFC 792: Internet Control Message Protocol
RFC 793: Transmission Control Protocol
IEEE 802.3: Ethernet
RFC 2616: Hypertext Transfer Protocol
SYN flooding attacks
NMAP Port Scanning Techniques
Linux signals
Measuring time using gettimeofday
Useful resource for PCAP
Module Tutors
Please contact the TAs of the module for any questions regarding the coursework.
Page contact: Arpan Mukhopadhyay
Last revised: Mon 21 Oct 2019