○ how to build a machine-learning model based on normal network traffic.
○ how to conduct a blending attack that produces artificial network traffic resembling normal traffic and bypasses the learned model.
We recommend that you use whatever is comfortable for you: either a local setup or the provided Linux VM. In the past, students had no difficulty setting up the project and working on either Windows or macOS.
This project relies on the following readings:
We have created a small quiz to help you understand the topics covered in this project. Please read the papers (under Readings and Resources) before attempting the quiz and the subsequent tasks.
For each question, please enter your option in answers.txt as shown in the sample file below.
Submit your answers for this part in answers.txt.
You can find answers.txt under the project directory.
NOTE: You may not see the marks you obtained for the quiz (and hence the total marks for the project). This prevents students from brute-forcing the answers.
We hide your total grade intentionally!!!
You can either use the provided VM to complete the project OR you can set up your own environment locally.
Desktop -> project 5
HOWEVER: please use project5.zip in Canvas for the most updated version.
TIP: Even if you are using the provided VM, please check SETUP.txt to understand how the project is set up and to get an overview of the various code components in the project. This might help in debugging any issues you might face later.
The two videos on the Project 5 page are previously recorded TA office hours. Our students found them extremely helpful. If you get stuck on any of the tasks below, we highly recommend that you watch these videos before you post your questions on Ed.
Please refer to the reference readings to learn how the PAYL model works, in particular,
3.1.2: Code and data provided
The PAYL directory provides the PAYL code and data for model training.
Here is the workflow of the provided PAYL code:
○ Read the normal traffic data and divide it into two parts: 75% of the data for training and the remaining 25% for testing (NOTE: You will NOT change these proportions in the code).
○ Sort the payload strings by length and generate a model for each length.
○ The model for each length consists of [mean frequency of each ASCII character, standard deviation of the frequencies of each ASCII character].
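The workflow above can be sketched roughly as follows. This is a minimal illustration and not the provided PAYL code: the function names `byte_frequencies` and `train_models` are hypothetical, and the sketch assumes frequencies are tracked for all 256 byte values and that the population standard deviation is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def byte_frequencies(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in one payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    return [c / len(payload) for c in counts]


def train_models(payloads: list[bytes]) -> dict[int, tuple[list[float], list[float]]]:
    """Group payloads by length and build one (means, stds) model per length."""
    by_length = defaultdict(list)
    for p in payloads:
        if p:  # skip empty payloads, which have no byte frequencies
            by_length[len(p)].append(byte_frequencies(p))

    models = {}
    for length, freq_vectors in by_length.items():
        # zip(*...) turns a list of 256-entry vectors into 256 columns.
        columns = list(zip(*freq_vectors))
        means = [mean(col) for col in columns]
        stds = [pstdev(col) for col in columns]  # assumption: population std
        models[length] = (means, stds)
    return models


# Hypothetical toy data: two payloads of length 4 and one of length 3.
models = train_models([b"abab", b"aabb", b"xyz"])
```

Here `models[4]` holds the mean and standard deviation vectors learned from the two length-4 payloads, and `models[3]` holds the model for the single length-3 payload.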
To run PAYL in training mode:
$ python3 wrapper.py
To run PAYL in testing mode:
$ python3 wrapper.py [FILE.pcap]
where FILE.pcap is the data you will test.
The figure shows sample output from wrapper.py. You will find mSF and mTMD values that make mTP > 96% for both the HTTP and DNS protocols. The parameters can differ between the two protocols.
For each protocol, please report the parameters that you found (as output by wrapper.py) in a file named parameters.txt. Report each parameter as a decimal rounded to two decimal places.
NOTE: You are given a sample parameters.txt with dummy values in the PAYL directory. Please update the relevant values with your own answer. Check section 4 for more details.
NOTE: The value for “DISTANCE” in parameters.txt will be obtained in the next task (section 3.2).
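For intuition about how the smoothing factor and the threshold interact, the PAYL paper scores a payload with a simplified Mahalanobis distance. The sketch below assumes that definition. The function names are hypothetical, and the provided wrapper.py may compute the distance differently.

```python
def simplified_mahalanobis(freqs, means, stds, smoothing_factor):
    """Sum over all byte values of |x_i - mean_i| / (std_i + smoothing_factor).

    The smoothing factor keeps the division defined when a byte's standard
    deviation is zero in the training data.
    """
    return sum(
        abs(x - m) / (s + smoothing_factor)
        for x, m, s in zip(freqs, means, stds)
    )


def fits_model(freqs, means, stds, smoothing_factor, threshold):
    """A payload is treated as normal when its distance is within the threshold."""
    return simplified_mahalanobis(freqs, means, stds, smoothing_factor) <= threshold


# Hypothetical two-byte alphabet, small enough to check by hand:
# 0.25 / (0.0 + 0.25) + 0.25 / (0.25 + 0.25) = 1.0 + 0.5
distance = simplified_mahalanobis([0.5, 0.5], [0.25, 0.75], [0.0, 0.25], 0.25)
```

A larger smoothing factor shrinks every term, so the threshold that separates normal traffic from attack traffic moves with it. This is why the two parameters are tuned together.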
Download your unique attack payload [YOUR_GTUSERNAME.pcap] from Files in Canvas, and place it in the PAYL directory of the Project 5 folder.
For this part, you will verify that the model is effective for detection. You will ensure that the attack data does not fit the model, while the normal network traffic fits. Make sure you have completed Task A and that the parameters.txt file is set. Completing Task A and Task B will be essential to demonstrating the polymorphic blending attack in Task C.
http_smoothing_factor = None
dns_smoothing_factor = None
http_threshold_for_mahalanobis = None
dns_threshold_for_mahalanobis = None
Note: You will only see these variable names if you download the ZIP files from Canvas. On the virtual machine (VM), the variables are named differently.
$ python wrapper.py <YOUR GT ID>.pcap
Note: We use Python 3. The command will be either python or python3, depending on your local environment if you installed Python on your own machine.
(set the training_protocol parameter in wrapper.py)
$ python3 wrapper.py http_artificial_profile.pcap
(set the training_protocol parameter in wrapper.py)
$ python3 wrapper.py dns_artificial_profile.pcap
Please report the calculated distance (mDISTANCE in the above figures) in parameters.txt for each protocol, using the values for your attack payload (YOUR_GTUSERNAME.pcap), after completing Task B.
We assume that the attacker has a specific payload (the attack payload) that they would like to blend in with the normal traffic. Also, we assume that the attacker has access to one packet (artificial profile payload) that is normal and is accepted as normal by the PAYL model.
Preliminary reading: Please refer to the “Polymorphic Blending Attacks” paper. In particular, section 4.2 describes how to evade 1-gram and the model implementation. More specifically, we focus on the case where m <= n and the substitution is ONE-TO-MANY.
The attacker’s goal is to transform the byte frequency of the attack traffic so that it matches the byte frequency of the normal traffic, and thus bypass the PAYL model. NOTE: Complete this task ONLY for the HTTP protocol.
Code provided: Please look at the Polymorphic_blend directory. All files for this task (including the attack payload) should be in this directory, so copy your unique attack payload into it as well. Set ATTACKBODY_PATH in task1.py to the name of your unique attack payload (YOUR_GTUSERNAME.pcap).
$ python3 task1.py
NOTE: You need to complete Task C before running task1.py.
Main function: task1.py contains all the functions that are called to transform the byte frequency of the attack traffic.
B).
○ shellcode.bin (provided)
○ Encrypted attack body
○ XOR table
○ Padding
For detailed definitions of these components, please refer to the “Polymorphic Blending Attacks” paper.
We provide the skeleton for the code needed to generate a substitution table based on the byte frequencies of the attack payload and the artificial profile payload. For the purpose of implementation, the substitution table can be, for example, a Python dictionary. We ask that you complete the code for the substitution function. The substitution is one-to-many. The skeleton code prints the substitution table to the console. You will deliver your substitution table in a substitution_table.txt file in the following format.
NOTE: This is just an example showing the format of the table. Please ignore the frequency values.
NOTE: The substitution table should have the frequencies as observed in the normal payload. Please do NOT normalize these values in substitution_table.txt. You can normalize the values later during substitution in substitution.py.
Similarly, we have provided a skeleton for the padding function and we are asking you to complete the rest.
Please complete the code in substitution.py and padding.py, and then run task1.py to generate the new payload (output).
Test your new payload (output) against the PAYL model and verify that it is accepted. FP should be 100%, indicating that the payload was accepted as legitimate even though it is malicious. Run the test as follows; the output should include the message “It fits the model”.
TIP: Check the relevant FAQs in section 5.
IMPORTANT: Please check section 5.6 to understand how you can verify your code.
Tasks Deliverable Files
C substitution.py, padding.py, substitution_table.txt, output
Total: 6 files
Total points: 100
Please make sure to submit all of the files to Gradescope. Do NOT ZIP your deliverable files.
4.1: Project Quiz – 5 points
Please report (for each protocol) the parameters that you found in a file named parameters.txt. Report each parameter as a decimal rounded to two decimal places.
Please report your calculated distance (mDISTANCE in the above figures) in parameters.txt for each protocol with the values of the attack payload after completing Task B.
parameters.txt format:
|Protocol:HTTP|
|Threshold:1111.00|
|SmoothingFactor:0.01|
|TruePositiveRate:50.00|
|Distance:2000.00|
|Protocol:DNS|
|Threshold:2222.00|
|SmoothingFactor:0.00|
|TruePositiveRate:50.00|
|Distance:2000.00|
NOTE: You are given a sample parameters.txt with dummy values under the PAYL directory. Please update each value with your own answer. The values should come only from the PAYL script’s console output, not from the values you modified inside the script.
|Protocol:HTTP|
|Threshold:1111.00| // Part A
|SmoothingFactor:0.01| // Part A
|TruePositiveRate:50.00| // Part A
|Distance:20020.00| // Part B: the mDISTANCE value that you get from your unique pcap file (python wrapper.py <yourunique.pcap>)
|Protocol:DNS|
|Threshold:2222.00| // Part A
|SmoothingFactor:0.00| // Part A
|TruePositiveRate:50.00| // Part A
|Distance:22000.00| // Part B: the mDISTANCE value that you get from your unique pcap file (python wrapper.py <yourunique.pcap>)
NOTE: “0.3” should be entered as “0.30”. “2” should be entered as “2.00”.
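If you prefer to generate the entries rather than type them, Python's format specification `.2f` produces the required two-decimal form. The helper below is a hypothetical convenience and not part of the provided code. The values passed to it must still be the ones printed by wrapper.py.

```python
def format_parameters(protocol, threshold, smoothing_factor, true_positive_rate, distance):
    """Render one protocol block in the parameters.txt layout shown above."""
    return "\n".join([
        f"|Protocol:{protocol}|",
        f"|Threshold:{threshold:.2f}|",
        f"|SmoothingFactor:{smoothing_factor:.2f}|",
        f"|TruePositiveRate:{true_positive_rate:.2f}|",
        f"|Distance:{distance:.2f}|",
    ])


# Dummy values only, chosen to show that 0.3 becomes 0.30 and 2 becomes 2.00.
block = format_parameters("HTTP", 1111, 0.3, 50, 2)
print(block)
```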
Please submit the following files: substitution.py, padding.py, your substitution_table.txt, and the output of Task C (generated as a new file after running task1.py). This output is very important and represents a significant portion of the points for Task C.
We grant partial credit under the following conditions:
4.5: !! Important Notes (Please check before submission) !!
5.1: Task C clarifications
5.2: How to implement Substitution Table & Substitute?
First, refer to the “Polymorphic Blending Attacks” paper. In particular, section 4.2 describes how to evade 1-gram and the model implementation. More specifically, we focus on the case where m <= n and the substitution is ONE-TO-MANY.
NOTE: We will not accept the implementation of ONE-ONE mapping.
Refer to the example provided in the write-up (section 5.5).
After reading the paper and example, it should be obvious how to implement a substitution table. If you still have any specific questions you can post your questions on Ed discussion.
Given an attack byte, find its mapping in your substitution table. You will have multiple choices because of how the table is constructed. Pick one based on the ratio of that byte’s frequency to the sum of all frequencies in the entry. In other words, normalize the frequencies so that they sum to 1.
NOTE: You are allowed to import and use “random or numpy” for this task. Do NOT import any other libraries.
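One possible way to make the weighted pick uses `random.choices` from the standard library. This is a minimal sketch, assuming the table maps each attack character to a list of (character, frequency) pairs. The function name `substitute_byte` and the optional `rng` argument are hypothetical and do not come from the skeleton code.

```python
import random


def substitute_byte(attack_char, substitution_table, rng=random):
    """Pick one replacement for attack_char, weighted by normalized frequency."""
    candidates = substitution_table[attack_char]
    chars = [ch for ch, _ in candidates]
    weights = [freq for _, freq in candidates]

    # Normalize so the probabilities for this entry sum to 1.
    total = sum(weights)
    probabilities = [w / total for w in weights]

    return rng.choices(chars, weights=probabilities, k=1)[0]


# Table taken from the example in section 5.5.
table = {
    "s": [("d", 0.4), ("b", 0.2)],
    "r": [("c", 0.3), ("a", 0.1)],
}
replacement = substitute_byte("s", table)  # 'd' about 2/3 of the time, else 'b'
```

`random.choices` would accept the unnormalized weights as well. The explicit normalization is kept here only to mirror the wording of the instructions.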
5.3: How to implement Padding?
Find the byte with the largest byte frequency difference, say ‘a’, and append ‘a’ to the raw_payload (padding.py). The padding function is called repeatedly while len(raw_payload) < len(artificial_payload) (as in task1.py).
So each time the padding function is called, you only need to pad one byte.
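A single padding step might look like the sketch below. It is not the skeleton from padding.py, and it rests on several assumptions: raw_payload is a list of characters that is modified in place, the "difference" is the artificial payload's relative frequency minus the raw payload's relative frequency, and the function name `pad_one_byte` is hypothetical.

```python
from collections import Counter


def pad_one_byte(raw_payload, artificial_payload):
    """Append the one character whose frequency lags the target the most."""
    target = Counter(artificial_payload)
    current = Counter(raw_payload)
    target_total = len(artificial_payload)
    current_total = max(len(raw_payload), 1)  # avoid dividing by zero

    def deficit(ch):
        # Positive when the character is under-represented in raw_payload.
        return target[ch] / target_total - current[ch] / current_total

    best = max(target, key=deficit)
    raw_payload.append(best)


# Hypothetical toy payloads: 'b' is under-represented, so it is padded first.
raw = list("ab")
artificial = "aabbbb"
while len(raw) < len(artificial):  # mirrors the calling loop described above
    pad_one_byte(raw, artificial)
```

Because only one byte is appended per call, the frequencies are recomputed on every iteration, which lets each step react to the byte added by the previous one.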
5.4: XOR and result Clarification in Task C
Both are lists of characters, where ‘result’ keeps the replacement chars and ‘xor’ keeps the XOR of replacement and corresponding attack value.
Using the sample in the write-up: if ‘t’ is replaced by ‘Z’, then your result will include ‘Z’ and your XOR table will include XOR(‘t’, ‘Z’) = ‘.’
NOTE: Be careful while XORing the chars
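Python's `^` operator works on integers and not on strings, which is the usual pitfall here. A minimal sketch, with the hypothetical helper name `xor_chars`, converts each character to its code point, XORs the two integers, and converts the result back:

```python
def xor_chars(a: str, b: str) -> str:
    """XOR two single characters via their integer code points."""
    return chr(ord(a) ^ ord(b))


# 't' is 0x74 and 'Z' is 0x5A; 0x74 ^ 0x5A is 0x2E, which is '.'
xor_value = xor_chars("t", "Z")

# XOR is its own inverse, so the original character can be recovered.
recovered = xor_chars(xor_value, "Z")
```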
NOTE: Your substitution_table.txt should have the format we mentioned in the write-up.
NOTE: You need to verify your Task C and see your original packet content. (check section 5.6)
5.5: Simple example for substitution
Please refer to the ‘Polymorphic Blending Attack’: Substitution part
Example
# normal traffic (x) and attack traffic (y):
x = abbcccdddd, y = rrsss
# distinct characters in normal traffic (n) and attack body (m):
n = 4, m = 2
# frequency of characters in normal traffic f(x) and attack body g(y):
f(x) = [(‘d’, 0.4), (‘c’, 0.3), (‘b’, 0.2), (‘a’, 0.1)],
g(y) = [(‘s’, 0.6), (‘r’, 0.4)]
For the first m characters in (x), create a one-to-one mapping in both sets:
Let t^f(y_j) = f(x_i), where t^f(y_j) refers to “t times hat{f}(y_j)” in LaTeX notation (the caret does NOT refer to exponentiation/power).
Then:
S(s) = [(‘d’, 0.4)]
S(r) = [(‘c’, 0.3)]
t^f(s) = 0.4
t^f(r) = 0.3
For the (m+1)th char, first find the attack character with the max ratio of g(y_j)/t^f(y_j):
g(s)/(t^f(s)) = 0.6/0.4 = 1.5
g(r)/(t^f(r)) = 0.4/0.3 = 1.33
So, the next attack character is ‘s’. Then, your substitution table at this step is:
S(s)= [(‘d’, 0.4), (‘b’,0.2)]
S(r)= [(‘c’, 0.3)]
and update:
t^f(s)=0.4+0.2=0.6
Repeat to find the next attack character and so on.
g(s) / (t^f(s)) = 0.6/0.6 = 1
g(r) / (t^f(r)) = 0.4/0.3 = 1.33
Now, the next attack character is ‘r’. Then, your substitution table at this step is
S(s)= [(‘d’, 0.4), (‘b’,0.2)]
S(r)= [(‘c’, 0.3), (‘a’,0.1)]
and update:
t^f(r) = 0.3 + 0.1 = 0.4
After you finish building the substitution table, you no longer need the t^f(y_j) values. You then substitute each attack character using the frequency weights of the characters in its table entry.
s is substituted with:
d with a probability of 0.4/(0.4+0.2)
b with a probability of 0.2/(0.4+0.2)
r is substituted with:
c with a probability of 0.3/(0.3+0.1)
a with a probability of 0.1/(0.3+0.1)
It is up to you how you implement a weighted random assignment, but it is a trivial step.
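The table construction walked through in this example can be sketched as follows. This is an illustration under stated assumptions and not the skeleton in substitution.py: the function names are hypothetical, frequencies are computed directly from the example strings, and ties between equal frequencies are broken by first occurrence.

```python
from collections import Counter


def sorted_frequencies(text):
    """(character, relative frequency) pairs, most frequent first."""
    counts = Counter(text)
    pairs = ((ch, n / len(text)) for ch, n in counts.items())
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def build_substitution_table(normal, attack):
    """One-to-many mapping from attack characters to normal characters (m <= n)."""
    f = sorted_frequencies(normal)   # f(x), descending
    g = sorted_frequencies(attack)   # g(y), descending
    m = len(g)
    if m > len(f):
        raise ValueError("this sketch only covers the case m <= n")

    table, t_hat = {}, {}
    # First m normal characters: one-to-one mapping in sorted order.
    for (y, _), (x, fx) in zip(g, f[:m]):
        table[y] = [(x, fx)]
        t_hat[y] = fx

    # Remaining normal characters go to the attack character with the
    # largest ratio g(y_j) / t^f(y_j), and t^f is updated after each step.
    g_freq = dict(g)
    for x, fx in f[m:]:
        y = max(g_freq, key=lambda c: g_freq[c] / t_hat[c])
        table[y].append((x, fx))
        t_hat[y] += fx
    return table


table = build_substitution_table("abbcccdddd", "rrsss")
```

With the example strings, `table["s"]` holds ‘d’ and ‘b’ and `table["r"]` holds ‘c’ and ‘a’, matching the steps above. The stored frequencies are left unnormalized, as the note about substitution_table.txt requires.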
5.6 How to verify Task C?
If you only have a 64-bit compiler, you need to run the following:
$ sudo apt-get install lib32gcc-4.9-dev
$ sudo apt-get install gcc-multilib
NOTE: You can also verify Task C using lib32gcc-9-dev.
If you’re on Ubuntu Xenial, the one listed in the instructions should work: lib32gcc-4.9-dev
If you’re on Debian Buster or Ubuntu Bionic, try: lib32gcc-8-dev
If you’re on Ubuntu Cosmic or Disco, try: lib32gcc-9-dev
The 32 bit compiler is already installed in the VM.
Next, you need to generate your payload. Somewhere near the end of task1.py, add the following to create your payload.bin:
# Write the adjusted attack body followed by the XOR table as raw bytes.
with open("payload.bin", "wb") as payload_file:
    payload_file.write(bytearray("".join(adjusted_attack_body + xor_table), "utf8"))
Now, run task1.py to generate payload.bin and once it’s generated, run the makefile with make and then run a.out:
$ make
$ ./a.out
If all is well, you should see your original packet contents. If not, you will get a bunch of funny letters.
NOTE: This project has only been tested on Linux, so you may need to make a few modifications according to your system configuration.
Good luck and have fun!!
