Let us return one last time to the events which occurred in 2014 at a small eCommerce start-up company in Monroe, CT.
Your investigation of the insidious greencat malware is coming to an end. You have found that the malware is using an HTTP connection to transmit command and control messages to a suspicious IP address. This discovery leads you to identify several predicates in the greencat binary which seem to control the execution of each malicious payload.
“How can I be sure which payloads are controlled by each predicate?” you wonder …
“One problem remains,” you think. “I have to precisely recreate the attack steps as they unfolded at the eCommerce start-up. How can I know which command and control inputs will exercise those exact greencat execution paths?”
Luckily, you remember that the part-time IT support professionals employed by the eCommerce start-up had saved network packet traces for all of the accounting department computers!
“This is perfect!” you exclaim.
The plan that you devise is as follows.
Step 1: You must design a dynamic control dependence pintool.
Step 2: Using a fake greencat C&C server, you can execute greencat with the pintool, send it C&C commands, and recover the dynamic control dependence of each payload.
Step 3: You can check each payload’s network communication that you observe against the packet trace to be sure you’ve covered the correct execution paths.
Disclaimer: In this assignment, you will only accomplish Steps 1 and 2.
Instructions:
The story for this lab is quite accurate to how a malware analyst would instrument, manipulate, and compare against evidence from the original cyber-attack. A dynamic Control Dependence Graph (CDG) enables malware analysis tools to link code sections to the predicates that control their execution. In a malware investigation (as in the story above) this can reveal exactly what network or environment input triggered the attack evidence the investigator observed. A CDG is also an essential program analysis building block that is required for program slicing, and as you know from the research papers, program slicing is widely used to focus analysis algorithms on key malware capabilities. However, as in previous labs, tool designers (you) must make tradeoffs, specifically in how to obtain immediate post-dominators. In this lab, you will create a dynamic CDG using the “puppeteered” GreenCat traces you collected in Lab 5. After completing this lab, I encourage you to go a step further and write a simple analysis script to automatically identify which C&C command causes each GreenCat function to execute (e.g., use a Ghidra plugin to automatically color each instruction based on which C&C command causes it to execute).
1. Review the concepts of dynamic control dependence and how dynamic control dependence can be computed. The slides presented in class can be downloaded from Canvas.
2. Extend the pintool you designed for Lab #5 to trace the control dependence of every instruction that is executed. To do so, your pintool will need to implement one of the dynamic control dependence algorithms discussed in class. It does not matter which algorithm you choose to implement, but I would recommend the “Regions” approach. Do this tracing entirely in memory! Your pintool should only write to an output file in the Fini function (or ThreadFini). Constantly writing every instruction to the trace file will make your pintool unbearably slow!
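As a rough illustration of the in-memory bookkeeping, the sketch below tracks control dependence with a stack of open regions, in the spirit of the region idea. It is written in Python for brevity and replays a recorded list of addresses; in a pintool the same logic would sit in the per-instruction analysis routine. The function name, the `ipdom` map (branch address to immediate post-dominator, assumed to be known already), and the `START` placeholder are assumptions made for this sketch, and the algorithm presented in class may differ in its details (for example, in how calls and recursion are handled).

```python
START = "START"  # hypothetical placeholder for "no controlling predicate"


def dynamic_control_dependence(trace, ipdom):
    """Return the set of (instruction, controlling_predicate) pairs for a trace.

    trace -- executed instruction addresses, in execution order
    ipdom -- {branch_address: immediate_post_dominator_address}
             (assumed to contain conditional branches only)
    """
    stack = []    # open regions, each stored as (predicate, ipdom_of_predicate)
    deps = set()  # a set keeps one edge per (instruction, predicate) pair
    for addr in trace:
        # Reaching a region's immediate post-dominator closes that region.
        while stack and stack[-1][1] == addr:
            stack.pop()
        # The predicate of the innermost open region controls this instruction.
        deps.add((addr, stack[-1][0] if stack else START))
        if addr in ipdom:
            region = (addr, ipdom[addr])
            if stack and stack[-1][1] == region[1]:
                # Same exit point as the region on top: reuse its slot so that
                # a loop does not grow the stack on every iteration.
                stack[-1] = region
            else:
                stack.append(region)
    return deps
```

Everything stays in memory until the caller decides to write `deps` out, which matches the requirement to touch the output file only at the end of the run.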
To determine control dependence, you will need to identify the immediate post-dominators of many instructions. For this lab we expect you to calculate IPDs based on the dynamic trace (with the control flow trace that your pintool generated for Lab #5). As we discussed in class, it is fine if your solution requires executing the dynamic analysis multiple times.
Note: If you obtain immediate post-dominators from the dynamic execution trace, then your immediate post-dominators might cover the entire program (i.e., you will have control dependence between functions). This is acceptable for this lab!
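One way to derive immediate post-dominators from observed control-flow edges is a simple fixed-point computation, sketched below in Python. This is a minimal, unoptimized illustration rather than a prescribed method: the function name and the virtual `EXIT` node are assumptions, and the sketch assumes every observed node can reach the end of the trace along observed edges.

```python
from collections import defaultdict

EXIT = "EXIT"  # hypothetical virtual node that follows the last instruction(s)


def immediate_post_dominators(edges):
    """Return {node: immediate post-dominator} for observed (src, dst) edges."""
    succs = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        succs[src].add(dst)
        nodes.update((src, dst))
    # Nodes with no observed successor are treated as flowing into EXIT.
    for n in nodes:
        if not succs[n]:
            succs[n].add(EXIT)
    nodes.add(EXIT)

    # Start from "every node post-dominates every node" and shrink the sets
    # until nothing changes.
    pdom = {n: set(nodes) for n in nodes}
    pdom[EXIT] = {EXIT}
    changed = True
    while changed:
        changed = False
        for n in nodes - {EXIT}:
            # n is post-dominated by itself and by whatever post-dominates
            # every one of its observed successors.
            new = {n} | set.intersection(*(pdom[s] for s in succs[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True

    # The immediate post-dominator is the closest strict post-dominator: the
    # one that is itself post-dominated by all the other strict ones.
    ipdom = {}
    for n in nodes - {EXIT}:
        strict = pdom[n] - {n}
        for cand in strict:
            if strict <= pdom[cand]:
                ipdom[n] = cand
                break
    return ipdom


# A small if/else diamond with made-up addresses: 0x10 branches to 0x14 or
# 0x18, and both arms rejoin at 0x1c.
diamond = [("0x10", "0x14"), ("0x10", "0x18"), ("0x14", "0x1c"), ("0x18", "0x1c")]
print(immediate_post_dominators(diamond)["0x10"])  # 0x1c
```

Because the edges come from a trace, a branch whose other arm was never taken will look like straight-line code here, which is one reason running the analysis several times with different inputs can change the result.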
After the process that your pintool is tracing exits (i.e., in your pintool’s Fini function), generate a DOT directed graph file representing the control dependence of all the observed instructions. Each node in your DOT directed graph file should be the address of an instruction that was executed (only ONE node per instruction address). The edges in your DOT directed graph file should go from each executed instruction to the instructions that it is control dependent on.
For example, assume the instruction at 0x638 is control dependent on the instruction at 0x634. The DOT directed graph file generated by your tool should be as follows:
digraph control_dep {
"0x638" -> "0x634";
}
The order of the edges in the DOT directed graph file does not matter. Also see: https://stackoverflow.com/questions/1494492/graphviz-how-to-go-from-dot-to-a-graph
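For illustration, a post-processing script could serialize the collected dependences along these lines. This is only a sketch: the helper names `node_label` and `to_dot` are hypothetical, and it assumes dependences are held as (instruction, controlling instruction) pairs of integer addresses, with strings passed through unchanged for any placeholder nodes.

```python
def node_label(node):
    """Format integer addresses as hex; pass other labels through as text."""
    return f"{node:#x}" if isinstance(node, int) else str(node)


def to_dot(deps, graph_name="control_dep"):
    """Render (instruction, controlling_instruction) pairs as DOT text."""
    # Building a set of label pairs drops duplicate edges; sorting only makes
    # the file stable between runs, since edge order does not matter.
    edges = sorted({(node_label(src), node_label(dst)) for src, dst in deps})
    lines = [f"digraph {graph_name} {{"]
    lines += [f'"{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines) + "\n"


print(to_dot([(0x638, 0x634)]), end="")
```

Writing the returned string to a file once, at the end of the run, keeps file I/O out of the tracing path.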
To terminate the traced greencat process, you can use sendsignal.exe on the Desktop (via the command prompt).
Use your pintool and explore all the different control flow paths that each of the greencat command and control (C&C) commands exercise. Refer back to your previous labs to recover the C&C commands that greencat accepts from its C&C server. Send each command to greencat (one time is enough; order does not matter) and generate one DOT file.
Submit your pintool source code and the DOT file your pintool generated.
Note 3: Feel free to use any third-party packages for graph algorithms/processing (hint: NetworkX is a handy Python library for dealing with graphs).
Additional Example:
Consider this example:
Consider what it would look like if we converted this to a digraph in the form of our Lab 5 output (1A and 1B are G and H, respectively):
digraph controlflow {
"0x0040100A" -> "0x0040100B";
"0x0040100B" -> "0x0040100C";
"0x0040100C" -> "0x0040100D";
"0x0040100C" -> "0x0040100E";
"0x0040100D" -> "0x0040100F";
"0x0040100E" -> "0x0040100F";
"0x0040100F" -> "0x0040100B";
"0x0040100B" -> "0x0040101A";
"0x0040101A" -> "0x0040101B";
"0x0040100A" -> "0x0040101B";
}
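If you post-process a control-flow DOT file like the one above, the edges can be recovered with a small regular expression. The sketch below is one possible approach, not a required one; the name `read_dot_edges` is hypothetical, and it assumes every node is a hexadecimal address wrapped in straight double quotes, as in the Lab 5 output format.

```python
import re

# Matches one edge of the form "0x..." -> "0x..." and captures both endpoints.
EDGE_RE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def read_dot_edges(text):
    """Return the (src, dst) address pairs found in DOT text, as integers."""
    return [(int(src, 16), int(dst, 16)) for src, dst in EDGE_RE.findall(text)]


sample = 'digraph controlflow {\n"0x0040100A" -> "0x0040100B";\n}\n'
print(read_dot_edges(sample))  # [(4198410, 4198411)]
```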
Now consider the control dependence output:
digraph controlDependence {
"0x0040101B" -> "START";
"0x0040101A" -> "0x0040100A";
"0x0040100B" -> "0x0040100A";
"0x0040100B" -> "0x0040100B";
"0x0040100F" -> "0x0040100B";
"0x0040100E" -> "0x0040100C";
"0x0040100D" -> "0x0040100C";
"0x0040100C" -> "0x0040100B";
"0x0040100A" -> "START";
}
Notice that you can add a dummy START node, which can be helpful when calculating control dependence. In this case, node A and node H are control dependent on START; including START in your output is optional. Also notice that the edges are reversed compared to the image, although they describe the same node relationship. We expect your output to describe the relationship as shown in the example output.
For example, B is control dependent on A because:
– there exists a path from A to B such that every node in the path other than A and B is post-dominated by B
– A is not post-dominated by B
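A common way to turn this definition into code is to walk up the post-dominator tree from each branch target. The Python sketch below applies that idea to the example graph, using the letters A to H for the eight addresses. The function name is hypothetical, and the `ipdom` table was worked out by hand for this example, so treat both as assumptions of the sketch.

```python
def control_dependence(edges, ipdom):
    """Return (node, controlling_node) pairs for a CFG and its ipdom map."""
    deps = set()
    for a, b in edges:
        # Walk from b up the post-dominator tree until ipdom(a) is reached;
        # every node passed on the way is control dependent on a.
        n = b
        while n != ipdom[a]:
            deps.add((n, a))
            n = ipdom[n]
    return deps


# Control-flow edges of the example, with A..H standing in for the addresses.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "F"),
         ("E", "F"), ("F", "B"), ("B", "G"), ("G", "H"), ("A", "H")]
# Immediate post-dominators, derived by hand for this graph.
ipdom = {"A": "H", "B": "G", "C": "F", "D": "F", "E": "F", "F": "B", "G": "H"}

deps = control_dependence(edges, ipdom)
# Nodes with no controlling predicate hang off the optional START node.
nodes = {n for edge in edges for n in edge}
deps |= {(n, "START") for n in nodes - {n for n, _ in deps}}
for src, dst in sorted(deps):
    print(f'"{src}" -> "{dst}";')
```

The edge from A to B, for instance, makes both B and G control dependent on A, because the walk visits B and then G before stopping at H, the immediate post-dominator of A.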
Grade: 100 points
Grading Criteria:
The grade will be based on the correctness of the DOT file that your pintool and/or post-processing script generates.
Your grade is dependent on the count of correct control dependence relationships you define. For example, if you miss 10% of the CDG relationships you will get a 90% on the lab. If you miss 10% of the expected CDG relationships and add an extra 10% of incorrect relationships to your CDG you will receive an 80% on the lab.
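The arithmetic in this example can be read as follows. The sketch is purely illustrative and is not the actual grading script: the function name is hypothetical, and it assumes that both the missed and the extra percentages are measured against the number of expected relationships and that the score cannot go below zero.

```python
def estimate_score(expected, submitted):
    """Estimate a percentage score from two sets of CDG edges.

    Assumption: missed and extra edges are both counted relative to the
    number of expected edges, and the result is clamped at zero.
    """
    expected, submitted = set(expected), set(submitted)
    missed = len(expected - submitted)  # expected edges that are absent
    extra = len(submitted - expected)   # submitted edges that are not expected
    return max(0.0, 100 * (len(expected) - missed - extra) / len(expected))
```

With ten expected edges, missing one yields 90, and missing one while adding one incorrect edge yields 80, matching the two cases described above.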
Teams:
This assignment can be done individually or in a team of 2. Please join a group in Gradescope if you are collaborating.
Do not create or join a group in Canvas. Canvas groups are different from Gradescope groups.
New to Gradescope? This link provides instructions for how to create groups in Gradescope: https://help.gradescope.com/article/m5qz2xsnjy-student-add-group-members
Zoom can also provide the ability to collaborate and video conference with your teammate.
Submission Instructions:
Upload the following to Gradescope:
1. The DOT file that your pintool generated, named “submission.dot”.
2. Your pintool code, named “plugin.cpp”, and any other code files needed to run your solution. We reserve the right to run all submitted code, through automated means or otherwise; if your code does not produce output equivalent to your submitted DOT file, you will receive a zero.
Be advised: please submit (1) and (2) separately; do NOT zip them together.
Note: Gradescope will only check the formatting of your submission. Gradescope will not automatically check the correctness and provide a grade.
Note: You can download the webc2-greencat-2.7z file directly into your lab environment. After you are done with this lab, you can submit your files directly from the lab environment (Highly recommended). Doing this will help you avoid transferring the file from the lab environment to your personal computer.
Transferring Files:
To transfer files from your personal device to the lab environment or to the Windows7 VM:
Create a zip folder of all the files that you would like to transfer to the lab environment or the Windows7 VM.
Every GT student has Box and OneDrive accounts provided free by the institution. Log in to either of those two and upload the desired files.
Now go back to the lab environment or the Windows7 VM and log in to the service where you uploaded your zip folder. Download the folder to the lab environment or the Windows7 VM and use the appropriate 7z command to unzip it.
FAQ
What is the purpose of the additional, ungraded, GradeScope assignment with the same release period as Lab 6?
This is an ungraded assignment that is exactly the same as Lab 5. It lets students submit their dynamic CFG to ensure that the coverage meets the minimum requirements. Since Lab 6 is a dynamic CDG, we test the dynamic CDG for all the points that we look for in the coverage we require for Lab 5 (these labs are largely interdependent). Making sure you start out with maximum coverage on Lab 5 before implementing Lab 6 will likely result in the best grade overall.
What to do when you encounter technical difficulties?
If you are experiencing technical difficulty such as being unable to access the lab environment, please submit a ticket to the “Digital Learning Tools and Platforms” team at https://gatech.servicenow.com/continuity. Please put “Route to the DLT Team” at the top of the ticket so the Service Desk knows where to send it.