Let us return to the events which occurred in 2014 at a small start-up company in Monroe, CT.
Suddenly, your phone rings. Your caller ID shows that it is the CEO of the eCommerce start-up!
“Hello?” you ask.
“The investors are getting nervous! The lawyers are asking questions! The customers aren’t buying our eCommerce product!” he yells.
“I’m working as fast as …” you say, but he interrupts.
“We need to provide some proof that no customer data was stolen! I need you to get me that proof by next week, or you’re not getting paid!” he says before hanging up.
Your mind races… what can you do? How can you provide proof?
Instructions:
An accurate Data Dependence Graph (DDG) is the most sought-after building block in the program analysis universe. Malware analysis tools require a DDG to answer any questions about the malware’s operation. You’ve probably seen multiple applications of DDGs in the research papers up to this point. Unfortunately, static analysis hurdles such as path explosion and aliasing force tool developers to make difficult implementation tradeoffs that limit the accuracy of their DDGs. In this lab, you will combat path explosion and aliasing with the goal of building a best-effort DDG, an essential building block for malware analysis. After completing this lab, I encourage you to go a step further and write a simple analysis script that automatically extracts any DDG paths within GreenCat that can exfiltrate data from files on the victim system.
1. Review the concepts of data dependence and how data dependence can be computed. The slides presented in class can be downloaded from Canvas.
2. Extend the plugin you designed for Lab #3 to do the following:
Loop over every instruction in every basic block in every function in your greencat-2 disassembly (from before).
Compute the data dependence of each instruction. You can design any methods or data structures you wish to accomplish this. You can use any GHIDRA SDK APIs that will help you (but none exist that can compute data dependence for you).
Generate a DOT directed graph representing the data dependence of all the instructions in each function. Specifically, generate one DOT graph per function, and name each DOT graph (called a “digraph” in the DOT file format) after the address of the function.
Each node in your DOT directed graph should be the address of an instruction (only ONE node per instruction address). Node labels can be just the instruction addresses. The edges in your DOT directed graph file should go from each instruction to any instructions which that instruction is data dependent on. The order of the edges in the DOT directed graph file does not matter.
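One common way to compute these dependencies is a reaching-definitions analysis over each function’s control-flow graph, iterated to a fixed point so that definitions from every incoming path are merged at join points. The sketch below is plain Python and does not call any GHIDRA API: it assumes you have already converted each instruction into a hypothetical `(address, reads, writes)` triple, where the location names (registers such as `esp`, `flags`, stack slots such as `stack:-4`) are your own choice. Locations never written inside the function default to `START`, matching the START convention described in the FAQ.

```python
from collections import deque
from typing import Dict, List, Set, Tuple, Union

START = "START"                        # a value that flows in from outside the function
Dep = Union[int, str]                  # an instruction address, or START
Insn = Tuple[int, Set[str], Set[str]]  # hypothetical form: (address, locations read, locations written)
State = Dict[str, Set[Dep]]            # location -> definitions that may reach this point


def merge(states: List[State]) -> State:
    """Union reaching definitions; a location missing from a state is still defined by START."""
    locs = set().union(*(s.keys() for s in states))
    return {loc: set().union(*(s.get(loc, {START}) for s in states)) for loc in locs}


def transfer(block: List[Insn], state: State, dd: Dict[int, Set[Dep]]) -> State:
    """Walk one basic block, recording each instruction's dependencies."""
    cur = {loc: set(defs) for loc, defs in state.items()}
    for addr, reads, writes in block:
        deps = dd.setdefault(addr, set())
        for loc in reads:                # reads happen before writes (e.g. push reads and writes esp)
            deps |= cur.get(loc, {START})
        for loc in writes:
            cur[loc] = {addr}            # a write kills all earlier definitions of loc
    return cur


def data_dependence(blocks: Dict[str, List[Insn]], succs: Dict[str, List[str]],
                    entry: str) -> Dict[int, Set[Dep]]:
    """Iterate reaching definitions to a fixed point over one function's CFG."""
    preds: Dict[str, List[str]] = {b: [] for b in blocks}
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)
    out: Dict[str, State] = {}
    dd: Dict[int, Set[Dep]] = {}
    work = deque([entry])
    while work:
        b = work.popleft()
        states = [out[p] for p in preds[b] if p in out]
        if b == entry:
            states.append({})            # function entry: every location comes from START
        new_out = transfer(blocks[b], merge(states), dd)
        if out.get(b) != new_out:        # re-queue successors only when something changed
            out[b] = new_out
            work.extend(succs.get(b, []))
    return dd
```

With reads and writes assigned by hand for the function at 0x401000 in the example that follows (treating each call as writing `esp`, `eax`, `ecx`, and `edx`), this reproduces the DD lists shown there; for instance, 0x401018 comes out dependent on 0x401000, 0x401003, and 0x401015 because `esp` reaches it from both sides of the `je`.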
Consider this example from the greencat-2 binary. Here are the instructions in the function starting at address 0x401000 in greencat-2:
401000: push esi
401001: mov esi, ecx
401003: call 0x401078
401008: test BYTE PTR [esp+0x8], 0x1
40100d: je 0x401016
40100f: push esi
401010: call 0x402a5c
401015: pop ecx
401016: mov eax, esi
401018: pop esi
401019: ret 0x4
The DOT directed graph generated by your tool for this function should be as follows:
Note: The following example is for full credit, which includes tracking the calling conventions and arguments of CALL instructions.
digraph 0x401000 {
n0 [label = "START"];
n1 [label = "0x401000; DD: START"];
n2 [label = "0x401001; DD: START"];
n3 [label = "0x401003; DD: START, 0x401000"];
n4 [label = "0x401008; DD: START, 0x401003"];
n5 [label = "0x40100d; DD: 0x401008"];
n6 [label = "0x40100f; DD: 0x401001, 0x401003"];
n7 [label = "0x401010; DD: 0x40100f"];
n8 [label = "0x401015; DD: 0x40100f, 0x401010"];
n9 [label = "0x401016; DD: 0x401001"];
n10 [label = "0x401018; DD: 0x401000, 0x401003, 0x401015"];
n11 [label = "0x401019; DD: 0x401018"];
n1 -> n0; n2 -> n0; n3 -> n0; n3 -> n1; n4 -> n0; n4 -> n3; n5 -> n4; n6 -> n2; n6 -> n3; n7 -> n6; n8 -> n6; n8 -> n7; n9 -> n2; n10 -> n1; n10 -> n3; n10 -> n8; n11 -> n10;
}
Your tool should process every function in the greencat-2 binary. All DOT graphs for all the functions should be written to a single “.dot” file. So, after your GHIDRA plugin finishes executing, you should have a single “.dot” file with many digraphs in it (one digraph per function).
To render a DOT file as a graph image, see: https://stackoverflow.com/questions/1494492/graphviz-how-to-go-from-dot-to-a-graph
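Once each function has a map from instruction address to its dependencies, producing the required output is mostly string formatting. The sketch below reproduces the layout of the 0x401000 example (a `START` node `n0`, instruction nodes numbered in address order, labels of the form `0x…; DD: …`, edges pointing at dependencies) and writes every digraph into one file. The helper names `emit_digraph` and `write_dot` are hypothetical, and the input is assumed to be a dict of the shape `{instruction_address: {dependency, ...}}` with `"START"` for outside dependencies.

```python
from typing import Dict, Set, Union

Dep = Union[int, str]  # an instruction address, or the string "START"


def _fmt(dep: Dep) -> str:
    return dep if isinstance(dep, str) else f"0x{dep:x}"


def _dep_key(dep: Dep):
    # "START" sorts first, then addresses in ascending order
    return (0, 0) if dep == "START" else (1, dep)


def emit_digraph(func_addr: int, dd: Dict[int, Set[Dep]]) -> str:
    """Format one function's dependencies as a digraph named after the function address."""
    node_ids: Dict[Dep, str] = {"START": "n0"}
    for i, addr in enumerate(sorted(dd), start=1):
        node_ids[addr] = f"n{i}"
    nodes = ['n0 [label = "START"];']
    edges = []
    for addr in sorted(dd):
        deps = sorted(dd[addr], key=_dep_key)
        label = f"0x{addr:x}; DD: " + ", ".join(_fmt(d) for d in deps)
        nodes.append(f'{node_ids[addr]} [label = "{label}"];')
        edges += [f"{node_ids[addr]} -> {node_ids[d]};" for d in deps]  # edge points at the dependency
    return f"digraph 0x{func_addr:x} {{ " + " ".join(nodes) + "\n" + " ".join(edges) + "\n}\n"


def write_dot(path: str, functions: Dict[int, Dict[int, Set[Dep]]]) -> None:
    """Write one digraph per function into a single .dot file."""
    with open(path, "w") as f:
        for func_addr in sorted(functions):
            f.write(emit_digraph(func_addr, functions[func_addr]))
```

Sorting dependencies is only for readability and stable output; as the assignment notes, edge order does not matter to the grader.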
Lab Requirements / FAQ (MUST READ):
This section contains some frequently asked questions and requirements that students should adhere to when working on this assignment.
How do CALL instructions work for this assignment? How are they calculated?
CALL instructions for this assignment are calculated similarly to Lab 3. To get full credit, you must properly track calling conventions and stack dependencies.
Do we need to calculate dependencies between functions?
No. As in Lab 3 (and all scripting labs), functions are considered independently, meaning you do not need to link dependencies between functions. This is the purpose of the START keyword: it expresses that a dependency originated outside the local function.
Grade: 100 points
Grading Criteria:
The grade will be based on how many instructions and functions your plugin handles correctly (i.e., the edges and the labels in the DOT graph are correct).
Here is what the team will look for while grading:
Register dependencies: Register reads/writes are the easiest case of dependencies.
Direct push & pop dependencies: This requires that your plugin track changes to the stack pointer inside each function. Hint: Since we do not know its true value, pretend that ESP = 0 at the start of each function, and then track its changes at each instruction. Note that function arguments will be above ESP at the start of the function (at positive offsets, starting at ESP+4 just above the return address).
Static memory positions: These are memory locations that GHIDRA gives a name to and accesses via that name (e.g., “mov [ebp+var_4], eax” or “mov dword_429C48, eax”). This requires your plugin to note each instruction which writes to that memory position.
Everything else: There are very few complex memory read/write dependencies (e.g., those which involve aliasing) in the functions we will grade. I did not find any cases of aliasing in my cursory pass over the code. If you are concerned about any cases of complex memory read/write dependencies, please post on Ed Discussion and we will be glad to check it out.
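The ESP = 0 hint and the named-memory case can both be handled by turning operands into location names that a dependence analysis can track. The sketch below is plain Python over a hypothetical `(address, mnemonic, operand text)` instruction form rather than GHIDRA objects. `pop_dependencies` pairs each pop with the push that wrote its slot in straight-line code, and `memory_key` normalizes operands such as `[ebp+var_4]`, `dword_429C48`, and `BYTE PTR [esp+0x8]` into keys. It assumes the IDA-style operand text used in the examples above; GHIDRA may render operands differently, so the patterns would need adjusting to what your script actually sees. Calls are assumed to leave ESP unchanged (caller cleanup), and branches are ignored.

```python
import re
from typing import Dict, List, Optional, Tuple

Insn = Tuple[int, str, str]  # hypothetical form: (address, mnemonic, operand text)

_SIZE_PTR = re.compile(r"^(?:byte|word|dword|qword)\s+ptr\s+", re.IGNORECASE)
_NAMED_GLOBAL = re.compile(r"^(?:byte|word|dword|qword|unk|off)_[0-9a-f]+$", re.IGNORECASE)
_ESP_PLUS = re.compile(r"^esp(?:\+(\w+))?$")


def parse_imm(text: str) -> int:
    """Parse '0x10', '16', or IDA-style '10h' immediates."""
    text = text.strip()
    return int(text[:-1], 16) if text.lower().endswith("h") else int(text, 0)


def pop_dependencies(insns: List[Insn]) -> Dict[int, int]:
    """Map each pop to the push that wrote the slot it reads (straight-line code only)."""
    esp = 0                            # pretend ESP == 0 at function entry
    slot_writer: Dict[int, int] = {}   # stack offset -> address of the last writer
    deps: Dict[int, int] = {}
    for addr, mnem, ops in insns:
        if mnem == "push":
            esp -= 4
            slot_writer[esp] = addr
        elif mnem == "pop":
            if esp in slot_writer:
                deps[addr] = slot_writer[esp]
            esp += 4
        elif mnem in ("sub", "add") and ops.replace(" ", "").lower().startswith("esp,"):
            delta = parse_imm(ops.split(",", 1)[1])
            esp += -delta if mnem == "sub" else delta
        # Any other instruction, including call, is assumed to leave ESP unchanged.
    return deps


def memory_key(operand: str, esp: int = 0) -> Optional[str]:
    """Turn a memory operand into a location key; ESP-relative ones become stack offsets."""
    op = _SIZE_PTR.sub("", operand.strip())
    if op.startswith("[") and op.endswith("]"):
        inner = op[1:-1].replace(" ", "").lower()
        m = _ESP_PLUS.match(inner)
        if m:                          # only [esp] and [esp+N] are handled here
            offset = esp + (parse_imm(m.group(1)) if m.group(1) else 0)
            return f"stack:{offset:+d}"
        return "mem:" + inner          # e.g. [ebp+var_4], keyed by its name
    if _NAMED_GLOBAL.match(op):
        return "mem:" + op.lower()     # e.g. dword_429C48
    return None                        # a register or immediate, not memory
```

Walking the 0x401000 listing in address order pairs the pop at 0x401015 with the push at 0x40100f and the pop at 0x401018 with the push at 0x401000; this linear walk happens to be safe there only because both paths past the `je` leave ESP at the same offset. Likewise, `BYTE PTR [esp+0x8]` at 0x401008 (where the tracked ESP is -4) maps to `stack:+4`, the function’s first argument.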
The grade will be based on how many instructions and functions your plugin processes correctly, and is ultimately based on your graph submission (DOT file).
Data dependence accuracy for each of the top 10 instruction mnemonics (mov, add, sub, cmp, test, xor, push, pop, lea, and all forms of jump) is worth 5% of the total grade. For example, if 20% of your mov instructions are wrong (missing a dependency or having an erroneous one), then you will lose 1% of the total 100 points.
Data dependence accuracy of all other instruction types is collectively worth 15% of the total grade. For example, if 30% of the other instructions are wrong (missing a dependency or having an erroneous one), then you will lose 5% of the total 100 points (15% × 0.7 = 10.5%, rounded down to 10%).
Edge accuracy is worth 30% of the total grade. For example, if 10% of your edges are wrong (missing, or erroneous extra edges), then you will lose 3% of the total 100 points.
We will only grade the functions that you commented in Lab 2. The maximum deduction is 100. There will be no negative grades.
Note: Grades in sections are rounded down to the nearest percent.
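The deductions in the examples above follow from scaling each section’s weight by the fraction of items you got right and then rounding down to a whole percent. The tiny sketch below is only a reading of that rule, not the official grading script; `section_points` is a hypothetical helper that works on counts with integer arithmetic to avoid floating-point rounding error, and it assumes the section contains at least one item.

```python
def section_points(weight: int, wrong: int, total: int) -> int:
    """Points kept in one section: weight scaled by the fraction correct, rounded down."""
    return weight * (total - wrong) // total
```

For example, 30 wrong out of 100 "other" instructions keeps 15 × 70 // 100 = 10 points, a loss of 5.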
Call Tracking:
Up to 20 additional points will be awarded for properly tracking the DD of CALL arguments. Note: This will require using GHIDRA’s APIs to determine the number of function arguments. For example:
40156B push 3Ch ; …
40156D xor ebx, ebx ; …
40156F lea eax, [ebp+buf] ; …
401572 push ebx ; …
401573 push eax ; DD: 401572, 40156F
401574 call _memset ; DD: 401573, 401572, 40156B
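The memset example boils down to: look up how many arguments the callee takes, then make the CALL depend on the instructions that last wrote that many stack slots, starting at the current ESP. The sketch below assumes the argument counts are already known; in a GHIDRA script they would plausibly come from the callee’s `Function` object (for example its `getParameterCount()` method), but treat that as something to verify against your GHIDRA version. `call_arg_dependencies` and the tuple instruction form are hypothetical, and stdcall callee cleanup and register-passed arguments (such as ECX for thiscall) are not modeled.

```python
from typing import Dict, List, Tuple

Insn = Tuple[int, str, str]  # hypothetical form: (address, mnemonic, operand text)


def call_arg_dependencies(insns: List[Insn], arg_counts: Dict[str, int]) -> Dict[int, List[int]]:
    """For each call, list the pushes that wrote its stack arguments, first argument first."""
    esp = 0                            # pretend ESP == 0 at function entry
    slot_writer: Dict[int, int] = {}   # stack offset -> address of the last writer
    deps: Dict[int, List[int]] = {}
    for addr, mnem, ops in insns:
        if mnem == "push":
            esp -= 4
            slot_writer[esp] = addr
        elif mnem == "pop":
            esp += 4
        elif mnem == "call":
            n = arg_counts.get(ops, 0)  # callee not in the table: assume no stack arguments
            # Before the call pushes its return address, argument i sits at [esp + 4*i].
            deps[addr] = [slot_writer[esp + 4 * i] for i in range(n) if esp + 4 * i in slot_writer]
            # Caller cleanup (cdecl) is assumed, so ESP is left unchanged here.
    return deps
```

Run on the listing above with `_memset` taking three arguments, this yields 401573, 401572, 40156B for the call, matching the expected DD; the register dependencies (such as push eax depending on the lea at 40156F) would still come from ordinary register tracking.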
Teams:
This assignment can be done individually or in a team of 2. Please join a group in Gradescope if you are collaborating.
Do not create or join a group in Canvas. Canvas groups are different from Gradescope groups.
New to Gradescope? This link provides instructions for how to create groups in Gradescope: https://help.gradescope.com/article/m5qz2xsnjy-student-add-group-members
Zoom can also provide the ability to collaborate and video conference with your teammate.
Submission Instructions:
Upload the following to the Lab 3 Assignment in Gradescope:
The DOT file output by your GHIDRA plugin, named “submission.dot” which contains digraphs for every function in the greencat-2 binary.
Your GHIDRA plugin code, named either “plugin.py” or “plugin.java” depending on the chosen language. We reserve the right to run all submitted code, through automated means or otherwise, and if your code does not produce output equivalent to your submitted DOT file, you will receive a zero.
Be advised: submit the DOT file and the plugin code as separate files; do NOT zip them together.
Note: Gradescope will only check the formatting of your submission. Gradescope will not automatically check the correctness and provide a grade.
Note: You can download the webc2-greencat-2.7z file directly into your lab environment. After you are done with this lab, you can submit your files directly from the lab environment (Highly recommended). Doing this will help you avoid transferring the file from the lab environment to your personal computer.
Transferring Files:
To transfer files from your personal device to the lab environment:
Create a zip folder of all the files that you would like to transfer to the lab environment.
Every GT student has Box and OneDrive accounts provided free of charge by the institution. Log in to either of those two services and upload the desired files.
Now go back to the lab environment and log in to the service where you uploaded your zip folder. Download the folder to the lab workspace and use the appropriate 7z command to unzip it.
What to do when you encounter technical difficulties?
If you are experiencing technical difficulty such as being unable to access the lab environment, please submit a ticket to the “Digital Learning Tools and Platforms” team at https://gatech.servicenow.com/continuity. And on the ticket, please put “Route to the DLT Team” at the top of the ticket because it will help the Service Desk know where to send it.
Grades have been released. How do I view my raw feedback?
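import base64
import gzip

# Paste your base85-encoded feedback string between the quotes
base85_encoded_data = b"Paste the encoded data here"
# Decode the base85 text, then gunzip it to recover the raw feedback bytes
base85_decoded_data = base64.b85decode(base85_encoded_data)
gzip_decompressed_data = gzip.decompress(base85_decoded_data)
with open("output.txt", "wb") as f:
    f.write(gzip_decompressed_data)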
CS6747 Lab #4: Data Dependence GHIDRA Plugin