Sections:

Scenario:
You got a malware sample from the wild! Your task is to discover what the malware does by analyzing it.
How do you discover the malware's behaviors? There are multiple ways of analyzing it, but we'll focus on two: Static Analysis and Dynamic Analysis.

Static Analysis:

Dynamic Analysis:

In our scenario, you are going to analyze the given malware with the tools that we provide. These tools help you perform both static and dynamic analysis.

Objective:

Requirement:

Project Structure:
  ○ https://www.virtualbox.org/wiki/Downloads
● Download the Virtual Machine (VM)
  ○ https://www.dropbox.com/s/dnk6acztw9ewp83/Project%203.zip?dl=0
  ○ Unarchive the file with 7zip; the password is cs6262
● Network Configurations:
  ○ tap0:
    ■ Virtual network interface for Windows XP
      • IP Address: 192.168.133.101
  ○ br0:
    ■ A network bridge between Windows XP and Ubuntu
    ■ A network that faces the Internet
  ○ Go to File → Import Appliance
  ○ Select the ova file and import it
  ○ For detailed information on how to import the VM, see:
  ○ Before starting, it may help to adjust the VM settings (for example, allocate more base memory and processors) according to your device's capacity, for better performance.
● VM user credentials
  ○ Username: analysis
  ○ Password: analysis
NOTE: VM Setup
    ■ init.py
  ○ Type your Georgia Tech username (your Canvas LoginName) after running this
      • $ ./init.py
    ■ update.sh
  ○ Please run this script when you start the project!
    (If it says that you're already updated when you run it, that's fine.)
  ○ If you have already completed stage 1 before running update.sh, you do NOT need to redo stage 1, but you will need to run update.sh to complete stage 2
    ■ archive.sh
    ■ vm
    ■ shared
    ■ report
    ■ Tools
  ○ Configure your network firewall rules (iptables) by editing iptables-rules.
  ○ You can allow/disallow/redirect the traffic from the malware
  ○ The './reset' command in this directory will apply the changes
  ○ An analysis tool that helps you find interesting functions of malicious activity
  ○ You need to edit score.h to generate the control-flow graph
  ○ Use xdot to open the generated CFG.
  ○ A symbolic executor (based on angr: https://github.com/angr)
    ■ Helps you figure out the commands that the malware expects
  ○ Use the cfg-generation tool to figure out the address of the function of interest
  ○ A simplified tool for C2 server reconstruction
  ○ You can write each command in the *.txt file as a line
  ○ It will randomly choose one command at a time to send to the malware
  ○ Malware:
    ■ stage1.exe: stage 1 malware
    ■ stage2.exe: stage 2 malware
    ■ payload.exe: the Linux malware attack payload
Find the loop entry point and the function sequence in the loop

Tutorials:
  ○ Update project 3 before you begin
    ■ Open the terminal (Ctrl-Alt-T, or choose terminal from the menu)
    ■ Run ./update.sh
  ○ Initializing the project
    ■ Open the terminal (Ctrl-Alt-T, or choose terminal from the menu)
    ■ Run ./init.py
  ○ Note:
    ■ These are malware samples hosted under the Georgia Tech Network
  ○ These are all the malware samples you will be downloading during this project
    ■ IMPORTANT
      $ file <path-to-exe>
      $ unzip <path-to-exe>
  ○ Password: infected
  ○ We need a secure experiment environment to execute the malware
  ○ Why?
    ■ An insecure analysis environment could damage your system
    ■ You may not want:
    ■ Contain malware in a virtual environment
  ○ Conservative rules (allow network traffic only if it is secure)
  ○ We provide a Win XP VM as a testbed!
● Run Win XP VM
  ○ Run the Windows XP Virtual Machine with virt-manager
  ○ Open a terminal
  ○ Type "virt-manager" and double click "winxpsp3"
  ○ Click the icon with the two monitors and click on "basecamp"
  ○ Right click on basecamp, and click "Start snapshot." Click Yes if prompted.
  ○ Once virt-manager has successfully started the snapshot, click "Show the graphical console."
    ■ Click on the Windows Start Menu and Turn off Computer.
    ■ Then select Restart
  ○ DO NOT MODIFY OR DELETE THE GIVEN SNAPSHOTS!
    ■ The given snapshots are your backups for your analysis.
    ■ If something bad happens on your testbed, always revert to the basecamp snapshot.
● Copy from Shared Directory
  ○ Go to the shared directory by clicking its icon (in Windows XP)
    ■ Copy stage1.exe to the Desktop
    ■ If you execute it in the shared directory, an error message will pop up. Please copy the file to the Desktop.
  ○ Now we will run the malware
    ■ Execute stage1.exe (double click the icon)
    ■ It will say "Executing Stage 1 Malware". Then, click OK.
  ○ Otherwise, malware execution will be blocked
  ○ If you want to halt the malware that is running…
    ■ Execute stop_malware in the temp directory.
  ○ To analyze network behaviors, you need
    ■ Wireshark (https://www.wireshark.org/)
    ■ Capturing & recording inbound/outbound network packets
  ○ By capturing and recording network packets through the tools
    ■ Reveal the C&C protocol
    ■ Attack Source & Destination
  ○ But the malware will not do anything.
    Why?
    ■ The C2 server is dead!
    ■ Therefore, the malware (C2 client) will never unfold its behaviors.
    ■ Question?
  ○ Let's check it through network monitoring
    ■ Everything has already been installed.
    ■ Open Wireshark and capture the traffic for the network bridge (make sure to run it with root privileges)
    ■ IP address = 192.168.133.1
    ■ Reference: https://www.wireshark.org/docs/
    ■ Familiarize yourself with Linux commands and with how to use Wireshark.
    ■ Other references:
  ○ In Wireshark, we can see that the malware tries to connect to the host at 128.61.240.66, but it fails
  ○ Let's redirect it to our fake C2 server
    ■ Go to ~/tools/network
    ■ Edit iptables_rules to redirect the traffic destined for 128.61.240.66 to 192.168.133.1 (fake host)
  ○ Whenever you edit iptables_rules, always run reset.
    ■ (type "./reset" from the ~/tools/network directory)
  ○ IMPORTANT! If you shut down your project VM, be sure to run reset again the next time you start it up.
  ○ Observing C2 traffic
    ■ In Wireshark, we can see that the malware can now communicate with our fake C2 server
    ■ You can see the contents of the traffic by right-clicking on the line, then clicking Follow → TCP Stream
  ○ Let's take a look at cuckoo. Cuckoo is NOT required to complete this project, but it is a useful tool to help you understand what your malware is doing, and therefore how you might want to modify your score.h file later in the project.
  ○ Note!
    You can't run the testbed VM and cuckoo simultaneously.
  ○ Always turn off the testbed VM, and follow the steps below to execute Cuckoo
  ○ Open two terminals.
  ○ '$ workon cuckoo' (set the virtualenv to cuckoo in both terminal 1 and terminal 2)
  ○ Run cuckoo in debug mode in one terminal, with the command: '$ cuckoo -d'
  ○ Run the cuckoo webserver in the other terminal, with the command: '$ cuckoo web'
  ○ Reference: Malware Analysis using Cuckoo Sandbox
  ○ If you get an error when running cuckoo web because port 8000 is already in use, run "sudo fuser -k 8000/tcp" and try again.
  ○ Cuckoo uses a snapshot of the given testbed VM.
  ○ The snapshot is 1501466914
      • DO NOT TOUCH the snapshot!
  ○ To open the cuckoo web server, type the following URL into Chromium
    ■ http://localhost:8000
  ○ To upload a file, click the red box and choose a file.
  ○ Once you click the Analyze button, it will take some time to run the malware.
  ○ The malware does not exhibit its behavior because we did not send the correct command through our fake C2 server
  ○ We will use
    ■ File/Registry/Process tracing analysis to guess the malware behavior.
    ■ Control-flow graph (CFG) analysis and symbolic execution to figure out the list of correct commands
  ○ The purpose of tracing analysis is to draw a big picture of the malware
    ■ What kinds of system calls/APIs does the malware use?
    ■ Does the malware create/read/write a file? How about the registry?
  ○ The purpose of CFG analysis is to find the exact logic that involves the interpretation of the command and the execution of malicious behavior
  ○ Then, symbolic execution finds the command that drives the malware into that execution path
  ○ On the side bar, there are useful menus for tracing analysis.
    ■ We are focusing on:
  ○ Trace behaviors in time sequence.
● Static Analysis on Cuckoo
  ○ Static Analysis
    ■ Information about the malware.
    ■ Win32 PE format information
  ○ .text
  ○ Strings, etc.
  ○ .data
  ○ .idata
  ○ .reloc
  ○ More information: Malware researcher's handbook (demystifying PE file)
  ○ Interestingly, three DLL (Dynamic Link Library) files are imported.
  ○ In WININET.dll, we can see that the malware uses the HTTP protocol.
  ○ In ADVAPI32.dll, we can check whether the malware touches the registry
  ○ In Kernel32.dll, we can check whether the malware waits for a signal or sleeps.
  ○ Tracing a behavior (file/process/thread/registry/network) in time sequence.
  ○ Useful for figuring out cause and effect in process/file/network activity.
  ○ Malware creates a new file and runs the process, then writes it to memory.
  ○ Based on our analysis with Cuckoo, we can determine if…
    ■ The malware uses the HTTP protocol to communicate
● Communicate with whom? C&C?
    ■ The malware touches (creates/writes/reads) a file/registry/process
  ○ Based on the information that we collected in the previous step, we are going to perform CFG analysis & symbolic execution analysis
  ○ CFG:
    ■ A graph representation of computation and control flow in the program
    ■ Nodes are basic blocks
    ■ Edges represent possible flow of control from the end of one block to the beginning of the other.
  ○ But, in malware analysis, we are analyzing the CFG at the instruction level.
  ○ We provide a tool that helps you find the command interpretation logic and the malicious logic
    ■ We list the functions or system calls the malware uses internally
    ■ If you provide a score for each function (how malicious it is, or how likely the malicious logic is to use such a function), then the tool will find where the malicious logic is, based on its score. A higher score implies that more functions related to malicious activity are used within the malware.
    ■ Your job is to write the score value for each function
  ○ More info: http://www.cs.cornell.edu/courses/cs412/2008sp/lectures/lec24.pdf
  ○ From our network analysis, we know that the malware uses an Internet connection to 128.61.240.66
  ○ From our cuckoo-based analysis, we know that the malware uses the HTTP protocol.
  ○ Moreover, it uses some particular functions to communicate and stay in touch with the command and control server.
  ○ Modify the score values for these particular functions in order to generate a better CFG for proper analysis.
  ○ Find the file to be edited: score.h.
  ○ Path: ~/tools/cfg-generation/score.h
  ○ Build the control flow graph
    ■ By executing ./generate.py stage1, the tool gives you the CFG
● This finds the function with the higher score
  ○ This implies that it calls high-score functions during its execution
    ■ For stage2
  ○ Note: your graph and its memory addresses will vary from this example
  ○ The function entry is at address 405190
    ■ And there is a function (marked as sub) with a score of 12
    ■ This implies that
  ○ Run from 405190 to 40525a
  ○ Finding Commands with Symbolic Execution
    ■ We want to find a command that drives the malware from 405190 to 40525a
    ■ Rather than executing the program with some concrete input, symbolic execution treats the input data as a symbolic variable, then computes expressions over that input along the execution.
    ■ Path explosion
    ■ Modeling statements and environments
    ■ Constraint solving
  ○ Symbolic Execution Engines: KLEE, angr, Mayhem, etc.
      • Loading a binary into the analysis program
      • Translating a binary into an intermediate representation (IR).
      • Translating that IR into a semantic representation
      • Performing the actual analysis with symbolic execution.
  ○ In this example, ONLY the conditions i=2, j=9 will lead the program to print "Correct!"
  ○ Symbolic execution can solve the expression in order to reach a target, in this case "Correct!".
  ○ Let's apply it to malware Command & Control logic.
    A C&C bot (malware) expects inputs (solutions to the expressions) to trigger behaviors (targets).
  ○ In this example, ONLY the 'launch-attack' and 'remove' commands (inputs) trigger attack() and destroy_itself().
  ○ Symbolic execution is able to find "launch-attack" as an input that triggers attack(), which is a malicious behavior.
  ○ Likewise, "remove" will lead to destroy_itself(), which is another behavior.
  ○ Our job in this project is to use symbolic execution to find inputs, and then feed those inputs to the malware to trigger behaviors.
  ○ We prepared a symbolic executor and a solver for you
    ■ Your job is to find the starting point of the function that interprets the command, and the end point where the malware actually executes some function that performs malicious operations
    ■ The symbolic executor is called angr (http://angr.io/index.html)
  ○ How do you run it?
    ■ Go to ~/tools/sym-exec
    ■ Run it like
      python ./sym_exec.py [program_path] [start_address] [end_address]
  ○ Replace the start and end addresses above with the ones from your CFG.
  ○ The command will be printed at the end (if found)
  ○ After CFG analysis + symbolic execution, reconstruct the C2 server
  ○ The tool for reconstructing the C2 server is already on the VM
  ○ It runs nginx and a PHP script
    ■ This will look like ~/tools/c2-command/stage*-command.txt
    ■ Your job is to add your commands to the relevant *.txt file
● For example, if the command is "$insert" (note: the name of the command you see may vary)
● Then, type "$insert" and save the file.
  ○ Note: This means that if you want to run only a particular command, you'll need to remove or comment out the other commands in your file
  ○ SimState
    ■ angr: SimState
    ■ While angr performs symbolic execution, it stores the current state of the program in SimState objects.
    ■ A SimState is a structure that contains the program's memory, registers, and other information.
    ■ SimState provides interaction with memory and registers.
      For example, state.regs offers read and write access to each register by name, such as state.regs.eip, state.regs.rbx, state.regs.ebx, state.regs.bh
    ■ Creating an empty 64 bit SimState
  ○ Bitvectors
    ■ Since we are dealing with binary files, we don't deal with regular integers.
    ■ In a binary program, everything becomes bits and sequences of bits.
    ■ A bitvector is a sequence of bits used to perform integer arithmetic for symbolic execution.
    ■ Creating some 32 bit bitvector values
    ■ state.solver.BVV(4, 32) will create a 32 bit bitvector with value 4
    ■ We can perform arithmetic operations or comparisons using the bitvectors
  ○ Symbolic Bitvectors
    ■ state.solver.BVS('x', 32) will create a symbolic variable named x with a length of 32 bits
    ■ angr allows us to perform arithmetic operations or comparisons using them.
  ○ Registers
    ■ The state provides access to the registers through state.regs.register_name, where register_name could be rcx, ecx, cx, ch, or cl. The same applies to the other registers.
    ■ Look at the types of the registers: they are bitvectors
    ■ Look at the lengths of the registers examined below.
    ■ cl, ch, cx, and ecx are all parts of rcx.
    ■ You can compare the length and the location of cl, ch, cx, ecx, and rcx in angr with the actual architecture depicted below.
  ○ Constraints
    ■ In a CFG, a line like if ( x > 10 ) creates a branch. Please look at the Symbolic Execution Concepts tutorial.
    ■ Assuming x is a symbolic variable, taking the True branch adds a constraint to the successor state. For a comparison against 4, for example, angr displays it as <Bool x_5_32 > 4>
    ■ For the False branch, the negation of <Bool x_5_32 > 4> will be created.
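The branch behavior described above can be imitated in plain Python. This is only a toy sketch and it is not angr: each constraint is an ordinary predicate, the names WIDTH, fork_on_branch, and solve are made up for illustration, and the "solver" simply tries every 8-bit value. In angr itself, constraints are added with state.solver.add(...) and a concrete solution is requested with state.solver.eval(...).

```python
# Toy model of path constraints in plain Python (NOT angr).
# WIDTH, fork_on_branch and solve are hypothetical names for illustration.

WIDTH = 8  # assumed small bit width so that brute force stays cheap


def fork_on_branch(constraints, condition):
    """Split one state into two at a branch.

    The true successor keeps `condition`; the false successor keeps
    its negation, mirroring what a symbolic executor does at an `if`.
    """
    def negated(value):
        return not condition(value)

    return constraints + [condition], constraints + [negated]


def solve(constraints):
    """Return the smallest unsigned WIDTH-bit value that satisfies
    every constraint, or None if the path is unreachable."""
    for candidate in range(2 ** WIDTH):
        if all(check(candidate) for check in constraints):
            return candidate
    return None


# Branch on `x > 4`, starting from a state with no constraints.
true_state, false_state = fork_on_branch([], lambda x: x > 4)

# A second, contradictory branch on the true path: `x < 3`.
unreachable_state, _ = fork_on_branch(true_state, lambda x: x < 3)

print(solve(true_state))         # an input that takes the true branch
print(solve(false_state))        # an input that takes the false branch
print(solve(unreachable_state))  # None: no input reaches this path
```

A path whose constraints cannot all hold at once has no solution, which is how an unreachable target address shows up during symbolic execution.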
    ■ Adding a constraint to a SimState
  ○ Radare2
    ■ Launch radare2 with $ r2 ~/shared/payload.exe
    ■ Then type aaa, which analyzes everything (functions + basic blocks)
    ■ afl lists all functions
    ■ The full afl listing is long and hard to analyze.
    ■ afl~name greps the list of functions for the given name
    ■ afl~attack lists all the functions whose names contain attack
    ■ You can use Linux commands such as grep while inside the r2 console.
    ■ On the right side, you can see all the functions having the attack vector (afl~send)
    ■ Using those API calls, this Linux malware performs DDoS attacks based on the commands it receives from the C&C server.
    ■ The example below shows how to find all the attack vectors calling sym.send/sym.sendto
    ■ Now we have to iterate over all the attack functions on the right. For example, the example below shows three attack functions, and only one of them is called. Our focus is the call sym.attack_????? functions.
    ■ Let's analyze the example below.
    ■ axt sym.attack_app_http has only one reference, which is a push instruction. This is not the attack function we are interested in.
    ■ axt sym.attack_app_cfnull has no reference at all. This is not the attack function we need to explore.
    ■ axt sym.attack_???? is one of the functions listed in the example on the right, and has a call sym.attack_????? instruction.
      That is the function we need to explore further to determine the target address for the symbolic execution.
    ■ You need to find 2 attack functions.
    ■ After finding the attack function, we can determine the target address.
    ■ The address of the instruction that follows call sym.send(to) is the target address for the symbolic execution.
    ■ For more information:
  ○ You don't have to use Radare2.
  ○ Here are some of the tools you may want to use
    ■ objdump
    ■ IDA-Pro (disassembly tool with GUI) (free version)
    ■ Cutter (GUI for radare2)
  ○ Check its network access with Wireshark
  ○ Redirect network traffic to the fake C2 server if required (if the connection fails)
  ○ Try to identify malicious functions by editing score.h and using the cfg-generation tool
  ○ Discover the list of commands using the symbolic execution tool
  ○ Fill in the commands in ~/tools/c2-command/stage2-command.txt
  ○ Run it as mentioned before.
  ○ This is Linux malware.
  ○ For Linux malware symbolic execution
  ○ python linux_sym_exec.py path_to_linux_mw start target
  ○ To make it work, you need to modify two functions in linux_sym_exec.py
    ■ targs_len_before and opts_len_before
● ~/tools/dynamicanalysis/
  ○ instrace.linux.log: the dynamic instruction trace for the Linux malware
  ○ detect_loop.py: you have to modify this file to find the loop in the given trace
  ○ Usage: python detect_loop.py
  ○ Search for C&C commands and trigger conditions
  ○ Vet the app for any anti-analysis techniques that need to be removed.
  ○ Background services
  ○ You have received a malware sample sms.apk.
  ○ You need to identify communication with the C&C server
  ○ Identify anti-analysis techniques being used by the app.
  ○ Identify commands that trigger any malicious behavior.
  ○ An emulator for Android 4.4 is pre-installed
    ■ Run 'run-emulator'
  ○ Jadx
    ■ Disassembles apk files into Java source code.
  ○ Rebuilds apk files.
  ○ ~/Android/MaliciousMessenger/tutorialApps
    ■ emu-check.apk
    ■ Another tutorial example
  ○ Target app to analyze to answer the questionnaire
  ○ On the questionnaire sheet, there are entries for writing domain names. Please follow these rules when answering those questions.
  ○ You should write the FQDN, which means that if the full domain name is canof.gtisc.gatech.edu, then write canof.gtisc.gatech.edu, not just gatech.edu or gtisc.gatech.edu
  ○ For the others (connection checks, DDoS, sending info, etc.), you should get the exact domain name that the malware uses. For example, the IP address 130.207.188.35 belongs to both coe.gatech.edu and web-plesk5.gatech.edu.
  ○ Because there are multiple mappings, you cannot be sure which domain the malware used by just using nslookup. In this case, please get the domain names from the DNS packets in Wireshark instead.
  ○ All domains should be based on Wireshark DNS packets
    ■ e.g., get it from a DNS query packet, or redirect HTTP traffic into a local VM and examine the Host header.
  ○ If you look at the capture in Wireshark, you will find a DNS query (Standard query) and a DNS response (Standard query response)
  ○ In the Domain Name System section, there is a Queries section, like below
  ○ Queries:
    ■ x.y.z: type A, class IN.
  ○ Answers:
    ■ x.y.z: type CNAME, class IN, cname a.b.c
  ○ You should use x.y.z
  ○ For all URLs, you do not have to specify the protocol (http:// or https://, etc.).
  ○ However, if the HTTP traffic is like the following:
    ■ POST /a/b/c/d?asdf=1234 HTTP/1.1
      Host: www.zzz.com
  ○ Then please write this as
    ■ www.zzz.com/a/b/c/d?asdf=1234
  ○ There are pre-installed PHP scripts on the VM that read the *.txt file for each stage.
    ■ These scripts send the commands to the malware after reading them from the TXT files.
    ■ One caveat of these scripts is that they send the commands in random order (i.e., if there are commands a, b, c, then the script will randomly choose one command and send it to the malware).
    ■ So if you want to test ONE command at a time, write only that command in the TXT file.
  ○ You could use the free version of IDA-Pro, objdump, or radare2 for this task to find the called attack functions and the target addresses.
  ○ Look for angr examples on GitHub that add constraints to the state.
  ○ For the loop detection, focus on the function sequence that is called repeatedly
● Correct command but malware is not working?
  ○ Note that some commands for stage 2 differ per student: they have a 4 digit hexadecimal number at the end of the command.
    ■ Ex.
      a command for stage 2 is formatted like $COMMANDa1b4
    ■ (NOTE: three commands in stage 2 have the 4 digit hexadecimal tail.)
    ■ All commands in stage 3 have the 4 digit hexadecimal tail on the command.
  ○ However, there could be a case where you only get the front part of the command, like
    ■ $COMMAND
    ■ This happens if the end address of the symbolic execution is not set correctly. In such a case, please set the correct end point so that you get the entire command.
  ○ In the VM, we provide cuckoo, which is a dynamic malware analysis framework.
    ■ It is very convenient and easy to use.
    ■ While you are running cuckoo, you might see some warnings and errors such as "critical time blah blah~" and "YARA signature…. blah blah". Please ignore them.
    ■ Because you are executing malware in the QEMU Windows VM, the framework needs to set a time.
    ■ In our case, the malware is never going to unfold, even if you give it infinite time to execute, unless you feed it the right inputs (the malware expects C2 commands).
  ○ iptables Setting
    ■ If you check /home/analysis/.cuckoo/conf/kvm.conf, you will find how we set up the QEMU Windows host VM.
    ■ You will find that the IP of the host VM is "192.168.133.101".
    ■ If you want to see network behaviors in Cuckoo, you need to forward the IP in /home/analysis/tools/network/iptables-rules.
    ■ For example, open iptables-rules and add:
      sudo iptables -t nat -A PREROUTING -p tcp -s 192.168.133.101 -d [DEST-IP] --dport 80 -j DNAT --to 192.168.133.1:80
  ○ Run the Windows VM only when:
    ■ Sending commands to malware
    ■ Analyzing network traffic via Wireshark
    ■ Once done with those tasks, turn off the Windows VM.
  ○ Avoid running the Windows VM when:
    ■ Running cuckoo analysis
    ■ Generating CFGs
    ■ Running symbolic execution. This is quite resource intensive, so avoid doing other things at the same time to get it done quickly. (TIP: If this seems to be taking infinite memory/time, you're most likely trying to reach an unreachable / invalid address!
      Check your addresses!)
  ○ Try running the VM at a lower resolution (at least 1280×800 is recommended, for legibility) if you have a very high resolution on your host machine. You can do this in 2 ways:
    ■ VirtualBox Menu: View > Virtual Screen 1 > Resize to a x b
    ■ Ubuntu Menu: Type "Displays" > Change it there
  ○ Restart after a task / stage. This is mostly a last resort, but restarting the VM after finishing a task/stage made everything feel really smooth, compared with trying to free memory etc. Just be sure to run ./reset in ~/tools/network after each VM restart!
  ○ Allocating too few resources could result in some issues. You could try to reinstall the VM image (deleting the previously stored state), and even VirtualBox as a last resort.
    ■ ~/report/assignment-questionnaire.txt
    ■ stage1.exe, stage2.exe, payload.exe (Linux malware)
    ■ ~/tools/network/iptables_rules
    ■ ~/tools/cfg-generation/score.h
  ○ Read ~/report/assignment-questionnaire.txt
  ○ Carefully read the questions, and answer them in ~/report/assignment-questionnaire.txt
  ○ For each stage, there are 4-6 questions regarding the behavior of the malware.
● Android Part
  ○ READ ~/Android/MaliciousMessenger/writeup.pdf
  ○ Carefully read the writeup, and answer in ~/report/assignment-questionnaire.txt
  ○ Make sure you overwrite ANSWER_HERE
  ○ As each section is worth an equal amount of your overall P2 grade, we normalized the Windows score by dividing by 1.1 (and rounded up), then averaged it with the Android score to get your final grade. So effectively, each point in the table above is worth half a point of your final project grade (slightly less for Windows).

Android Malware Analysis Lab
June 11, 2017

[1] Every application must have an AndroidManifest.xml file in its root directory. The manifest file provides essential information about your app to the Android system, which the system must have before it can run any of the app's code.
Among other things, the manifest file does the following:

Listing 1 shows an example of an app's manifest file. From it, we can see that this app declares that it needs the INTERNET and RECEIVE_SMS permissions. Additionally, the app uses three components: ActivityOne, SmsReceiver, and myAppsService. ActivityOne is declared in lines 80-85. The intent-filter tag specifies the types of intents that an activity, service, or broadcast receiver can respond to. An intent filter declares the capabilities of its parent component: what an activity or service can do and what types of broadcasts a receiver can handle. It opens the component to receiving intents of the advertised type, while filtering out those that are not meaningful for the component. Lines 16-21 declare a broadcast receiver component named SmsReceiver.

From the intent filters, we see that the Android OS will notify SmsReceiver when the device receives a new text message. The final component this app uses is a service component named ServiceOfApp, declared on lines 23-25.

The Android Manifest file provides a high-level abstraction of an app's behavior. When attempting to manually inspect the internal behaviors of an application statically, the manifest file is a good starting point. It provides key insights into the permissions an application is using, the components it is using, and how the application interacts with the Android OS and the outside world. Additional information about the contents and attributes of the manifest file can be found in the Android documentation [1].

Listing 1: An example of an app's Android Manifest File

Android uses the Android application package (APK) format to distribute apps to Android devices. An apk is nothing more than a zip file containing resources and assembled Java code. However, if you were to simply unzip the apk, the files of interest, such as classes.dex and resources.arsc, would be in compiled binary form.
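Because an apk is an ordinary zip archive, its contents can be listed with any zip library. The sketch below uses Python's zipfile module on a tiny stand-in archive built in memory. The entry contents are placeholder bytes rather than a real app, and the helper names are hypothetical.

```python
import io
import zipfile


def build_stand_in_apk():
    """Build a tiny in-memory zip that mimics an apk's top-level layout.

    The payloads are placeholders; only the DEX magic prefix is realistic.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"\x03\x00\x08\x00")
        archive.writestr("classes.dex", b"dex\n035\x00")
        archive.writestr("resources.arsc", b"\x02\x00\x0c\x00")
    return buffer.getvalue()


def list_apk_entries(apk_bytes):
    """Return the sorted entry names stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        return sorted(archive.namelist())


def has_dex_magic(apk_bytes):
    """Check that classes.dex starts with the DEX magic bytes b'dex\\n'."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        return archive.read("classes.dex")[:4] == b"dex\n"


apk = build_stand_in_apk()
for name in list_apk_entries(apk):
    print(name)
print("classes.dex looks like DEX:", has_dex_magic(apk))
```

For a real sample, the same two helpers could be given the bytes read from the apk file on disk. The entries are binary, which is why a decoding step is still needed before they can be read.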
Since viewing or editing compiled files is next to impossible, the apk file needs to be decoded or disassembled. If one wishes to analyze an app at the bytecode level, reverse engineering tools such as Apktool [2] are available. Additionally, the app's Java source code can be partially reconstructed using JADX [3]. You will probably find both tools useful for completing this lab.

Apktool is a reverse engineering tool for Android apps. It can decode resources to nearly original form and rebuild them after making some modifications. It also makes working with an app easier because of its project-like file structure and its automation of some repetitive tasks, like building an apk [2]. The functionality of Apktool is well documented, and we will briefly describe how this tool can be used to decode and build apk files. More information about Apktool can be found in its documentation [2].

In this example, we will use Apktool to decompile a malicious apk that was found in the wild (a7f94d45c7e1de8033db7f064189f89e82ac12c1) [4]. The apk is a repackaged version of the CoinPirates game that includes a malicious payload.

Apktool provides a command line interface. Its most common use case is decoding and disassembling apk files. If you need to decode an apk file, you use the d (decode) option and pass the apk file as an argument. An example is shown in Listing 2 on line 1.

Listing 2: Decoding an apk using Apktool.

If you look in the directory created, you should see something similar to Listing 3. For this lab, we will focus mostly on the AndroidManifest.xml file, the res/ directory, and the smali/ directory. The app's resources, such as its images and layouts, can be found in the res/ directory. The smali/ directory contains the original classes from the classes.dex file. Apktool converts the original classes.dex file into smali using baksmali [5], an assembler/disassembler for the dex format.
We will discuss the contents of these files and smali syntax later on.

Listing 3: Contents of the directory created.

Apktool can also rebuild an apk file from the decoded resources after you make some modifications, such as modifying the smali code. To build an app, you need to provide the b (build) parameter to Apktool and also provide the decoded directory as an argument, like the example in Listing 4.

Listing 4: Rebuilding an apk file using Apktool.

If you received no errors, the new apk should be found in the dist subdirectory of the directory provided as input. For example, the apk created by running the command in Listing 4 is shown in Listing 5. In your working directory, you will still have a copy of the original apk file. It does not include any modifications you may have made.

Listing 5: The location of the modified apk.

The next step is to sign the apk you just created. If the apk has not been signed, it will fail to install on an emulator or a real device. The Android SDK provides a utility program called apksigner that is located in the Android/Sdk/build-tools/SDK version/ directory. We have provided this program on your VM (you can also use jarsigner if you prefer). For this lab, you should just sign the apk with the debug key, which is stored in the debug.keystore file in your $HOME/.android/ directory. An example of signing an apk is shown in Listing 6. You need to provide the location of the keystore after the --ks option and pass the apk file as an argument. You will be prompted for a password. The default password is android.

Listing 6: Signing your apk file (password is android).

After you have signed your apk, install it onto the emulator to verify that everything went correctly.

Apktool can also be useful for making small modifications to the underlying bytecode. For example, let's assume a malicious app is using the anti-analysis check shown in Listing 7 to prevent the execution of any malicious behavior if the Build type is eng.
Use Apktool to disassemble this app, located in tutorialApps/emu-check.apk, so that you can modify the code in the smali directory. After you have done so, open the file emu-check/smali/com/myapplication/MainActivity.smali in a text editor. You will see the code shown in Listing 8. The code shown is smali, which is a representation of Dalvik bytecode. The Android Developer's website provides a page that discusses the types of instructions and arguments [6].

In the checkEnvironment method, the app checks the model's build type to see if it is equal to the string "eng". In the bytecode, we see that the value of Build.TYPE is stored in register v0 on line 7. The string constant "eng" is stored in register v1 on line 9. The comparison of the strings is completed on line 11 and the result is stored in register v0. On line 13 we see that if the value stored in register v0 is equal to zero, then a jump to the cond_0 branch will occur. Therefore, if Build.TYPE is not "eng", a jump to cond_0 occurs and the malicious behavior will be triggered. Since we are on an emulator, our Build.TYPE will be "eng" and the jump will not occur. To force the control flow to go to cond_0, change the statement on line 15 to "goto :cond_0". This will force the branch to occur every time the app runs. Build and sign the app. Install it onto the emulator (if you installed the previous version, you will need to uninstall it first) and open the app. If you check logcat, you will see that the Build type is "eng". However, the app will now log "do something malicious" instead.

Listing 7: Prevents malicious behavior if the build type is eng.

Listing 8: checkEnvironment in smali.

JADX [3] is another tool that can be used to disassemble apk files. However, JADX disassembles the Dalvik bytecode into Java source code.
The translation is imperfect and will most likely be incomplete, but it is still useful for doing analysis. JADX provides two interfaces: a command-line interface and a GUI. For this lab, we will only discuss the GUI. You can start the JADX GUI by running jadx-gui from the command line. When the program first opens, it will ask the user to choose a file to disassemble. It supports apk, dex, jar, class, zip, and aar files. This discussion only covers apk files. After you choose the apk file, JADX will begin disassembling the apk. When it is complete, you should see the source code for each class in the Menu pane. If you review the source code, you can see it is not ideal, but it does provide insight into the app's behavior.

Now that we have disassembled the apk file, we can begin analyzing the source code to identify suspicious behavior. Defining behavior within Android is challenging. Behavior that may be suspicious or malicious in one application may be expected behavior in another application. It is reasonable for a messaging app to access a user's contacts, but if a utility app, such as a flashlight app, accesses a user's contacts, it should raise suspicion. Therefore, what makes an application potentially malicious is not a particular pattern, but behavior that is inconsistent with the end user's expectation. The easiest starting point for identifying any questionable behavior is the app's manifest file. The manifest file provides a high-level abstract of an app's behavior.

In JADX, the AndroidManifest.xml is located in the Resources/ directory. The highest level of security for Android is the permission system that protects the usage of sensitive behavior. The manifest file shows us that the CoinPirates app has access to 14 permissions.
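As a rough illustration of this first pass over a manifest, the plain-Java sketch below lists the uses-permission entries of a decoded manifest and picks out the SMS-related ones. The SAMPLE manifest, the class and method names, and the choice to filter on the substring "SMS" are all assumptions made for illustration; this is not the CoinPirates manifest.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ManifestPermissionsSketch {
    // Hypothetical two-permission manifest, not the CoinPirates manifest.
    static final String SAMPLE =
        "<manifest xmlns:android=\"http://schemas.android.com/apk/res/android\">"
        + "<uses-permission android:name=\"android.permission.RECEIVE_SMS\"/>"
        + "<uses-permission android:name=\"android.permission.INTERNET\"/>"
        + "</manifest>";

    // Returns the android:name attribute of every <uses-permission> element.
    static List<String> permissions(String manifestXml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(manifestXml.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagName("uses-permission");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // The default factory is not namespace-aware, so the qualified name works.
                names.add(((Element) nodes.item(i)).getAttribute("android:name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse manifest", e);
        }
    }

    // Keeps only the permissions whose name mentions SMS.
    static List<String> smsRelated(List<String> permissions) {
        List<String> result = new ArrayList<>();
        for (String p : permissions) {
            if (p.contains("SMS")) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(smsRelated(permissions(SAMPLE)));
    }
}
```

The sketch expects the decoded, text form of the manifest (as displayed by JADX or produced by Apktool), not the binary XML stored inside the apk.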
Malware often abuses the text messaging permissions to communicate with its C&C server and to try to send premium text messages without the user being aware.

Listing 9: Permissions used by CoinPirates

After observing the permissions, the next goal is to vet the application by analyzing how it uses the sensitive APIs that are protected by the suspicious permissions. Since malware writers often repackage their payload within real apps with hundreds of classes, it would be too time-consuming to search through all the source code. Instead, we will focus on the entry points of the application.[2]

Android applications are written using the Java programming language. Unlike conventional Java programs, Android applications do not have a main() function or a single entry point for execution. Instead, they are designed using components. App components make up the essential building blocks of an Android app. Each component is a different point through which the system can enter a developer's application. There are four different types of components: activities, services, content providers, and broadcast receivers. Each type of component serves a different role, and the set of components used in an Android application defines its overall behavior.

The activity component creates user interfaces. For example, a messaging application may have one activity that creates the user interface for allowing a user to input their message and another activity for allowing the user to view their contacts. The service component runs in the background to perform tasks. Unlike activity components, service components do not have a user interface. For example, a service component can be used to play music in the background. The content provider component handles application data. Using content providers, an application can store data in files, SQLite databases, or other persistent storage locations an application can access.
The broadcast receiver component responds to system-wide broadcast announcements. For example, the system may broadcast that a picture has been captured, and the broadcast receiver can alert the application of this action. In general, broadcast receivers do minimal work and instead alert other components that an event occurred.

Because components must be declared in the manifest, we can quickly identify any interesting entry points without having to search through the source code. To avoid detection, malware usually does not trigger until it receives commands from its C&C server. The two most common and efficient ways for this communication are the network and SMS. Since SMS can provide communication when the user does not have a wifi connection, it is usually preferred. Since this app has declared the RECEIVE_SMS permission, we know that it has the ability to receive broadcasts about arriving text messages through a broadcast receiver. If a broadcast receiver wants to receive a text message, it must specify that it can handle this action by adding the action to its intent filter inside the manifest file. The action required is shown in Listing 10.

Listing 10: Action required to receive SMS broadcasts

In the CoinPirates manifest, we see that only one receiver has this ability, and the component's declaration provides us with enough information to identify the package and class name that declares the receiver. Additionally, the component's declaration raises more suspicion. First, it is manipulating the naming convention and is located in the com.android package. Next, it has a priority of 10000. In Android, broadcasts can be ordered or sent to all apps at the same time. In general, applications with a higher priority will receive an ordered broadcast first. Additionally, they have the choice of aborting the broadcast or allowing it to be sent to the app with the next highest priority.
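The effect of priority on an ordered broadcast can be sketched with a small plain-Java simulation. This is a model of the delivery order only, not Android's implementation: the Receiver class, its fields, and the "messaging app" entry are hypothetical, while the SMSReceiver name and the priority of 10000 come from the CoinPirates manifest. It also models the behavior before the Android 4.4 adjustment noted in footnote [3].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class OrderedBroadcastSketch {
    // Hypothetical stand-in for a registered receiver; not an Android class.
    static final class Receiver {
        final String name;
        final int priority;
        final boolean aborts;

        Receiver(String name, int priority, boolean aborts) {
            this.name = name;
            this.priority = priority;
            this.aborts = aborts;
        }
    }

    // Delivers in descending priority order and stops once a receiver aborts.
    // Returns the names of the receivers that actually saw the broadcast.
    static List<String> deliver(List<Receiver> receivers) {
        List<Receiver> ordered = new ArrayList<>(receivers);
        ordered.sort(Comparator.comparingInt((Receiver r) -> r.priority).reversed());
        List<String> reached = new ArrayList<>();
        for (Receiver r : ordered) {
            reached.add(r.name);
            if (r.aborts) {
                break; // lower-priority receivers never see the message
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        List<Receiver> receivers = Arrays.asList(
            new Receiver("messaging app", 0, false),
            new Receiver("SMSReceiver", 10000, true));
        System.out.println(deliver(receivers));
    }
}
```

In this model the high-priority receiver is the only one that sees the message once it aborts, so the user's messaging app is never notified.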
Therefore, this behavior can be manipulated by malicious apps to hide the notification of received text messages.[3]

Listing 11: Action required to receive SMS broadcasts

If we use JADX to analyze the source code for the SMSReceiver class, we can identify any suspicious behavior that may occur when a text message is received. The Android OS notifies broadcast receivers by calling the receiver's onReceive method. Therefore, we should start our analysis from this point in the app. When looking over the source code of the onReceive method, we see that the method immediately queries a database called “mydb.” The source code also shows us that the values received from the database are compared to the sender's number and the contents of the SMS body. Based on the results of these comparisons, the app uses the needDel (delete text message) or needUpload variables to control the app's control flow.

Identifying suspicious entry points that are defined in the manifest file allows us to quickly identify suspicious behavior. For example, after analyzing the SMSReceiver we see that it is being used by the C&C server to trigger malicious behavior. We also know that the app uses the “mydb” database to interpret the C&C server's commands. While the SMSReceiver class provides the most insight, the malicious app also uses two other receivers, AlarmReceiver and BootReceiver, to start the MonitorService. We leave analyzing the MonitorService component to the reader.

Using static analysis, we can identify the events required to trigger malicious behavior in the app. Our next goal is to leverage the details we extracted from the static analysis to dynamically generate the malicious behavior at run time.

When the events necessary to trigger the malicious behavior depend on external sources, such as a text message being received, we will need to simulate these events.
Android provides several tools for injecting events into the emulator, and you can read the full documentation on the Developer's Website [7]. One tool is the emulator console. Each running emulator instance provides a console that lets you query and control the emulated device environment. For example, you can use the console to manage port redirection, network characteristics, and telephony events while your application is running on the emulator. The emulator console will be useful for injecting events, such as text messages from a specific number, or for changing the device's location. The official documentation provides several examples.

A developer can provide an app with resources by placing them in a specific subdirectory of the res/ folder. Once you provide a resource in your application, you can use it by referencing its resource ID. Each resource is grouped into a “type” such as string, layout, or drawable.

When viewing an APK in JADX, you can find the resources an app uses in the Resources directory under the resources.arsc tab. After expanding the resources.arsc file, you can find many basic resources, such as hardcoded strings found in the values directory.

When JADX decompiles the APK back into source code, resources are referenced by their ID in the R class. You can use this to create a mapping from the resource ID to its original name in the res/resources.arsc/values subdirectory.

Using SMS as a protocol for a C&C server is an important design decision that is different from traditional IP-based approaches known from infected PCs. The main advantages of an SMS-based approach over an IP-based one are that it does not require steady connections, that SMS is ubiquitous, and that SMS can accommodate offline bots easily [8]. sms.apk leverages SMS to receive commands from its C&C server; you need to identify them.

At this point we should have enough information to trigger the malicious behavior.
The C&C server can be started by running ./start server from the command line. Start the server and send the necessary text messages. Unfortunately, no malicious behavior will be exhibited. This is because the malicious app contains anti-analysis techniques to prevent analysis. Our next goal is to find them and see if we can emulate these triggers or remove them.

The Android/BadAccents malware, discussed in [8], contains two specific checks on the incoming SMS number. It checks for ‘86’ and ‘82’ numbers, which indicates that the malware expects SMS from a C&C SMS server located in either China or South Korea. It seems the app we are inspecting does something similar.

From Stage 1, we know the required country code and the necessary commands to trigger the malicious behavior. However, even if we send the correct commands with the correct country code, sms.apk will still not exhibit any malicious behavior. To maximize the longevity of malware, malicious developers want to prevent analysis. Since the majority of dynamic analysis frameworks are based on emulation, malicious developers integrate anti-analysis techniques to change an app's behavior. If an app senses that the underlying environment is an emulator and not a real phone, it will avoid exhibiting any suspicious behavior. In Stage 2 we will try to identify how sms.apk checks whether it is on an emulator. Then we will modify sms.apk to remove this check and trigger the malicious behavior.

The most basic form of emulation detection is when a malicious app leverages a static heuristic. Static heuristics are pre-initialized values that provide information about the underlying environment [9]. Apps running on a system can check these static heuristics by calling Android APIs. For many of the values, the emulator will return values that are inconsistent with what an app would see on a real device.
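In code, a static heuristic reduces to comparing a value reported by the platform against a value that only an emulator would report. The plain-Java sketch below illustrates the idea with an all-zero device ID. The deviceId parameter stands in for the string returned by TelephonyManager.getDeviceId(), and the class name, method name, and sample IDs are hypothetical; it is not the check used by sms.apk.

```java
// Plain-Java sketch of a static emulator heuristic.
// Assumption: deviceId stands in for the value of TelephonyManager.getDeviceId().
class StaticHeuristicSketch {
    // True when the ID is non-empty and consists only of the digit 0.
    static boolean looksLikeEmulator(String deviceId) {
        return deviceId != null
            && !deviceId.isEmpty()
            && deviceId.chars().allMatch(c -> c == '0');
    }

    public static void main(String[] args) {
        // Both IDs are made-up sample values.
        System.out.println(looksLikeEmulator("000000000000000"));
        System.out.println(looksLikeEmulator("356938035643809"));
    }
}
```

An app using such a check would skip its malicious branch whenever the method returns true, which is why removing or inverting the check is enough to expose the behavior on an emulator.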
For example, if the TelephonyManager.getDeviceId() API returns all 0's, the device in question is an emulator, because this value cannot exist on a physical device.

A list of the possible static heuristics that can be found in sms.apk is given in [10]. However, the one just mentioned is a good starting point.

The final question is a two-step process. The first step is to modify sms.apk and remove the environment check so that we can run sms.apk on an emulator. The second step is to send the commands found in Stage 1 to the emulator and have it exhibit malicious behavior. Upon success, the C&C server will generate the final answers.

4.6.3 Step 1:

4.6.4 Step 2:

//developer.android.com/studio/run/emulator-commandline.html# events.

[1] Portions of this section are reproduced from work created and shared by the Android Open Source Project and used according to terms described in the Creative Commons 2.5 Attribution License.
[2] Portions of this section are reproduced from work created and shared by the Android Open Source Project and used according to terms described in the Creative Commons 2.5 Attribution License.
[3] As of Android 4.4 this has been slightly adjusted. The default SMS app will always receive the broadcast first, regardless of priority.