Assignment title: Information

Summer 2016 CDA3101: Computer Organization II

- Submissions must include your source file as a .c/.cpp file along with a PDF.
  o Use the filename .pr01.
  o Do not submit an archive (no tar/zip).
- Late work will not be accepted.

This assignment will reinforce your knowledge of the assembly process. You will need to go through all of the steps of converting an assembly source file to object code. Your goal is to write a two-pass assembler for a subset of the MIPS instruction set. It should read the assembly file named on the command line and write the object code to standard output. You can make the following assumptions:

- The code segment will precede the data segment.
- The source file will contain no more than 32768 distinct instructions.
- The source file will define no more than 32768 bytes of data.
- The source file will not contain comments.
- There will be no whitespace between arguments in each instruction.
- Each line may have a symbolic label, terminated with a colon.

Table 1 lists the assembly directives that your assembler must recognize, and Table 2 lists the instructions it must recognize. Be sure to note the arguments for each instruction. It may be helpful to refer to Appendix A.10 when writing your parser.

Table 1. List of Assembly Directives

    Directive            Explanation
    .text                Place items following this directive in the user text segment
    .data                Place items following this directive in the data segment
    .word w1,w2,...,wn   Store n 32b integer values in successive words in memory
    .space n             Allocate n bytes of space in memory, initialized to zero

Table 2. List of MIPS Instructions

    Mnemonic  Format  Args              Description
    addiu     I                         Add immediate with no overflow
    addu      R       3 (rd, rs, rt)    Add with no overflow
    and       R       3 (rd, rs, rt)    Bitwise logical AND
    beq       I                         Branch when equal
    bne       I                         Branch when not equal
    div       R       2 (rs, rt)        Signed integer divide
    j         J                         Jump
    lw        I                         Load 32b word
    mfhi      R       1 (rd)            Move from hi register
    mflo      R       1 (rd)            Move from low register
    mult      R       2 (rs, rt)        Signed integer multiply
    or        R       3 (rd, rs, rt)    Bitwise logical OR
    slt       R       3 (rd, rs, rt)    Set when less than
    subu      R       3 (rd, rs, rt)    Subtract with no overflow
    sw        I                         Store 32b word
    syscall   R       0                 System call

In addition to the instructions above, your assembler must be able to resolve symbolic labels. These labels may be targets of changes in control flow (branch or jump instructions) or names for memory elements, and they are handled differently depending on their usage.

Branch targets should be referenced by the location of the target relative to the current instruction, counted in instructions (remember that the PC already points to the next instruction). For example, consider the code below:

    00400400 :
      400400: 1100000c  beqz t0,400434
      400404: 00000000  nop
      400408: 01084021  addu t0,t0,t0
      40040c: 1100fffc  beqz t0,400400
      400410: 00000000  nop
      400414: 01084021  addu t0,t0,t0
      400418: 1100fff9  beqz t0,400400
      40041c: 00000000  nop
      400420: 01084021  addu t0,t0,t0
      400424: 1100fff6  beqz t0,400400
      400428: 00000000  nop
      40042c: 11000001  beqz t0,400434
      400430: 00000000  nop
    00400434 :
      400434: 00000000  nop

The forward branches to L5 (at 0x400400 and 0x40042c) have offsets of 12 and 1. If you count the instructions from those two branches, the actual distances are 13 and 2, because the PC will have already advanced to the next instruction. The same is true for the backward branches to L4 (at 0x40040c, 0x400418, and 0x400424). The offsets are stored as 16-bit two's complement values, so the first backward branch, 0x1100fffc, encodes an offset of 0xfffc.
Interpreted as a signed 16-bit value, 0xfffc is -4, which is the distance of the label from the PC in instructions.

Targets for jump instructions should use the absolute location of the target. For example, assume that label L1 is located in memory at 0x400370. The instruction j L1 will resolve to j 400370.

Data labels should be referenced by their offset from the global pointer, $gp, which is assumed to point to the start of the data segment. For example, n is the first data item in Figure 1, so sw $v0,n($gp) is assembled with an offset of 0 (af820000).

You should use the linprog servers for all of your compilation and testing. Your output should match mine exactly; you can check whether the results are identical by calculating the md5sum or by using diff.

You must use C/C++ as your language, and your solution should be a single file (e.g. ch03c.pr01.c or ch03c.pr01.cpp). You should submit this file through Blackboard. Your program should have inline comments and a header at the top. For example:

    /**
     * @file main.cpp
     * @author hughes <>, (C) 2014, 2015, 2016
     * @date 05/11/16
     * @brief Simple MIPS assembler
     *
     * @section DESCRIPTION
     * This program implements an assembler for a subset
     * of the MIPS assembly language. Can compile with debug
     * by including -DDEBUG in the compiler options.
     ************************************************************/

Please test your output against the results from the sample binary before submission. The test script uses md5 and diff to compare your output with the baseline. Your submissions will also be processed for plagiarism. The script will use the following for compilation:

    g++ -Werror -mtune=generic -O0 -std=c++11

If you write it in C instead of C++, the script will use:

    gcc -Werror -mtune=generic -O0 -std=c11

There is an example assembly program below in Figure 1 along with its machine code. You can access the assembly source at ~chughes/cda3101/test01.s and the object code at ~chughes/cda3101/test01.obj. You can access my binary using the following command:
    ~chughes/cda3101/assembler

You should note that the machine code is in hexadecimal.

    .text
    addu $s0,$zero,$zero
    addu $s1,$zero,$zero
    addiu $v0,$zero,5
    syscall
    sw $v0,n($gp)
    L1: lw $s2,n($gp)
    slt $t0,$s1,$s2
    beq $t0,$zero,L2
    addiu $v0,$zero,5
    syscall
    addu $s0,$s0,$v0
    addiu $s1,$s1,1
    j L1
    L2: addu $a0,$s0,$zero
    addiu $v0,$zero,1
    syscall
    addiu $v0,$zero,10
    syscall
    .data
    n: .word 0
    m: .word 1,9,12
    q: .space 10

    00008021
    00008821
    24020005
    0000000c
    af820000
    8f920000
    0232402a
    11000005
    24020005
    0000000c
    02028021
    26310001
    08000005
    02002021
    24020001
    0000000c
    2402000a
    0000000c
    00000000
    00000001
    00000009
    0000000c
    00000000
    00000000
    00000000
    00000000
    00000000
    00000000
    00000000
    00000000
    00000000
    00000000

Figure 1 – Sample source code (top) and object code (bottom)

A second test file, test02.s, is included in the same directory. These are samples and are not the inputs that will be used for grading. Feel free to write your own inputs and share them via the discussion boards. If you find an error in the assembler, please let me know (extra credit)!

While you are free to use any string-parsing method you choose, you may find the getline function helpful. getline extracts characters from an input stream and stores them in a string until it reaches the delimiter (a newline by default) or the end of the stream.

    istream& getline (istream& is, string& str);

For example, the code below discards whitespace at the current position, reads a line from the input, and pushes the line to a list as a string.

    std::string lineIn;
    // Skip leading whitespace, then read one line. Testing the stream itself
    // (rather than eof()) ends the loop without pushing the last line twice.
    while (std::getline(asmFile >> std::ws, lineIn)) {
        sourceCode.push_back(lineIn); //add to the list of instructions from source
    }

You may also find the Boost tokenizer class useful. The tokenizer parses the input sequence and breaks it into pieces at the delimiter characters. The code below takes an input string, input, and separates it based on the characters defined in delimeter.
The for-loop then iterates through those tokens.

    #include <boost/tokenizer.hpp>
    #include <string>

    // For input = "lw $s2,n($gp)", the tokens are lw, $s2, n, and $gp.
    boost::char_separator< char > delimeter(", ()");
    boost::tokenizer< boost::char_separator< char > > tokens(input, delimeter);
    for (boost::tokenizer< boost::char_separator< char > >::iterator it = tokens.begin();
         it != tokens.end(); ++it) {
        //stuff: *it is the current token as a std::string
    }

These are just some of the tools that I used in my solution; you are not required to use them! C/C++ has plenty of other useful functions, such as fgets and sscanf. Be creative!

Grading Rubric

    Build: 5
    Test: asm02.s, asm01.s, asm03.s (16, 24, 24)
    Engineering Requirements, Coding standard (10, 5)