Submissions must include your source file (.c or .cpp) along with a PDF.
Late work will not be accepted.
This assignment will reinforce your knowledge of the assembly process. You will need to go through all of the steps
of converting an assembly source file to object code.
Your goal is to write a two-pass assembler for a subset of the MIPS instruction set: the first pass records the address of
every label, and the second pass encodes each instruction using those addresses. It should read the assembly file named
on the command line and write the object code to standard output. You can make the following assumptions:
o The code segment will precede the data segment.
o The source file will contain no more than 32768 distinct instructions.
o The source file will define no more than 32768 bytes of data.
o The source file will not contain comments.
o There will be no whitespace between arguments in each instruction.
o Each line may have a symbolic label, terminated with a colon.
Table 1 provides a list of the assembly directives that your assembler must recognize. Table 2 provides a list of the
instructions that your assembler must recognize. Be sure that you note the arguments for each instruction. It may be
helpful to refer to Appendix A.10 when writing your parser.
o Use the filename .pr01.
o Do not submit an archive (no tar/zip)
Table 1. List of Assembly Directives
Directive            Explanation
.text                Place items following this directive in the user text segment
.data                Place items following this directive in the data segment
.word w1,w2,...,wn   Store n 32b integer values in successive words in memory
.space n             Allocate n bytes of space in memory, initialized to zero
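The first pass only needs to know how much space each line occupies: one 32-bit word per instruction, four bytes per .word value, and n bytes per .space. The sketch below, with the hypothetical names SymbolTable and firstPass, records text labels as byte addresses counted from 0 (an assumption, consistent with the j encoding in Figure 1) and data labels as byte offsets from the start of the data segment. It assumes labels end in a colon and that no alignment padding is inserted.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical pass-1 result. Text labels map to byte addresses counted from 0;
// data labels map to byte offsets from the start of .data (where $gp points).
struct SymbolTable {
    std::map<std::string, uint32_t> text;
    std::map<std::string, uint32_t> data;
};

// Number of values in a ".word" operand such as "1,9,12".
static uint32_t countWords(const std::string& list) {
    uint32_t n = 1;
    for (char c : list) {
        if (c == ',') ++n;
    }
    return n;
}

// First pass: walk the source once, assign an address to every label,
// and advance the location counters without encoding anything.
SymbolTable firstPass(const std::vector<std::string>& lines) {
    SymbolTable syms;
    bool inData = false;
    uint32_t textAddr = 0;   // assumption: text segment numbered from 0
    uint32_t dataAddr = 0;   // offset from $gp
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string tok;
        if (!(in >> tok)) continue;                  // blank line
        if (tok.back() == ':') {                     // label definition
            std::map<std::string, uint32_t>& table = inData ? syms.data : syms.text;
            table[tok.substr(0, tok.size() - 1)] = inData ? dataAddr : textAddr;
            if (!(in >> tok)) continue;              // label alone on its line
        }
        if (tok == ".text") {
            inData = false;
        } else if (tok == ".data") {
            inData = true;
        } else if (tok == ".word") {
            std::string values;
            in >> values;                            // no whitespace inside the list
            dataAddr += 4 * countWords(values);
        } else if (tok == ".space") {
            uint32_t n = 0;
            in >> n;
            dataAddr += n;                           // n bytes per Table 1; no padding added
        } else {
            textAddr += 4;                           // any other token is an instruction
        }
    }
    return syms;
}
```

Run on the Figure 1 source, this yields L1 = 0x14, L2 = 0x34, n = 0, m = 4, and q = 16.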
Table 2. List of MIPS Instructions
Mnemonic  Format  Args                  Description
addiu     I       3 (rt, rs, imm)       Add immediate with no overflow
addu      R       3 (rd, rs, rt)        Add with no overflow
and       R       3 (rd, rs, rt)        Bitwise logical AND
beq       I       3 (rs, rt, label)     Branch when equal
bne       I       3 (rs, rt, label)     Branch when not equal
div       R       2 (rs, rt)            Signed integer divide
j         J       1 (label)             Jump
lw        I       2 (rt, offset(rs))    Load 32b word
mfhi      R       1 (rd)                Move from hi register
mflo      R       1 (rd)                Move from lo register
mult      R       2 (rs, rt)            Signed integer multiply
or        R       3 (rd, rs, rt)        Bitwise logical OR
slt       R       3 (rd, rs, rt)        Set when less than
subu      R       3 (rd, rs, rt)        Subtract with no overflow
sw        I       2 (rt, offset(rs))    Store 32b word
syscall   R       0                     System call
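Table 2 does not list the numeric encodings. The sketch below fills them in from the standard MIPS32 encoding (check them against Appendix A.10); the table names kFunct and kOpcode and the functions encodeR and encodeI are hypothetical. Registers are passed as numbers, and fields an instruction does not use (rd for mult and div, rs and rt for mfhi and mflo) are passed as 0.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Standard MIPS32 funct codes for the R-format instructions in Table 2
// (the opcode field is 0 for all of them).
const std::map<std::string, uint32_t> kFunct = {
    {"addu", 0x21}, {"and", 0x24}, {"div", 0x1a}, {"mfhi", 0x10},
    {"mflo", 0x12}, {"mult", 0x18}, {"or", 0x25}, {"slt", 0x2a},
    {"subu", 0x23}, {"syscall", 0x0c}};

// Standard MIPS32 opcodes for the I- and J-format instructions in Table 2.
const std::map<std::string, uint32_t> kOpcode = {
    {"addiu", 0x09}, {"beq", 0x04}, {"bne", 0x05},
    {"j", 0x02}, {"lw", 0x23}, {"sw", 0x2b}};

// R format: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6); op and shamt are 0 here.
uint32_t encodeR(const std::string& mnem, uint32_t rs, uint32_t rt, uint32_t rd) {
    return (rs << 21) | (rt << 16) | (rd << 11) | kFunct.at(mnem);
}

// I format: op(6) rs(5) rt(5) imm(16); the immediate is truncated to 16 bits.
uint32_t encodeI(const std::string& mnem, uint32_t rs, uint32_t rt, int32_t imm) {
    return (kOpcode.at(mnem) << 26) | (rs << 21) | (rt << 16) |
           (static_cast<uint32_t>(imm) & 0xffffu);
}
```

For example, addu $s0,$zero,$zero is encodeR("addu", 0, 0, 16), which yields 0x00008021, the first word of Figure 1.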
Summer 2016
CDA3101: Computer Organization II
In addition to the instructions above, your assembler must be able to resolve symbolic labels. These labels may be
targets of changes in the control flow (branch or jump instructions) or names for memory elements, and the way a
label is encoded depends on its usage. A branch target is encoded as its distance from the branch, counted in
instructions rather than bytes and measured from the instruction after the branch (remember that the PC already points
to the next instruction). For example, consider the code below:
00400400 <L4>:
  400400: 1100000c beqz t0,400434
  400404: 00000000 nop
  400408: 01084021 addu t0,t0,t0
  40040c: 1100fffc beqz t0,400400
  400410: 00000000 nop
  400414: 01084021 addu t0,t0,t0
  400418: 1100fff9 beqz t0,400400
  40041c: 00000000 nop
  400420: 01084021 addu t0,t0,t0
  400424: 1100fff6 beqz t0,400400
  400428: 00000000 nop
  40042c: 11000001 beqz t0,400434
  400430: 00000000 nop
00400434 <L5>:
  400434: 00000000 nop
You can see that the forward branches to L5 (0x400434) encode distances of 12 and 1. If you count the instructions from
the two branch instructions, the actual distances are 13 and 2; the encoded values are one smaller because the PC has
already advanced to the next instruction. The same is true for the backward branches to L4 (0x400400). Branch offsets
are 16-bit two's complement values, so the first backward branch, 0x1100fffc, has an offset field of 0xfffc. Interpreted
as a signed value, this is -4: the target L4 is four instructions before the PC (0x400410).
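The offset rule above reduces to one subtraction and one division. A minimal sketch, with the hypothetical name branchOffset, assuming both the branch and its target are given as byte addresses:

```cpp
#include <cstdint>

// Branch offset field: signed distance in instructions (words) from the
// instruction after the branch (PC + 4) to the target, as 16-bit two's complement.
uint16_t branchOffset(uint32_t branchAddr, uint32_t targetAddr) {
    int32_t bytes = static_cast<int32_t>(targetAddr - (branchAddr + 4));
    return static_cast<uint16_t>(bytes / 4);   // negative values wrap to 0xffxx
}
```

The listing's first branch, at 0x400400 to L5 at 0x400434, gives 0x000c; OR-ing in the beq opcode (4 << 26) and rs = $t0 (8 << 21) reproduces 0x1100000c.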
Targets for jump instructions should use the absolute location of the target. For example, assume that label L1 is
located in memory at 0x400370. The instruction j L1 will resolve to j 400370.
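In the encoded instruction, the 26-bit target field holds the absolute address divided by 4, since instructions are word-aligned. In Figure 1, j L1 assembles to 0x08000005: L1 labels the sixth instruction, which sits at byte address 0x14 if the text segment is numbered from 0, and 0x14 / 4 = 5. A sketch with the hypothetical name encodeJump:

```cpp
#include <cstdint>

// J format: op(6) target(26). The target field holds the absolute byte
// address divided by 4; the top 4 address bits do not fit and are dropped
// (the hardware takes them from the PC).
uint32_t encodeJump(uint32_t targetAddr) {
    const uint32_t kOpJ = 0x02;
    return (kOpJ << 26) | ((targetAddr >> 2) & 0x03ffffffu);
}
```

Under the same rule, j L1 with L1 at 0x400370 would encode as 0x081000dc.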
Data labels should be referenced by their offset from the global pointer, $gp, which is assumed to point to the start
of the data segment.
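Because $gp points at the start of the data segment, a data label's $gp offset is simply its byte offset within .data. In Figure 1, n is at offset 0, so lw $s2,n($gp) assembles to 0x8f920000. The sketch below (hypothetical names regNum and encodeMem) parses operands of the form rt,label(base) and looks the label up in a table built during the first pass. It assumes well-formed input and handles only label offsets, not numeric ones such as 4($sp).

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Register number for a name such as "$gp" or "$s2" (standard MIPS names).
uint32_t regNum(const std::string& name) {
    static const char* kNames[32] = {
        "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
        "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
        "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
        "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra"};
    for (uint32_t i = 0; i < 32; ++i) {
        if (name == std::string("$") + kNames[i]) return i;
    }
    throw std::invalid_argument("unknown register: " + name);
}

// Encode lw/sw with operands like "$s2,n($gp)". dataLabels maps each data
// label to its offset from $gp (built in pass 1). No error checking is done.
uint32_t encodeMem(const std::string& mnem, const std::string& operands,
                   const std::map<std::string, uint32_t>& dataLabels) {
    const uint32_t op = (mnem == "lw") ? 0x23 : 0x2b;   // otherwise sw
    std::string::size_type comma = operands.find(',');
    std::string::size_type open = operands.find('(', comma);
    std::string::size_type close = operands.find(')', open);
    std::string rt = operands.substr(0, comma);
    std::string label = operands.substr(comma + 1, open - comma - 1);
    std::string base = operands.substr(open + 1, close - open - 1);
    uint32_t offset = dataLabels.at(label);
    return (op << 26) | (regNum(base) << 21) | (regNum(rt) << 16) | (offset & 0xffffu);
}
```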
You should use the linprog servers for all of your compilation and testing. Your output should match mine exactly.
You can determine if the results are identical by calculating the md5sum or by using diff. You must use C/C++ as
your language and your solution should be a single file (e.g. ch03c.pr01.c or ch03c.pr01.cpp). You should submit this
file through Blackboard. Your program should have comments inline and a header at the top. For example:
/**
* @file main.cpp
* @author hughes <>, (C) 2014, 2015, 2016
* @date 05/11/16
* @brief Simple MIPS assembler
*
* @section DESCRIPTION
* This program implements an assembler for a subset
* of the MIPS assembly language. Can compile with debug
* by including -DDEBUG in the compiler options.
************************************************************/
Please test your output against the results from the sample binary before submission. The test script uses md5 and
diff to compare your output with the baseline. Your submissions will also be processed for plagiarism. The script will
use the following for compilation: g++ -Werror -mtune=generic -O0 -std=c++11
If you write it in C instead of C++, the script will use gcc -Werror -mtune=generic -O0 -std=c11
You can access my binary using the following command:
~chughes/cda3101/assembler
There is an example assembly program below in Figure 1 along with the machine code. You can access the assembly
source at ~chughes/cda3101/test01.s and the object code at ~chughes/cda3101/test01.obj. Note that the machine code
is in hexadecimal.
.text
addu $s0,$zero,$zero
addu $s1,$zero,$zero
addiu $v0,$zero,5
syscall
sw $v0,n($gp)
L1:
lw $s2,n($gp)
slt $t0,$s1,$s2
beq $t0,$zero,L2
addiu $v0,$zero,5
syscall
addu $s0,$s0,$v0
addiu $s1,$s1,1
j L1
L2:
addu $a0,$s0,$zero
addiu $v0,$zero,1
syscall
addiu $v0,$zero,10
syscall
.data
n: .word 0
m: .word 1,9,12
q: .space 10

00008021
00008821
24020005
0000000c
af820000
8f920000
0232402a
11000005
24020005
0000000c
02028021
26310001
08000005
02002021
24020001
0000000c
2402000a
0000000c
00000000
00000001
00000009
0000000c
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
Figure 1. Sample source code (top) and object code (bottom)
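Figure 1's object code has one 32-bit word per line, written as eight lowercase hexadecimal digits with leading zeros. Since your output must match the reference exactly, it may help to keep that formatting in one place. A minimal sketch, with the hypothetical name formatWord, using snprintf:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Format one object-code word as in Figure 1: eight lowercase hex digits,
// zero-padded on the left.
std::string formatWord(uint32_t word) {
    char buf[9];   // 8 digits + terminating null
    std::snprintf(buf, sizeof buf, "%08x", static_cast<unsigned>(word));
    return std::string(buf);
}
```

Printing each word followed by a newline, for example std::cout << formatWord(w) << '\n', should reproduce the layout of test01.obj; confirm with diff.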
A second test file, test02.s, is included in the same directory. These are samples, not the inputs that will be used for
grading. Feel free to write your own inputs and share them via the discussion boards. If you find an error in my
assembler, please let me know (extra credit)!
While you are free to use any string parsing method you choose, you may find the getline function helpful. The
two-argument form of getline extracts characters from an input stream and stores them in a string until it reaches a
newline or the end of the stream; the newline is extracted but not stored.
istream& getline (istream& is, string& str);
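For example, the code below skips whitespace (including blank lines) at the current position, reads a line from the
input, and pushes the line to a list as a string. Testing the result of getline, rather than calling eof() after the read,
ends the loop as soon as a read fails, so no stale or empty line is added at the end of the file.
while (std::getline(asmFile >> std::ws, lineIn))
{
    sourceCode.push_back(lineIn); //add to the list of instructions from source
}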
You may also find the Boost tokenizer class useful. The tokenizer splits an input sequence into tokens wherever one of a
set of delimiter characters appears. The code below takes an input string, input, and separates it at the characters
defined in delimiter (comma, space, and parentheses), so lw $s2,n($gp) yields the tokens lw, $s2, n, and $gp. The
for-loop then iterates through those tokens.
#include <boost/tokenizer.hpp>

boost::char_separator<char> delimiter(", ()");
boost::tokenizer< boost::char_separator<char> > tokens(input, delimiter);
for(boost::tokenizer< boost::char_separator<char> >::iterator it = tokens.begin();
    it != tokens.end(); ++it)
{
    //stuff: *it is the current token as a std::string
}
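If Boost is not available, the same splitting can be done with the standard library alone. This sketch, with the hypothetical name tokenize, mimics the default behavior of boost::char_separator, which drops empty tokens.

```cpp
#include <string>
#include <vector>

// Split s on any character in delims, discarding empty tokens.
std::vector<std::string> tokenize(const std::string& s, const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = s.find_first_of(delims, start);
        // If end is npos, substr clamps the length to the rest of the string.
        tokens.push_back(s.substr(start, end - start));
        start = s.find_first_not_of(delims, end);
    }
    return tokens;
}
```

For example, tokenize("lw $s2,n($gp)", ", ()") returns {"lw", "$s2", "n", "$gp"}, matching the Boost version.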
These are just some of the tools that I used in my solution; you are not required to use them! The C and C++ standard
libraries offer many other useful functions, such as fgets and sscanf. Be creative!
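Because arguments contain no whitespace, each source line has at most three whitespace-separated fields: an optional label, a mnemonic or directive, and one operand string. The %s conversion of sscanf can pull these apart. The sketch below (ParsedLine and splitLine are hypothetical names) assumes the line has already been read, for example with fgets, and that no field exceeds the buffer sizes.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical result of splitting one source line.
struct ParsedLine {
    std::string label, mnemonic, operands;
};

// Split a line into optional label, mnemonic/directive, and operand string.
ParsedLine splitLine(const char* line) {
    ParsedLine out;
    char first[64] = "", second[64] = "", third[256] = "";
    int n = std::sscanf(line, "%63s %63s %255s", first, second, third);
    if (n <= 0) return out;                        // blank line (EOF from sscanf)
    std::size_t len = std::strlen(first);
    if (first[len - 1] == ':') {                   // leading token is a label
        out.label.assign(first, len - 1);
        if (n >= 2) out.mnemonic = second;
        if (n >= 3) out.operands = third;
    } else {
        out.mnemonic = first;
        if (n >= 2) out.operands = second;
    }
    return out;
}
```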
Grading Rubric
Category      Item              Points
Build                           5
Test          asm02.s           16
              asm01.s           24
              asm03.s           24
Engineering   Requirements      10
              Coding standard   5