Assembly
This section uses x86 assembly.
Assembly language is a human-readable representation of the instructions that a CPU executes. There are multiple variants of assembly, each catering to a different CPU architecture.
Assembler
-
A program that translates assembly code into machine code.
Disassembler
-
A program that translates machine code into assembly code.
For example, IDA Pro.
Register
-
CPU's basic unit of storage. Registers are fast, but limited in size.
Control unit
-
The part of the CPU that directs the execution of instructions.
The control unit fetches the instructions to execute from RAM, using the address held in the Instruction Pointer.
Instruction Pointer
-
A register that points to the next instruction to be executed.
Memory
Memory can be split into different parts, namely Stack, Heap, Code and Data. These may seem contiguous in the diagram below, but in practice they are not necessarily in that order and may be scattered across memory.
Tip
Memory layouts are conventionally drawn with addresses going from high to low. The stack grows from high addresses toward low addresses.
Stack
The stack is a region of memory used to store temporary data (local variables and function parameters). It is also used to control program flow, since calls to functions, APIs and subroutines are tracked here.
Tip
Functions vs subroutines: a function is used when a value needs to be returned, while a subroutine is used to perform a task that needs no return value.
Heap
The heap is a region of memory used to store dynamically allocated data. Its contents change frequently while the program is running, as new values are allocated and unwanted values are freed at run time.
Code
The code region of memory stores the instructions to be executed.
Data
The data region of memory stores static data.
Operands
Immediate operands
Fixed values
Register operands
Registers (e.g. ecx)
Tip
EDX can be used for division
EAX can be used for multiplication
EAX can also hold return value for function call
ESP, EBP used for function call/return
ESI, EDI, ECX are used in repeat instructions
Index registers (ESI, EDI) may store memory addresses
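All registers above are backward compatible, because the 32-bit x86 architecture extends the earlier 16-bit and 8-bit designs. For example, the low 16 bits of EAX are accessible as AX, which in turn splits into the 8-bit halves AH and AL.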
💫 The x64 architecture extends the x86 architecture covered here. It has 64-bit registers, named by replacing the 'e' with 'r'. Hence, the registers are called rax, rbx, rcx, etc.
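The way a wide register contains its narrower predecessors can be sketched with bit masks. This is only an illustrative Python model (a register is treated as a plain integer, and the helper name `sub_registers` is made up for this sketch), not real assembly:

```python
def sub_registers(rax):
    """Return the narrower views of a 64-bit register value."""
    return {
        "rax": rax & 0xFFFFFFFFFFFFFFFF,  # full 64 bits (x64 only)
        "eax": rax & 0xFFFFFFFF,          # low 32 bits
        "ax": rax & 0xFFFF,               # low 16 bits
        "ah": (rax >> 8) & 0xFF,          # bits 8 to 15
        "al": rax & 0xFF,                 # bits 0 to 7
    }

views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name}: {value:#x}")
```

Writing to `al` on a real CPU changes only the lowest byte of `eax`; the model above only shows how the views overlap when reading.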
Opcode
Opcode is the machine language equivalent of an assembly instruction
Memory address
e.g. [ecx]
Status Flags
The status flags are held in the EFLAGS register, which is 32 bits wide. Each bit is a flag with value 0 (clear) or 1 (set). Flags are used to control CPU operations or to indicate the results of operations. The important ones are listed here:
Flags | Description |
---|---|
Zero Flag (ZF) | Set when the operation result = 0 |
Carry Flag (CF) | Set when the result of an unsigned operation does not fit in the destination (e.g. the result is out of the range of a byte) |
Sign Flag (SF) | Set when the operation result is negative, i.e. when the MSB is set after an arithmetic operation |
Trap Flag (TF) | Set to debug, causing the CPU to single-step. |
Overflow Flag (OF) | Set when a signed operation produces an invalid signed result (one that does not fit in the destination) |
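To make the flag descriptions concrete, here is a minimal Python sketch that models an 8-bit addition and derives ZF, CF, SF and OF from it. The function name `add8` and the choice of 8-bit operands are assumptions made for illustration; the CPU computes these flags in hardware.

```python
def add8(a, b):
    """Model an 8-bit ADD and the status flags it would produce."""
    a &= 0xFF
    b &= 0xFF
    full = a + b               # result before truncation to 8 bits
    result = full & 0xFF
    flags = {
        "ZF": int(result == 0),
        "CF": int(full > 0xFF),    # unsigned result did not fit in a byte
        "SF": result >> 7,         # copy of the most significant bit
        # signed overflow: both inputs share a sign that the result lacks
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

print(add8(0xFF, 0x01))   # wraps to 0: ZF and CF set
print(add8(0x7F, 0x01))   # 127 + 1 becomes -128 when signed: SF and OF set
```

The two calls show why CF and OF are separate flags: the first addition overflows as an unsigned value but is fine as a signed one (-1 + 1 = 0), while the second is the reverse.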
Instruction Pointer, EIP
EIP is 32 bits wide. It stores the address of the next instruction to execute. When you control EIP, you control what is executed by the CPU. If attackers have malicious code/malware in memory, they can simply modify EIP to point to that code to exploit a system.
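The role of the instruction pointer can be sketched with a toy fetch-and-execute loop. Everything here is hypothetical (the three mnemonics `out`, `jmp`, `hlt` and the list-based "memory" are invented for this sketch); the point is only that whatever value ends up in the pointer decides what runs next.

```python
def run(program, max_steps=100):
    """Execute a list of (mnemonic, operand) pairs; return the values emitted."""
    eip = 0          # index of the next instruction, standing in for EIP
    output = []
    for _ in range(max_steps):
        if eip >= len(program):
            break
        op, arg = program[eip]
        eip += 1                 # EIP now points at the following instruction
        if op == "out":
            output.append(arg)
        elif op == "jmp":
            eip = arg            # overwriting EIP redirects execution
        elif op == "hlt":
            break
    return output

program = [
    ("out", "start"),
    ("jmp", 3),          # skip the instruction at index 2
    ("out", "skipped"),
    ("out", "end"),
    ("hlt", None),
]
print(run(program))      # ['start', 'end']
```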
Data Allocation
Tip
Anything that follows a ';' is a comment and is ignored by the assembler
Multiple definitions can also be abbreviated.
Stored values are referenced like this: when Z above is referenced, the value returned is 1, while Z + 4 returns 2.
DUP
DUP initializes an array of the specified integers/bytes.
(e.g. 10 DUP (0) initializes an array of 10 elements, all initialized to 0. The result looks like this: 0,0,0,0,0,0,0,0,0,0)
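The byte layout produced by such definitions can be mimicked in Python with the `struct` module. This sketch assumes a hypothetical label `Z` holding two 32-bit values (1 and 2) and a byte array created with `10 DUP (0)`; the names `z_bytes`, `arr_bytes` and `read_dword` are made up for illustration.

```python
import struct

# Assumed definitions: Z as two little-endian 32-bit values, ARR as 10 DUP (0).
z_bytes = struct.pack("<2I", 1, 2)
arr_bytes = bytes([0] * 10)          # ten zero bytes

def read_dword(memory, offset):
    """Read a 32-bit little-endian value at the given byte offset."""
    return struct.unpack_from("<I", memory, offset)[0]

print(read_dword(z_bytes, 0))   # value at Z
print(read_dword(z_bytes, 4))   # value at Z + 4
print(list(arr_bytes))
```

Under these assumptions the offset is counted in bytes, so adding 4 steps over one 32-bit element.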
EQU
EQU assigns the result of an expression to a name. The expression is evaluated at assembly time.
(e.g. The expression 50 is assigned to the name NUM_OF_ROWS below)
Correspondence to C data types
Program Layout
Assembly Instructions
Move, mov
Copies a value specified or the value stored at a specified address into the destination.
Load Effective Address, lea
Computes the address of the source operand and copies that address (not the value stored there) into the destination.
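The difference between the two instructions can be sketched with a toy register file and memory. The addresses, register contents and helper names below are arbitrary assumptions for illustration; the comments show the x86 instruction each helper is meant to imitate.

```python
# Toy memory: address -> 32-bit value. The addresses are arbitrary.
memory = {0x1000: 42, 0x1004: 99}
regs = {"eax": 0, "ebx": 0x1000, "ecx": 0}

def mov_from_memory(dest, base, disp=0):
    """mov dest, [base + disp]: load the value stored at the address."""
    regs[dest] = memory[regs[base] + disp]

def lea(dest, base, disp=0):
    """lea dest, [base + disp]: load the address itself, no memory access."""
    regs[dest] = (regs[base] + disp) & 0xFFFFFFFF

mov_from_memory("eax", "ebx", 4)   # eax = value stored at 0x1004
lea("ecx", "ebx", 4)               # ecx = 0x1004
print(regs["eax"], hex(regs["ecx"]))   # 99 0x1004
```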
Arithmetic Instructions
Logical/Shifting Instructions
Each logical and shifting instruction has its own purpose. Generally, they are:
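Instruction | Description |
---|---|
xor | Used to clear registers (a value XORed with itself is 0) and to toggle specific bits |
or | Used to set a certain bit |
sh (shift) | Used for fast multiplication (shift left) and fast division (shift right) by powers of 2 |
ro (rotate) | Shifts bits circularly: bits pushed out of one end re-enter at the other |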
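The effect of these instructions on a 32-bit value can be sketched in Python. The helper names `shl`, `shr` and `rol` follow the x86 mnemonics, but the functions are only a model that masks results to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def shl(value, count):
    """Shift left: each step multiplies by 2; bits leaving bit 31 are lost."""
    return (value << count) & MASK32

def shr(value, count):
    """Logical shift right: each step divides an unsigned value by 2."""
    return (value & MASK32) >> count

def rol(value, count):
    """Rotate left: bits leaving bit 31 re-enter at bit 0."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32

eax = 0x12345678
print(eax ^ eax)                 # 0: xor with itself clears the register
print(bin(0b0001 | 0b0100))      # 0b101: or sets bit 2
print(bin(0b0101 ^ 0b0100))      # 0b1: xor toggles bit 2 back off
print(shl(5, 3), shr(40, 3))     # 40 5: multiply and divide by 8
print(hex(rol(0x80000001, 1)))   # 0x3: the top bit wraps around
```

A rotate keeps every bit, which is why it does not behave like a multiplication or division once bits wrap around.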
NOP and INT
Conditionals
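Program execution depends on the result of a comparison, which is recorded as changes in the status flags (specific bits may be set or cleared). The following instructions are relevant here: AND, OR and XOR update the status flags, whereas NOT on x86 leaves them unchanged.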
AND
OR
XOR
NOT
test
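Performs a nondestructive AND operation between each pair of matching bits in two operands: the operands are left unchanged and only the flags are updated. ZF is the flag usually checked afterwards; SF and PF are also set from the result, and CF and OF are cleared.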
cmp
Compares destination and source.
Tip
You can think of CMP as computing Destination - Source: the flags are set from the result, but the result itself is discarded.
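The subtraction view of `cmp` can be sketched as follows. The function `cmp32` is a Python model, not real assembly, and the two helper predicates mirror the conditions tested by the `jb` (unsigned below) and `jl` (signed less) jumps.

```python
def cmp32(dest, src):
    """Model cmp dest, src on 32-bit values; return the flags only."""
    dest &= 0xFFFFFFFF
    src &= 0xFFFFFFFF
    result = (dest - src) & 0xFFFFFFFF      # computed, then discarded
    return {
        "ZF": int(result == 0),
        "CF": int(dest < src),              # unsigned borrow
        "SF": result >> 31,
        # signed overflow: operands differ in sign and the result's sign flips
        "OF": int(((dest ^ src) & (dest ^ result) & 0x80000000) != 0),
    }

def unsigned_below(flags):      # condition tested by jb
    return flags["CF"] == 1

def signed_less(flags):         # condition tested by jl
    return flags["SF"] != flags["OF"]

flags = cmp32(0xFFFFFFFF, 1)    # 4294967295 if unsigned, -1 if signed
print(unsigned_below(flags), signed_less(flags))   # False True
```

The same bit pattern compares differently depending on which flags the following jump reads, which is why unsigned and signed comparisons use different jump instructions.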
cmp with unsigned integers
cmp with signed integers
Conditional Jumps
Branches to a label when specific register/flag conditions are met. The conditions are based on specific flags, equality, or unsigned/signed comparisons.
Tip
The Parity Flag is used for error checking. It reflects whether the number of set bits (bits that are 1) in the low byte of the result is even or odd.
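As a small illustration, the parity computation can be modelled in Python. On x86 the flag is set when the low byte of the result contains an even number of 1 bits; the function name `parity_flag` is made up for this sketch.

```python
def parity_flag(result):
    """Return 1 if the low byte of result has an even number of set bits."""
    low_byte = result & 0xFF
    return int(bin(low_byte).count("1") % 2 == 0)

print(parity_flag(0b0011))   # 1: two set bits (even)
print(parity_flag(0b0111))   # 0: three set bits (odd)
```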
Repeat Instructions
Repeat instructions are used for processing multi-byte data such as byte arrays. They use the ESI (source index), EDI (destination index) and ECX (counter) registers, which must be properly initialized for repeat instructions to work.
ECX decreases by one after each repetition.
Instruction | Description |
---|---|
rep | Repeat the number of times stored in ECX (until ECX = 0). |
repe | Repeat while ECX != 0 and ZF = 1 (stops when ECX = 0 or ZF = 0). |
repz | Same as repe. |
repne | Repeat while ECX != 0 and ZF = 0 (stops when ECX = 0 or ZF = 1). |
repnz | Same as repne. |
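How the prefixes drive ESI, EDI and ECX can be sketched with a Python model of two common pairings, `rep movsb` (copy bytes) and `repe cmpsb` (compare bytes). The functions take and return register values explicitly, and assume the direction flag is clear so that both indexes count upwards.

```python
def rep_movsb(memory, esi, edi, ecx):
    """Model rep movsb: copy ECX bytes from [ESI] to [EDI]."""
    while ecx != 0:
        memory[edi] = memory[esi]
        esi += 1
        edi += 1
        ecx -= 1
    return esi, edi, ecx

def repe_cmpsb(memory, esi, edi, ecx):
    """Model repe cmpsb: compare bytes while ECX != 0 and the last pair matched."""
    zf = 1   # simplification: a real CPU leaves ZF untouched if ECX starts at 0
    while ecx != 0:
        zf = int(memory[esi] == memory[edi])
        esi += 1
        edi += 1
        ecx -= 1
        if zf == 0:      # repe stops as soon as a pair differs
            break
    return esi, edi, ecx, zf

buffer = bytearray(b"hello\x00\x00\x00\x00\x00")
print(rep_movsb(buffer, 0, 5, 5), bytes(buffer))

text = bytearray(b"abcdabXd")
print(repe_cmpsb(text, 0, 4, 4))   # stops early at the mismatching third byte
```

After `repe cmpsb`, a nonzero ECX together with ZF = 0 indicates that the comparison stopped on a mismatch rather than running out of bytes.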
Examples
Stack
Stores data for functions, local variables and flow control. The stack grows downwards, and memory locations lower than esp should always be available unless the stack has overflowed.
Push
Decrements the stack pointer by 4 bytes (2 bytes for a 16-bit operand).
Copies a value into the location pointed to by the stack pointer, esp.
Warning
Push only accepts 16/32-bit registers or memory operands, or 32-bit immediate operands (fixed values).
Pop
Copies the value at the location pointed to by the stack pointer into a register or variable, then increments the stack pointer by either 2 or 4 bytes, depending on the size of the operand receiving the data (is it a DW or a DD?).
Warning
Pop only accepts 16/32-bit registers or memory operands.
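The push and pop steps can be sketched with a toy 32-bit stack in Python. The class name `Stack32` and the 64-byte memory size are arbitrary choices for illustration, and only 4-byte operands are modelled.

```python
import struct

class Stack32:
    """Toy model of a 32-bit stack living in a small block of memory."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.esp = size              # empty stack: ESP sits just past the top

    def push(self, value):
        self.esp -= 4                # ESP moves down first
        struct.pack_into("<I", self.memory, self.esp, value & 0xFFFFFFFF)

    def pop(self):
        value = struct.unpack_from("<I", self.memory, self.esp)[0]
        self.esp += 4                # then ESP moves back up
        return value

stack = Stack32()
stack.push(0x11111111)
stack.push(0x22222222)
print(stack.esp)            # 56: two pushes moved ESP down by 8
print(hex(stack.pop()))     # 0x22222222: last in, first out
print(hex(stack.pop()))     # 0x11111111
print(stack.esp)            # 64: back to the starting position
```

Note that pop does not erase the old value; it stays in memory below ESP until a later push overwrites it.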
Basic constructs
Recognizing the main method
Example
If Else
Loops
For Loop
While Loop
Switch
Struct
A complex data type declaration that defines a physically grouped list of variables placed under one name in a block of memory. The different variables can be accessed via a single pointer, or via the declared struct name, which returns the same address.
Put simply, a struct is a collection of variables (possibly of different data types) under a single name. Its members are accessed as offsets from the base address.
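Base-plus-offset access can be sketched with Python's `ctypes` module, which lays out fields the way a C compiler would. The `Record` structure and its three fields are hypothetical, chosen only to show members of different sizes.

```python
import ctypes

class Record(ctypes.Structure):
    # Hypothetical layout chosen for illustration
    _fields_ = [
        ("id", ctypes.c_uint32),
        ("score", ctypes.c_uint16),
        ("grade", ctypes.c_char),
    ]

record = Record(7, 90, b"A")
base = ctypes.addressof(record)      # base address of the whole struct

for name, _ in Record._fields_:
    field = getattr(Record, name)
    print(name, "at base +", field.offset, "size", field.size)

# Reading a member through base + offset, as compiled code would
score = ctypes.c_uint16.from_address(base + Record.score.offset).value
print(score)   # 90
```

In disassembly this pattern tends to show up as memory operands such as a register plus a small constant, where the register holds the base address and the constant is the member's offset.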
Stack Frame
Created: June 11, 2023