C to Assembly Translation
C to assembly: function calling
Even though most programming is now carried out in high level languages, a good understanding of the generated assembly code really helps in debugging, performance analysis and performance tuning.
Here we present a series of articles describing C to assembly translation. We will be mapping C code to pseudo-assembly. The concepts learnt here can easily be applied to understand the generated code for any real processor assembler.
In this article, we will discuss the assembly code generated for function calling, parameter passing and local variable management. Before we go any further we need to discuss a few things about the pseudo-assembler.
Pseudo assembler basics
- Processor registers are designated as R0, R1, etc.
- The MOVE instruction has the source on the left side and destination on the right side.
- Register RETURN_VALUE_REGISTER is used to return values to the calling function.
- The stack in the pseudo-processor grows from higher address to lower address. Thus a push results in a decrement to the stack pointer. A pop results in an increment to the stack pointer.
- Register STACK_POINTER points to the top of the stack.
- Register FRAME_POINTER is used as the frame pointer. The frame pointer serves as an anchor between the called and the calling function.
- When a function is called, the function first saves the current value of the FRAME_POINTER register on the stack. It then copies the value of the STACK_POINTER register into the FRAME_POINTER register. Finally, it decrements the STACK_POINTER register to allocate space for local variables.
- The FRAME_POINTER register is used to access local variables and parameters. Local variables are located at a negative offset to the frame pointer. Parameters passed to the function are located at a positive offset to the frame pointer.
- When the function returns, the FRAME_POINTER register is copied into the STACK_POINTER register. This frees up the stack used for local variables. The value of FRAME_POINTER register for the caller of this function is restored from the stack by a pop.
Function calling
The generated assembly code is shown along with the corresponding C code.
Pseudo Assembler Code
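The listing itself is not reproduced in this text. As a stand-in, here is a minimal sketch of a pair of functions consistent with the walkthrough that follows. Only the function names, the parameters `param1` and `param2`, the locals `local1` and `local2`, and the `local1 = param2` assignment come from the article. The rest of the function bodies, the `ADD` and `SUBTRACT` mnemonics, and the operand syntax in the comments are assumptions made for illustration.

```c
/* Sketch only: the pseudo-assembly in the comments is illustrative. */
int CalledFunction(int param1, int param2)
{
    /* PUSH FRAME_POINTER                  ; save the caller's frame pointer */
    /* MOVE STACK_POINTER, FRAME_POINTER   ; establish this function's frame */
    /* SUBTRACT 8, STACK_POINTER           ; room for local1 and local2      */
    int local1;
    int local2;

    /* MOVE (FRAME_POINTER+12), (FRAME_POINTER-4) */
    local1 = param2;
    /* MOVE (FRAME_POINTER+8), (FRAME_POINTER-8)  (assumed statement) */
    local2 = param1;

    /* The sum is placed in RETURN_VALUE_REGISTER (assumed statement).  */
    /* MOVE FRAME_POINTER, STACK_POINTER   ; free the local variables   */
    /* POP FRAME_POINTER                   ; restore the caller's frame */
    /* RETURN_FROM_SUBROUTINE                                           */
    return local1 + local2;
}

int CallingFunction(void)
{
    /* PUSH 2                              ; param2 is pushed first     */
    /* PUSH 1                              ; then param1                */
    /* CALL_SUBROUTINE CalledFunction                                   */
    /* ADD 8, STACK_POINTER                ; caller pops the parameters */
    return CalledFunction(1, 2);
}
```

Calling `CallingFunction()` exercises every step traced in the sections below.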
Function calling sequence
The generated assembly code is best understood by tracing through the invocation of CalledFunction() from CallingFunction().
Pushing parameters
CallingFunction() pushes the value 2 followed by the value 1 on the stack. These values correspond to param2 and param1 respectively. (Note that the pushing order is the reverse of the declaration order.) Each push is implemented by the PUSH instruction, which pre-decrements the STACK_POINTER register and then copies the value to the address pointed to by the STACK_POINTER.
Address | Stack contents | Pointing Registers | Notes |
---|---|---|---|
0x00010020 | 2 | | Second parameter passed to CalledFunction |
0x0001001C | 1 | STACK_POINTER | First parameter passed to CalledFunction |
Invoke function
CallingFunction() invokes CalledFunction() with the CALL_SUBROUTINE instruction. CALL_SUBROUTINE pushes the return address on the stack and transfers control to CalledFunction().
Address | Stack contents | Pointing Registers | Notes |
---|---|---|---|
0x00010020 | 2 | | Second parameter passed to CalledFunction |
0x0001001C | 1 | | First parameter passed to CalledFunction |
0x00010018 | Return address into CallingFunction() | STACK_POINTER | Address of the next instruction in CallingFunction that should be executed when CalledFunction returns |
Set up the frame pointer and allocate space for local variables
CalledFunction() sets up its stack frame after invocation. To set up the frame pointer and allocate space for local variables, it:
- Saves the CallingFunction()'s FRAME_POINTER register on the stack with the PUSH instruction.
- Copies the STACK_POINTER register into the FRAME_POINTER register.
- Decrements the stack pointer by 8 to create space for the local variables local1 and local2.
Address | Stack contents | Pointing Registers | Notes |
---|---|---|---|
0x00010020 | param2 (2) | | Second parameter passed to CalledFunction |
0x0001001C | param1 (1) | | First parameter passed to CalledFunction |
0x00010018 | Return address into CallingFunction() | | Address of the next instruction in CallingFunction that should be executed when CalledFunction returns |
0x00010014 | FRAME_POINTER register of the CallingFunction() | FRAME_POINTER | The frame pointer of the CallingFunction has been pushed on the stack. The STACK_POINTER is then copied into the FRAME_POINTER register. This defines the frame pointer for the CalledFunction. |
0x00010010 | local1 | | Space allocated to local1 variable |
0x0001000C | local2 | STACK_POINTER | Space allocated to local2 variable |
Accessing parameters and local variables with frame pointer offsets
Code in CalledFunction() accesses the passed parameters at positive offsets from the frame pointer. Local variables are accessed at negative offsets from the frame pointer. The example presented here shows the code for the assignment of param2 to local1.
Address | Frame pointer relative addressing | Stack contents | Pointing Registers | Notes |
---|---|---|---|---|
0x00010020 | FRAME_POINTER+12 | param2 (2) | | Second parameter passed to CalledFunction |
0x0001001C | FRAME_POINTER+8 | param1 (1) | | First parameter passed to CalledFunction |
0x00010018 | FRAME_POINTER+4 | Return address into CallingFunction() | | Address of the next instruction in CallingFunction that should be executed when CalledFunction returns |
0x00010014 | FRAME_POINTER+0 | FRAME_POINTER register of the CallingFunction() | FRAME_POINTER | The frame pointer of the CallingFunction has been pushed on the stack. The STACK_POINTER is then copied into the FRAME_POINTER register. This defines the frame pointer for the CalledFunction. |
0x00010010 | FRAME_POINTER-4 | local1 | | Space allocated to local1 variable |
0x0001000C | FRAME_POINTER-8 | local2 | STACK_POINTER | Space allocated to local2 variable |
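The frame-pointer-relative addressing in the table can be modelled in C by treating the stack as an array of 4-byte words. This is only a sketch: the `StackImage` type and the helper names are hypothetical, and the 4-byte word size is an assumption taken from the address spacing in the table.

```c
#include <stdint.h>

enum { WORD_SIZE = 4 };

/* Snapshot of the stack after the frame is set up, lowest address first. */
typedef struct {
    uint32_t base_address;   /* address of words[0], 0x0001000C in the table */
    uint32_t words[6];
} StackImage;

/* Returns the word at FRAME_POINTER + offset (offset may be negative). */
uint32_t *frame_word(StackImage *stack, uint32_t frame_pointer, int32_t offset)
{
    uint32_t address = frame_pointer + (uint32_t)offset;
    return &stack->words[(address - stack->base_address) / WORD_SIZE];
}

/* local1 = param2, i.e. copy the word at FP+12 to the word at FP-4. */
void assign_param2_to_local1(StackImage *stack, uint32_t frame_pointer)
{
    *frame_word(stack, frame_pointer, -4) = *frame_word(stack, frame_pointer, +12);
}
```

With `frame_pointer` set to 0x00010014, the offset -4 selects the slot at 0x00010010 (local1) and the offset +12 selects the slot at 0x00010020 (param2), matching the table.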
Free local variables from stack and restore the caller's frame pointer
Before the function returns, the stack setup at the start of the function has to be undone. This is accomplished by the following steps:
- Copy the FRAME_POINTER register into the STACK_POINTER register. This will free the stack entries allocated for local variables local1 and local2.
- Pop the saved frame pointer from the stack. (This will make sure that the CallingFunction() gets its original frame pointer value on return).
Address | Stack contents | Pointing Registers | Notes |
---|---|---|---|
0x00010020 | 2 | | Second parameter passed to CalledFunction |
0x0001001C | 1 | | First parameter passed to CalledFunction |
0x00010018 | Return address into CallingFunction() | STACK_POINTER | Address of the next instruction in CallingFunction that should be executed when CalledFunction returns |
Return to the caller
The processor now executes the RETURN_FROM_SUBROUTINE instruction. This instruction pops the return address from the stack and transfers control to the CallingFunction() at this address.
Address | Stack contents | Pointing Registers | Notes |
---|---|---|---|
0x00010020 | 2 | | Second parameter passed to CalledFunction |
0x0001001C | 1 | STACK_POINTER | First parameter passed to CalledFunction |
Caller pops parameters
The CallingFunction() now pops the parameters that were passed to the CalledFunction(). This is done by adding 8 to the stack pointer.
Address | Stack contents | Pointing Registers | Notes |
---|---|---|---|
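The complete sequence, from pushing the parameters to the caller popping them, can be traced with a small C model of the stack. This is a sketch under stated assumptions: the `Machine` type and helper names are hypothetical, the word size is taken to be 4 bytes, and the initial STACK_POINTER value of 0x00010024 is inferred from the first push landing at 0x00010020.

```c
#include <stdint.h>

#define STACK_TOP   0x00010024u   /* assumed STACK_POINTER before the call */
#define STACK_WORDS 16

typedef struct {
    uint32_t mem[STACK_WORDS];    /* mem[0] is the word at STACK_TOP - 4 */
    uint32_t stack_pointer;
    uint32_t frame_pointer;
} Machine;

uint32_t *word_at(Machine *m, uint32_t address)
{
    return &m->mem[(STACK_TOP - address) / 4 - 1];
}

/* PUSH: pre-decrement the stack pointer, then store. */
void push(Machine *m, uint32_t value)
{
    m->stack_pointer -= 4;
    *word_at(m, m->stack_pointer) = value;
}

/* POP: load, then increment the stack pointer. */
uint32_t pop(Machine *m)
{
    uint32_t value = *word_at(m, m->stack_pointer);
    m->stack_pointer += 4;
    return value;
}

/* Runs the whole sequence; returns the STACK_POINTER seen inside the callee. */
uint32_t simulate_call(Machine *m, uint32_t return_address)
{
    push(m, 2);                           /* caller pushes param2 ...        */
    push(m, 1);                           /* ... then param1                 */
    push(m, return_address);              /* CALL_SUBROUTINE                 */

    push(m, m->frame_pointer);            /* save the caller's frame pointer */
    m->frame_pointer = m->stack_pointer;  /* set up the callee's frame       */
    m->stack_pointer -= 8;                /* allocate local1 and local2      */
    uint32_t sp_in_callee = m->stack_pointer;

    /* local1 = param2 */
    *word_at(m, m->frame_pointer - 4) = *word_at(m, m->frame_pointer + 12);

    m->stack_pointer = m->frame_pointer;  /* free the local variables        */
    m->frame_pointer = pop(m);            /* restore the caller's frame      */
    (void)pop(m);                         /* RETURN_FROM_SUBROUTINE          */
    m->stack_pointer += 8;                /* caller pops the parameters      */
    return sp_in_callee;
}
```

After `simulate_call` returns, both registers hold the values they had before the call, which is the empty-stack state shown in the last table.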
C to assembly: loops, structs and arrays
We covered the C calling convention, frame pointers and the generated assembly code in the previous article. This article focuses on code generation for while loops, for loops, structure access and array indexing.
Code generation for a "while" loop
The following example shows the code generation for a simple while loop. Note that the function does not use a frame pointer because it has no local variables. Since the FRAME_POINTER register is not used, parameters are accessed by taking offsets directly from the STACK_POINTER register.
Code generation for a while loop
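The original listing is not available here. As an illustration of the control flow a compiler typically produces, the sketch below pairs a while loop with a hand-lowered version that uses only a test, a conditional branch and an unconditional branch back. The function names and the byte-copy body are assumptions chosen for the example. Like the function described above, it works on its parameters only and declares no local variables.

```c
/* A while loop that uses only its parameters (no local variables). */
void copy_while(char *dst, const char *src, int count)
{
    while (count > 0) {
        *dst++ = *src++;
        count--;
    }
}

/* The same loop written the way the generated code is laid out. */
void copy_lowered(char *dst, const char *src, int count)
{
loop_test:
    if (!(count > 0))        /* compare, then branch out if the test fails */
        goto loop_exit;
    *dst++ = *src++;         /* loop body */
    count--;
    goto loop_test;          /* unconditional branch back to the test */
loop_exit:
    return;
}
```

Both functions copy `count` bytes; the second one makes the branch structure explicit.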
Code generation for a "for" loop
Code generation for the for loop is covered in the example given below.
Code generation for a for loop
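In the absence of the original listing, the following sketch shows how the three parts of a for statement map onto straight-line code: the initialisation runs once, the test guards each iteration, and the increment runs just before the branch back to the test. The function names and the summing body are assumptions made for illustration.

```c
int sum_for(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* The same loop with the for statement split into its parts. */
int sum_lowered(const int *values, int count)
{
    int total = 0;
    int i = 0;               /* initialisation: executed once */
loop_test:
    if (!(i < count))        /* test: branch out when it fails */
        goto loop_exit;
    total += values[i];      /* loop body */
    i++;                     /* increment */
    goto loop_test;          /* branch back to the test */
loop_exit:
    return total;
}
```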
Code generation for structure access
The code generation for C structure access is covered here. The example shows the filling of a message structure. This function does not need frame setup and teardown instructions (LINK and UNLK on processors such as the Motorola 68000) because the local variable p_msg has been assigned to a register, so no space needs to be allocated for local variables on the stack.
Code generation for structure access
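Structure member access compiles to a store at a constant offset from the address held in a register. The sketch below illustrates this with a hypothetical `Message` layout (the original structure is not shown in this text). The first function uses ordinary member access, and the second performs the same stores using explicit offsets obtained from `offsetof`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout, assumed for this example. */
typedef struct {
    uint32_t type;
    uint32_t length;
    uint32_t sequence;
} Message;

/* Ordinary C: each member name becomes a constant offset from p_msg. */
void fill_message(Message *p_msg)
{
    p_msg->type = 1;
    p_msg->length = (uint32_t)sizeof(Message);
    p_msg->sequence = 7;
}

/* Stores one 32-bit word at base + offset. */
static void store_word(unsigned char *base, size_t offset, uint32_t value)
{
    memcpy(base + offset, &value, sizeof value);
}

/* The same stores written as "address register + constant offset". */
void fill_message_by_offset(void *p_msg)
{
    unsigned char *base = p_msg;   /* plays the role of the address register */
    store_word(base, offsetof(Message, type), 1);
    store_word(base, offsetof(Message, length), (uint32_t)sizeof(Message));
    store_word(base, offsetof(Message, sequence), 7);
}
```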
Code generation for array indexing
The code below shows an instance of array indexing. The generated code is inefficient because each access multiplies the index by the structure size. This overhead can be reduced by making the size of the structure a power of 2 (2, 4, 8, 16, etc.). In such cases the compiler can replace the multiply with a shift instruction.
Code generation for array indexing
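The address arithmetic behind array indexing can be written out by hand. In this sketch, both structure types and the function names are hypothetical. `Item12` is assumed to occupy 12 bytes, so the byte offset needs a multiply, while `Item16` is padded to 16 bytes, so the offset can be computed with a left shift by 4.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t id; uint32_t value; uint32_t flags; } Item12;
typedef struct { uint32_t id; uint32_t value; uint32_t flags; uint32_t pad; } Item16;

/* Address of base[index]: byte offset is index * sizeof(Item12). */
Item12 *element12(Item12 *base, size_t index)
{
    return (Item12 *)((unsigned char *)base + index * sizeof(Item12));
}

/* Address of base[index]: sizeof(Item16) is 16, so shift left by 4. */
Item16 *element16(Item16 *base, size_t index)
{
    return (Item16 *)((unsigned char *)base + (index << 4));
}
```

Both helpers return the same address as the expression `&base[index]`; only the way the offset is computed differs.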
Most compilers will optimize the above code by directly incrementing the pointer in a loop. The optimized code and the generated assembly code are shown below. This optimization really speeds up array indexing in a loop as multiply/shifts are avoided.
Code generation for array indexing (optimized)
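Since the original listing is missing, here is a hedged sketch of the transformation described above. The `Entry` type and the function names are assumptions. The first function indexes the array on every iteration, and the second steps a pointer through the array, so each iteration needs only an addition of the element size.

```c
typedef struct { int id; int value; } Entry;   /* hypothetical element type */

/* Indexed form: each access computes entries + i * sizeof(Entry). */
void clear_indexed(Entry *entries, int count)
{
    for (int i = 0; i < count; i++)
        entries[i].value = 0;
}

/* Pointer form: the address is advanced by one element per iteration. */
void clear_by_pointer(Entry *entries, int count)
{
    Entry *p = entries;
    Entry *end = entries + count;
    while (p != end) {
        p->value = 0;
        p++;
    }
}
```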
C to assembly: if and switch statements
Code generation for "if-else" statement
Code generation for an if-else statement is straightforward. The assembly code exactly mirrors the C code.
Code generation for if-else statement
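As an illustration (the original listing is not reproduced here), the sketch below shows an if-else statement next to a hand-lowered version with the same branch layout a compiler would typically emit: a conditional branch to the else leg when the test fails, and an unconditional branch over the else leg at the end of the if leg. The function names and bodies are assumptions.

```c
int sign_if_else(int x)
{
    int result;
    if (x < 0)
        result = -1;
    else
        result = 1;
    return result;
}

/* The same statement with the branches made explicit. */
int sign_lowered(int x)
{
    int result;
    if (!(x < 0))            /* compare, branch to the else leg on failure */
        goto else_leg;
    result = -1;             /* if leg */
    goto end_if;             /* skip over the else leg */
else_leg:
    result = 1;              /* else leg */
end_if:
    return result;
}
```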
Code generation for switch statement
The code generated for a switch statement varies a lot from one compiler to another. In fact, a given compiler might generate different code in different scenarios. The choice of the code to be generated depends upon the number of case values and how widely they are spread.
The cases considered here are:
- Case values are in a narrow range
- Case values are distributed over a wide range
- Number of Case values is large and they are distributed over a wide range
Case values in narrow range
If the case values are placed in a narrow range, the compiler can avoid performing a comparison for every case leg in the switch statement. In such cases, the compiler generates a jump table which contains addresses of the actions to be taken on different legs. The value on which the switch is being performed is manipulated to convert it into an index into the jump table. In this implementation, the time taken in the switch statement is much less than the time taken in an equivalent if-else-if statement cascade. Also, the time taken in the switch statement is independent of the number of case legs in the switch statement.
Code generation for a switch statement (case values are in a narrow range)
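The jump-table technique can be imitated in standard C with an array of function pointers. This is a sketch, not compiler output: the case values 10 to 13, the leg functions and their return values are all assumptions. The switch value is normalised by subtracting the lowest case value, a single unsigned comparison handles both out-of-range directions, and the result indexes the table.

```c
#define FIRST_CASE 10u   /* lowest case value (assumed) */
#define CASE_COUNT 4u    /* number of consecutive case values (assumed) */

int leg_10(void) { return 100; }
int leg_11(void) { return 110; }
int leg_12(void) { return 120; }
int leg_13(void) { return 130; }

typedef int (*Leg)(void);
static const Leg jump_table[CASE_COUNT] = { leg_10, leg_11, leg_12, leg_13 };

/* The switch as written by the programmer. */
int dispatch_switch(int value)
{
    switch (value) {
    case 10: return leg_10();
    case 11: return leg_11();
    case 12: return leg_12();
    case 13: return leg_13();
    default: return -1;
    }
}

/* The same dispatch through a jump table. */
int dispatch_table(int value)
{
    unsigned index = (unsigned)value - FIRST_CASE;  /* convert to table index */
    if (index >= CASE_COUNT)                        /* one range check */
        return -1;                                  /* default leg */
    return jump_table[index]();                     /* indirect jump */
}
```

The cost of `dispatch_table` is one subtraction, one comparison and one indirect call, regardless of how many legs the table holds.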
Case values in wide range
If the case values of the switch statement are spread over a wide range, the compiler cannot use a jump table to handle the switch statement, because the table would be huge and very sparsely filled. Thus the compiler resorts to using a cascade of comparisons to implement the switch. The code generated for the switch statement in such cases will look more like a series of if-else-if statements. Here the time taken to execute the switch statement increases with the number of case legs in the switch.
Code generation for a switch statement (case values are in a wide range)
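A minimal sketch of the compare cascade follows. The case values 1, 250 and 4000 and the returned results are assumptions picked to be far apart. The second function is what the generated code amounts to: one comparison per case leg, tried in order.

```c
/* The switch as written by the programmer. */
int handle_switch(int code)
{
    switch (code) {
    case 1:    return 10;
    case 250:  return 20;
    case 4000: return 30;
    default:   return -1;
    }
}

/* Equivalent cascade of comparisons, one compare-and-branch per leg. */
int handle_cascade(int code)
{
    if (code == 1)
        return 10;
    if (code == 250)
        return 20;
    if (code == 4000)
        return 30;
    return -1;               /* default leg */
}
```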
Big switch statement with wide distribution
If the switch statement has a very large number of case legs and the values are widely distributed, some compilers use binary search to select the case leg. The different case values are sorted by the compiler at compile time for a binary search.
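The binary search idea can be sketched in C with a table of case values sorted in ascending order. The table contents and names here are assumptions. A compiler would normally emit the search as a tree of compare-and-branch instructions rather than a loop over a data table, but the number of comparisons grows the same way, roughly with the logarithm of the number of case legs.

```c
#include <stddef.h>

typedef struct {
    int case_value;
    int result;              /* stands in for the code of the case leg */
} CaseEntry;

/* Case values sorted in ascending order (hypothetical values). */
static const CaseEntry sorted_cases[] = {
    { 3, 30 }, { 47, 470 }, { 512, 5120 }, { 9001, 90010 }, { 20000, 200000 }
};

int dispatch_binary(int value)
{
    size_t low = 0;
    size_t high = sizeof sorted_cases / sizeof sorted_cases[0];

    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (sorted_cases[mid].case_value == value)
            return sorted_cases[mid].result;     /* matching case leg */
        if (sorted_cases[mid].case_value < value)
            low = mid + 1;                       /* search the upper half */
        else
            high = mid;                          /* search the lower half */
    }
    return -1;                                   /* default leg */
}
```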