C to Assembly Translation

C to assembly: function calling

Frame pointer operations in a C function call

Even though most programming is now carried out in high level languages, a good understanding of the generated assembly code really helps in debugging, performance analysis and performance tuning.

Here we present a series of articles describing C to assembly translation. We will be mapping C code to pseudo-assembly. The concepts learnt here can easily be applied to understand the generated code for any real processor assembler.

In this article, we will discuss the assembly code generated for function calling, parameter passing and local variable management. Before we go any further we need to discuss a few things about the pseudo-assembler.

Pseudo assembler basics

Function calling

The following block shows the C code and the corresponding generated assembly code.

C Code
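The listing itself is not reproduced in this text. A minimal sketch that is consistent with the walkthrough later in this article might look like the following. The function, parameter and local variable names come from the article; the return types, the return value and the assignment to local2 are assumptions added only to make the sketch complete.

```c
/* Hypothetical reconstruction of the example; names follow the article. */
int CalledFunction(int param1, int param2)
{
    int local1;
    int local2;

    local1 = param2;  /* the assignment traced in the frame pointer section */
    local2 = param1;  /* assumption: gives local2 a defined value */

    return local1 + local2;  /* assumption: the return value is illustrative */
}

int CallingFunction(void)
{
    /* param1 = 1 and param2 = 2, matching the values pushed in the walkthrough */
    return CalledFunction(1, 2);
}
```

With these assumed bodies, CallingFunction() returns 3.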

The generated assembly code is shown along with the corresponding C code.

Pseudo Assembler Code

Function calling sequence

The generated assembly code is best understood by tracing through the invocation of CalledFunction() from CallingFunction().

Pushing parameters

CallingFunction() pushes the value 2 and then the value 1 on the stack. These values correspond to param2 and param1 respectively (note that the push order is the reverse of the declaration order). This is done with the PUSH instruction, which pre-decrements the STACK_POINTER register and then copies the value to the address the STACK_POINTER now points to.

Address | Stack contents | Pointing registers | Notes
0x00010020 | 2 | - | Second parameter passed to CalledFunction
0x0001001C | 1 | STACK_POINTER | First parameter passed to CalledFunction
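The pre-decrement behaviour of PUSH can be mimicked in C with a small simulated stack. This is only a sketch: the initial STACK_POINTER value of 0x00010024 is inferred from the addresses in the table, and the size of the simulated stack area and all helper names are assumptions.

```c
#include <stdint.h>

#define STACK_TOP   0x00010024u  /* assumption: STACK_POINTER before the pushes */
#define STACK_WORDS 16u          /* assumption: size of the simulated stack area */

/* Simulated memory for addresses [STACK_TOP - 4 * STACK_WORDS, STACK_TOP). */
uint32_t stack_mem[STACK_WORDS];
uint32_t stack_pointer = STACK_TOP;

/* Map a simulated address to its word in stack_mem. */
uint32_t *slot(uint32_t address)
{
    return &stack_mem[(address - (STACK_TOP - 4u * STACK_WORDS)) / 4u];
}

/* PUSH: pre-decrement STACK_POINTER, then store at the new address. */
void push(uint32_t value)
{
    stack_pointer -= 4u;
    *slot(stack_pointer) = value;
}

/* Read a word back from the simulated stack. */
uint32_t peek(uint32_t address)
{
    return *slot(address);
}

/* Push the parameters in reverse declaration order, as the caller does. */
void push_parameters(void)
{
    push(2u);  /* param2 lands at 0x00010020 */
    push(1u);  /* param1 lands at 0x0001001C */
}
```

After push_parameters() runs, stack_pointer holds 0x0001001C, which matches the last row of the table.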

Invoke function

CallingFunction() invokes CalledFunction() with the CALL_SUBROUTINE instruction. CALL_SUBROUTINE pushes the return address on the stack and transfers control to CalledFunction().

Address | Stack contents | Pointing registers | Notes
0x00010020 | 2 | - | Second parameter passed to CalledFunction
0x0001001C | 1 | - | First parameter passed to CalledFunction
0x00010018 | Return address into CallingFunction() | STACK_POINTER | Address of the next instruction in CallingFunction that should be executed when CalledFunction returns

Setup the frame pointer and allocate space for local variables

CalledFunction() sets up the stack after invocation. This involves allocating space for local variables and setting up the frame pointer:

Address | Stack contents | Pointing registers | Notes
0x00010020 | param2 (2) | - | Second parameter passed to CalledFunction
0x0001001C | param1 (1) | - | First parameter passed to CalledFunction
0x00010018 | Return address into CallingFunction() | - | Address of the next instruction in CallingFunction that should be executed when CalledFunction returns
0x00010014 | FRAME_POINTER register of the CallingFunction() | FRAME_POINTER | The frame pointer of the CallingFunction has been pushed on the stack. The STACK_POINTER is then copied into the FRAME_POINTER register. This defines the frame pointer for the CalledFunction.
0x00010010 | local1 | - | Space allocated to local1 variable
0x0001000C | local2 | STACK_POINTER | Space allocated to local2 variable

Accessing parameters and local variables with frame pointer offsets

Code in CalledFunction() accesses the passed parameters at positive offsets from the frame pointer and the local variables at negative offsets from the frame pointer. The example presented here shows the code for assigning param2 to local1.

Address | Frame pointer relative addressing | Stack contents | Pointing registers | Notes
0x00010020 | FRAME_POINTER+12 | param2 (2) | - | Second parameter passed to CalledFunction
0x0001001C | FRAME_POINTER+8 | param1 (1) | - | First parameter passed to CalledFunction
0x00010018 | FRAME_POINTER+4 | Return address into CallingFunction() | - | Address of the next instruction in CallingFunction that should be executed when CalledFunction returns
0x00010014 | FRAME_POINTER+0 | FRAME_POINTER register of the CallingFunction() | FRAME_POINTER | The frame pointer of the CallingFunction has been pushed on the stack. The STACK_POINTER is then copied into the FRAME_POINTER register. This defines the frame pointer for the CalledFunction.
0x00010010 | FRAME_POINTER-4 | local1 | - | Space allocated to local1 variable
0x0001000C | FRAME_POINTER-8 | local2 | STACK_POINTER | Space allocated to local2 variable
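The offsets in the table can be exercised with a small C model of the frame. This is a sketch under the assumption of 4-byte stack words; the array, the helper names and the way the parameters are seeded are all hypothetical.

```c
#include <stdint.h>

#define FRAME_BASE  0x0001000Cu  /* lowest address shown in the table */
#define FRAME_WORDS 6u           /* one word per table row */

/* Simulated memory for the six stack words, lowest address first. */
uint32_t frame_mem[FRAME_WORDS];

/* Value of FRAME_POINTER inside CalledFunction, taken from the table. */
const uint32_t frame_pointer = 0x00010014u;

/* Map a simulated address to its word in frame_mem. */
uint32_t *word_at(uint32_t address)
{
    return &frame_mem[(address - FRAME_BASE) / 4u];
}

/* Seed the parameter slots the way the caller's pushes would have. */
void setup_parameters(void)
{
    *word_at(frame_pointer + 8u)  = 1u;  /* param1 */
    *word_at(frame_pointer + 12u) = 2u;  /* param2 */
}

/* local1 = param2, expressed with frame pointer offsets. */
void assign_param2_to_local1(void)
{
    uint32_t value = *word_at(frame_pointer + 12u);  /* load param2 */
    *word_at(frame_pointer - 4u) = value;            /* store into local1 */
}

/* Read local1 back through its negative offset. */
uint32_t read_local1(void)
{
    return *word_at(frame_pointer - 4u);
}
```

Calling setup_parameters() and then assign_param2_to_local1() leaves the value 2 in the local1 slot at 0x00010010.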

Free local variables from stack and restore the caller's frame pointer

Before the function returns, the stack setup at the start of the function has to be undone. This is accomplished by the following steps:

Address | Stack contents | Pointing registers | Notes
0x00010020 | 2 | - | Second parameter passed to CalledFunction
0x0001001C | 1 | - | First parameter passed to CalledFunction
0x00010018 | Return address into CallingFunction() | STACK_POINTER | Address of the next instruction in CallingFunction that should be executed when CalledFunction returns

Return back to the caller

The processor now executes the RETURN_FROM_SUBROUTINE instruction. This instruction pops the return address from the stack and transfers control to the CallingFunction() at this address.

Address | Stack contents | Pointing registers | Notes
0x00010020 | 2 | - | Second parameter passed to CalledFunction
0x0001001C | 1 | STACK_POINTER | First parameter passed to CalledFunction

Caller pops parameters

CallingFunction() now pops the parameters that were passed to CalledFunction(). This is done by adding 8 to the stack pointer, which discards the two 4-byte parameters in one step.

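The STACK_POINTER is now back at 0x00010024, the value it had before the parameters were pushed, so none of the stack entries from the call remain.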
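The whole calling sequence can be traced end to end with a small C simulation. This is a sketch, not generated code: the return address, the caller's frame pointer value and the helper names are made up, and the epilogue is assumed to be the usual mirror image of the setup (copy FRAME_POINTER back into STACK_POINTER, then pop the saved frame pointer).

```c
#include <stdint.h>

#define STACK_TOP            0x00010024u  /* STACK_POINTER before the call */
#define STACK_WORDS          16u          /* assumption: simulated stack size */
#define RETURN_ADDRESS       0x00002000u  /* assumption: arbitrary code address */
#define CALLER_FRAME_POINTER 0x00010040u  /* assumption: caller's frame pointer */

uint32_t stack_mem[STACK_WORDS];
uint32_t stack_pointer = STACK_TOP;
uint32_t frame_pointer = CALLER_FRAME_POINTER;

/* Map a simulated address to its word in stack_mem. */
uint32_t *slot(uint32_t address)
{
    return &stack_mem[(address - (STACK_TOP - 4u * STACK_WORDS)) / 4u];
}

/* PUSH: pre-decrement, then store. */
void push(uint32_t value)
{
    stack_pointer -= 4u;
    *slot(stack_pointer) = value;
}

/* POP: load, then post-increment. */
uint32_t pop(void)
{
    uint32_t value = *slot(stack_pointer);
    stack_pointer += 4u;
    return value;
}

/* Run the full sequence; returns the address control goes back to. */
uint32_t simulate_call(void)
{
    uint32_t return_to;

    push(2u);                       /* caller pushes param2 */
    push(1u);                       /* caller pushes param1 */
    push(RETURN_ADDRESS);           /* CALL_SUBROUTINE pushes the return address */

    push(frame_pointer);            /* callee saves the caller's frame pointer */
    frame_pointer = stack_pointer;  /* callee establishes its own frame */
    stack_pointer -= 8u;            /* callee reserves local1 and local2 */

    *slot(frame_pointer - 4u) = *slot(frame_pointer + 12u);  /* local1 = param2 */

    stack_pointer = frame_pointer;  /* callee frees the local variables */
    frame_pointer = pop();          /* callee restores the caller's frame pointer */

    return_to = pop();              /* RETURN_FROM_SUBROUTINE pops the return address */
    stack_pointer += 8u;            /* caller discards the two parameters */
    return return_to;
}
```

After simulate_call() returns, both stack_pointer and frame_pointer hold the values they had before the call, which is the invariant the calling convention is meant to preserve.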

C to assembly: loops, structs and arrays

We have covered the C calling convention, frame pointers and the assembly code in the previous article. This article will focus on the code generation for:

C to assembly for loops, structure access and array indexing

Code generation for a "while" loop

The following example shows the code generation for a simple while loop. Note that the function shown below does not use a frame pointer, because it has no local variables. Since the FRAME_POINTER register is not used, parameters are accessed directly at offsets from the STACK_POINTER register.

Code generation for a while loop
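The listing is not reproduced in this text. As a stand-in, the sketch below shows a hypothetical function with no local variables, followed by the same loop rewritten with explicit labels and branches, which is roughly the shape a compiler gives a while loop. The function names and the loop body are assumptions.

```c
/* Hypothetical example: zero the first count bytes of a buffer. */
void ClearBuffer(char *buffer, int count)
{
    while (count > 0) {
        *buffer++ = 0;
        count--;
    }
}

/* The same loop with the control flow spelled out: test at the top,
   conditional branch out of the loop, unconditional branch back. */
void ClearBufferLowered(char *buffer, int count)
{
loop_top:
    if (!(count > 0))
        goto loop_exit;   /* condition failed: leave the loop */
    *buffer++ = 0;        /* loop body */
    count--;
    goto loop_top;        /* branch back to the test */
loop_exit:
    return;
}
```

Both functions behave identically; the second simply makes the compare and the two branches visible.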

Code generation for a "for" loop

Code generation for the for loop is covered in the example given below.

Code generation for a for loop
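The listing is not reproduced in this text. The sketch below uses a hypothetical summing function to show how the three parts of a for statement (initialisation, condition, increment) typically end up arranged around the loop body. The function names are assumptions.

```c
/* Hypothetical example: add up the elements of an array. */
int SumArray(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* The same loop with each part of the for statement made explicit. */
int SumArrayLowered(const int *values, int count)
{
    int total = 0;
    int i = 0;             /* initialisation runs once */
loop_test:
    if (!(i < count))
        goto loop_exit;    /* condition is checked before every pass */
    total += values[i];    /* loop body */
    i++;                   /* increment step */
    goto loop_test;
loop_exit:
    return total;
}
```

Because the condition is tested before the first pass, a count of zero skips the body entirely in both versions.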

Code generation for structure access

The code generation for C structure access is covered here. The example shows the filling of a message structure. This function does not have LINK and UNLK as the local variable p_msg has been assigned to a register, so no space needs to be allocated for local variables on the stack.

Code generation for structure access
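The listing is not reproduced in this text. The idea it illustrates is that each member access becomes a store at a fixed offset from the structure's base address. The sketch below shows this with a hypothetical three-field message; the field names and both function names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout; the field names are illustrative. */
typedef struct {
    uint32_t type;
    uint32_t length;
    uint32_t sequence;
} Message;

/* Ordinary C: the compiler turns each member access into base + offset. */
void FillMessage(Message *p_msg, uint32_t type, uint32_t length, uint32_t sequence)
{
    p_msg->type = type;
    p_msg->length = length;
    p_msg->sequence = sequence;
}

/* The same stores written as explicit offsets from the base address. */
void FillMessageByOffset(Message *p_msg, uint32_t type, uint32_t length, uint32_t sequence)
{
    unsigned char *base = (unsigned char *)p_msg;

    memcpy(base + offsetof(Message, type), &type, sizeof type);
    memcpy(base + offsetof(Message, length), &length, sizeof length);
    memcpy(base + offsetof(Message, sequence), &sequence, sizeof sequence);
}
```

The offsets are compile-time constants, so in generated code they appear as displacements from the register that holds p_msg.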

Code generation for array indexing

The code below shows an instance of array indexing. The generated code is inefficient because computing each element's address requires a multiply by the structure size. This overhead can be reduced by making the size of the structure a power of 2 (2, 4, 8, 16, etc.). In such cases the compiler can replace the multiply with a shift instruction.

Code generation for array indexing
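The address arithmetic behind array indexing can be checked directly in C. In this sketch the two record types are hypothetical: one is 12 bytes, so its element offset needs a multiply, and the other is padded to 16 bytes, so the offset can be formed with a shift by 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 12-byte record: indexing needs a multiply by 12. */
typedef struct { uint32_t a, b, c; } Record12;

/* Hypothetical 16-byte record: a power of two, so a shift is enough. */
typedef struct { uint32_t a, b, c, pad; } Record16;

/* Byte offset of element index in an array of Record12. */
size_t OffsetByMultiply(size_t index)
{
    return index * sizeof(Record12);
}

/* Byte offset of element index in an array of Record16. */
size_t OffsetByShift(size_t index)
{
    return index << 4;  /* same as index * 16 */
}

/* Returns 1 when the computed offsets agree with the compiler's own indexing. */
int OffsetsMatch(void)
{
    static Record12 r12[8];
    static Record16 r16[8];
    size_t i;

    for (i = 0; i < 8; i++) {
        if ((unsigned char *)&r12[i] != (unsigned char *)r12 + OffsetByMultiply(i))
            return 0;
        if ((unsigned char *)&r16[i] != (unsigned char *)r16 + OffsetByShift(i))
            return 0;
    }
    return 1;
}
```

The padding field trades 4 bytes per element for cheaper address computation, which is only worthwhile when indexing cost matters more than memory.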

Most compilers will optimize the above code by directly incrementing the pointer in a loop. The optimized code and the generated assembly code are shown below. This optimization really speeds up array indexing in a loop as multiply/shifts are avoided.

Code generation for array indexing (optimized)
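The listing is not reproduced in this text. A sketch of the two forms, using a hypothetical Entry structure and made-up function names, could look like this. The pointer version advances the address by a fixed amount on each pass instead of recomputing it from the index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; the field names are illustrative. */
typedef struct {
    uint32_t id;
    uint32_t value;
    uint32_t flags;
} Entry;

/* Indexed form: each access computes table + i * sizeof(Entry). */
void ClearIndexed(Entry *table, size_t count)
{
    for (size_t i = 0; i < count; i++)
        table[i].value = 0;
}

/* Pointer form: the address moves forward by sizeof(Entry) each pass. */
void ClearByPointer(Entry *table, size_t count)
{
    Entry *end = table + count;

    for (Entry *p = table; p != end; p++)
        p->value = 0;
}
```

Both functions produce the same result; the second corresponds to the optimized form described above.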

C to assembly: if and switch statements

Code generation for "if-else" statement

Code generation for an if-else statement is straightforward. The assembly code closely mirrors the structure of the C code.

Code generation for if-else statement
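The listing is not reproduced in this text. The sketch below uses a hypothetical function to show the usual shape: a compare, a conditional branch to the else part, and an unconditional branch that skips the else part once the if part has run. The function names are assumptions.

```c
/* Hypothetical example: return the larger of two values. */
int Larger(int a, int b)
{
    int result;

    if (a > b)
        result = a;
    else
        result = b;
    return result;
}

/* The same logic with the branches made explicit. */
int LargerLowered(int a, int b)
{
    int result;

    if (!(a > b))
        goto else_part;  /* condition failed: branch to the else part */
    result = a;          /* if part */
    goto end_if;         /* skip over the else part */
else_part:
    result = b;          /* else part */
end_if:
    return result;
}
```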

Code generation for switch statement

The code generated for a switch statement varies a lot from one compiler to another. In fact, a given compiler might generate different code in different scenarios. The choice of the code to be generated depends upon the number of case labels and the spread of their values.

Different cases of generation of a switch statement are:

Case values in narrow range

If the case values are placed in a narrow range, the compiler can avoid performing a comparison for every case leg in the switch statement. In such cases, the compiler generates a jump table which contains addresses of the actions to be taken on different legs. The value on which the switch is being performed is manipulated to convert it into an index into the jump table. In this implementation, the time taken in the switch statement is much less than the time taken in an equivalent if-else-if statement cascade. Also, the time taken in the switch statement is independent of the number of case legs in the switch statement.

Switch jump table

Code generation for a switch statement (case values are in a narrow range)
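The listing is not reproduced in this text. Standard C has no direct way to express an indirect jump, so the sketch below models the jump table as an array of function pointers. The case values 10 to 13, the handler names and their return values are all hypothetical.

```c
/* Hypothetical case handlers; the return values are illustrative. */
typedef int (*Handler)(void);

int Handle10(void)      { return 100; }
int Handle11(void)      { return 101; }
int Handle12(void)      { return 102; }
int Handle13(void)      { return 103; }
int HandleDefault(void) { return -1; }

#define FIRST_CASE 10u  /* lowest case value */
#define CASE_COUNT 4u   /* number of consecutive case values */

/* One entry per case value, in ascending order. */
const Handler jump_table[CASE_COUNT] = {
    Handle10, Handle11, Handle12, Handle13
};

int Dispatch(int value)
{
    /* Convert the switch value into a table index. Unsigned arithmetic
       makes values below FIRST_CASE wrap to a large number, so a single
       comparison rejects both ends of the range. */
    unsigned index = (unsigned)value - FIRST_CASE;

    if (index >= CASE_COUNT)
        return HandleDefault();
    return jump_table[index]();  /* one indexed load and one indirect call */
}
```

The cost of Dispatch() is the same for every case value, which reflects the constant-time behaviour described above.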

Case values in wide range

If the case values of the switch statement are spread over a wide range, a jump table is not practical, because it would be huge and sparsely populated. The compiler therefore resorts to a cascade of comparisons to implement the switch. The code generated for the switch statement in such cases looks more like a series of if-else-if statements. Here the time taken to execute the switch statement increases with the number of case legs in the switch.

Code generation for a switch statement (case values are in a wide range)
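The listing is not reproduced in this text. A sketch of the comparison cascade, with hypothetical case values and return values, could look like this. Each case costs one compare and one conditional branch, tried in order.

```c
/* Hypothetical switch with widely spread case values, written as the
   compare-and-branch cascade a compiler might generate for it. */
int DispatchCascade(int value)
{
    if (value == 1)
        goto case_1;
    if (value == 100)
        goto case_100;
    if (value == 10000)
        goto case_10000;
    goto case_default;   /* no case matched */

case_1:
    return 11;
case_100:
    return 22;
case_10000:
    return 33;
case_default:
    return -1;
}
```

A value that matches the last case, or none at all, pays for every comparison before it, which is why the execution time grows with the number of case legs.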

Big switch statement with wide distribution

If the switch statement has a very large number of case legs and the values are widely distributed, some compilers use binary search to select the case leg. The different case values are sorted by the compiler at compile time for a binary search.
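The binary search approach can be sketched in C with a table of case values sorted in ascending order. A real compiler would usually emit the search as a tree of compare and branch instructions rather than a loop over a table; the table form is used here only because it is easier to read. The case values, the results and all names are hypothetical.

```c
#include <stddef.h>

/* One entry per case label: the case value and what that leg produces. */
typedef struct {
    int value;
    int result;
} CaseEntry;

/* Sorted by value, as the compiler would arrange it at compile time. */
const CaseEntry case_table[] = {
    { -500, 0 }, { 3, 1 }, { 42, 2 }, { 1000, 3 }, { 65536, 4 }, { 1000000, 5 }
};

#define CASE_COUNT (sizeof case_table / sizeof case_table[0])

int DispatchBinary(int value)
{
    size_t low = 0;
    size_t high = CASE_COUNT;  /* search the half-open range [low, high) */

    while (low < high) {
        size_t mid = low + (high - low) / 2;

        if (case_table[mid].value == value)
            return case_table[mid].result;  /* matching case leg found */
        if (case_table[mid].value < value)
            low = mid + 1;                  /* continue in the upper half */
        else
            high = mid;                     /* continue in the lower half */
    }
    return -1;  /* default leg: no case value matched */
}
```

With this arrangement the number of comparisons grows with the logarithm of the number of case legs rather than linearly.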
