Arduino Stack Painting

Painting brush

The stack is a data structure which operates in a last-in-first-out (LIFO) method. In a LIFO data structure, the last element added to the structure must be the first one to be removed. It is typical to visualize a stack as a stack of plates. You add or “push” new plates onto the top of the stack, and conversely remove, or “pop” plates off of the top.

The hardware maintains a pointer (called the “stack pointer”) to the current stack memory location. In the arduino, unlike a stack of plates, the stack starts at the end, or top of SRAM memory (a location referred to as RAMEND), and grows downward. Pushing an item on the stack actually decrements the stack pointer by the size of the item. The stack pointer always points at the last item on the stack.

The arduino stack is under the control of the hardware and the c-runtime code. It is rarely required to directly manipulate the stack, but it is possible and sometimes desired (especially if writing assembly language routines). Typically, some of the first instructions executed by your arduino program are a sequence similar to the following:

ldi  r28, lo8(RAMEND)
ldi  r29, hi8(RAMEND)
out  SPH, r29
out  SPL, r28

This sequence initializes the stack pointer, or SP. SP is a hardware register, when the stack size is zero, SP points to the origin of the stack. On the arduino, SP is a two byte structure, with SPL and SPH being the low and high bytes respectively. On a standard ATMega328 based arduino, the stack pointer is usually set to point to 0x8FF (0x4FF on the ATMega168).

When a byte is “pushed” onto the stack, it is placed in the memory location the Stack Pointer points to. Then the stack pointer is decremented by the size of the item that was “pushed”. In this manner, the arduino stack pointer actually points to the next (empty) location on the stack.

The basic purpose of the stack is to support function calls and interrupts. Whenever a program makes a function call or when an interrupt occurs, the stack is used to store critical information which will be restored upon completion of the function or interrupt.

First and primary, during a function call/interrupt, the hardware places the return address on the stack. Space is reserved for any local variables, and in the case of an interrupt, the c-runtime stores the Status Register (SREG), along with additional registers that maybe used during the ISR. Upon return from the interrupt or function, all of these values are restored. The order of the push and pop instructions is therefore very critical.

All of the pushing and popping before and after the function or interrupt is termed the “prologue” and “epilogue”. Below, I wrote a very basic interrupt routine that simply increments a byte so we can examine the pro/epilogue code generated by the compiler:

//here is an example ISR coded in C:
volatile uint8_t a;

ISR(INT0_vect) {
  a++;
}

//this is the generated assembly code:
;prologue
0000027F 1f.92                PUSH r1		;save r1 register
00000280 0f.92                PUSH r0 		;save r0 register
00000281 0f.b6                IN r0,SREG 	;get status register
00000282 0f.92                PUSH r0		;save sreg 
00000283 11.24                CLR r1		
00000284 8f.93                PUSH r24 		;save r24 register
;increment byte (a) here
00000285 80.91.c3.01          LDS r24,(a) 
00000287 8f.5f                SUBI r24,0xFF 	
00000288 80.93.c3.01          STS (a),r24 
;epilogue
0000028A 8f.91                POP r24		;restore r24 register
0000028B 0f.90                POP r0		;restore status register
0000028C 0f.be                OUT SREG,r0
0000028D 0f.90                POP r0		;restore r0 register
0000028E 1f.90                POP r1 		;restore r1 register
0000028F 18.95                RETI 		;return from interrupt

As you can see, the meat of the ISR is only 10 bytes long. However, together the pro/epilogue add another 24 bytes, for a total of 34. But, after close examination, several of the generated steps are really not required. The only register used inside the ISR is r24. Therefore, pushing r1 and r0, and the clearing r1 instructions are not necessary. Likewise then, popping r0 and r1 are not necessary either. We could easily trim a total of 10 bytes from this ISR. Thankfully, GCC has a provision that allows you to write your own pro/epilogues, see my post here.

Note in the above example, you do not see the program counter pushed or popped on the stack. During a function call or interrupt the push/pop of the return address is inherent in the call/ret instructions.

Here is an example of how the compiler uses the stack to store local variables inside of a function. This is sometimes referred to as “setting up a stack frame.” We will reserve 16 bytes for a character array (note: unrelated code has been removed for the purpose of clarity). The compiler performs all of this stack manipulation for us behind the scenes, so-to-speak:

void example(void) {
  char buffer[16]; //space will be reserved on the stack

  //
  //do something here. . .
  //

}

//result in this machine code:

;prologue
  PUSH r28         ;save registers on stack 
  PUSH r29 
  IN   r28,SPL     ;get stack pointer	 
  IN   r29,SPH	 
  SBIW r28,16      ;reserve 16 bytes space on stack
  OUT  SPH,r29     ;update new stack pointer
  OUT  SPL,r28 

;
;do something here. . .
;

;epilogue
  ADIW r28,16      ;remove the 16 bytes from the stack
  OUT  SPH,r29     ;restore stack pointer
  OUT  SPL,r28 
  POP  r29         ;restore registers from stack
  POP  r28 
  RET 
}

At anytime, the current size of the stack can be determined simply by subtracting the stack pointer (SP) from RAMEND, like this:

void setup() {
  Serial.begin(9600);
  Serial.println((RAMEND - SP), HEX);
}

void loop() { } 

However, consider that the stack is a dynamic structure which grows and shrinks as a program runs. The code above simply displays a snapshot of the current size of the stack.

The following code determines the largest size the stack has attained using a method known as “stack painting.” In my example below we “paint” the entire upper memory where the stack resides with the value 0xFE (you could substitute any value here). The special attributes to the StackPaint() function ensure that the painting is accomplished before the stack is initialized. Then we execute a summation function which recursively calls it’s self. This is done in order to use a large portion of the stack, and is also a good demonstration as to why recursion is a bad choice for embedded programs. Finally the program checks and displays the amount of stack used.

Obviously, this method has a few flaws. For example, if the STACK_MARKER (0xFE) is placed on the stack, the StackUsed() function may find that value and incorrectly assume the stack termiated here.

#define STACK_MARKER 0xfe
extern uint8_t _end, __stack;
extern char *__bss_end;

void __attribute__ ((naked, section (".init1"))) StackPaint(void) {
  uint8_t *p = (uint8_t *)&__bss_end;
  
  while(p <= &__stack)
    *p++ = STACK_MARKER;
}

uint16_t StackUsed(void) {
	const uint8_t *p = (uint8_t *)RAMEND;
	uint16_t c = 0;
	
	while(*p != STACK_MARKER && p >= (uint8_t *)&__bss_end) {
		p--;
		c++;
	}
	return c;
}

extern "C" { 
  uint16_t Summation(uint16_t n) {
    asm volatile (
      "sbiw r24,0x00   \n" //test for n==0
      "breq 1f         \n" //n==0
      "add r30,r24     \n" //n = n +1 (word addition) 
      "adc r31,r25     \n"
      "sbiw r24,0x01   \n" //n = n - 1
      "call Summation  \n" //recurse
      "1: movw r24,r30 \n" //save n for return from subroutine
      "ret             \n"
      : :
    );
  /*
  //the above is the same as the following c code, however
  //optimization removes the recursion hence, we coded the 
  //function inline in assembly
  if (n > 0)
    n = n + Summation(n - 1);
  return (n);
  */
  }
}

void setup(void) {
  uint16_t c;
	
  Serial.begin(19200);
  asm(
    "push r30 \n"    //save r30/31 on stack
    "push r31 \n"
    "ldi r30,0x00\n" //zeroize r30/31 for use in recursion
    "ldi r31,0x00\n"
  );
  c = Summation(40);
  asm(
    "pop r31 \n" //restore r30/31
    "pop r30 \n"
  );
  Serial.print("Summation: "); Serial.println(c);
  c = StackUsed();
  Serial.print("Stack Used: "); Serial.println(c);
}

void loop(void) { }

For a demonstration of deliberate and devious stack manipulation see this post.

I am indebted to this avrfreak post for a portion of the code used here.

Advertisements

About Jim Eli

µC experimenter
This entry was posted in Uncategorized and tagged , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s