ATMEL Studio 7 Does Blink in Assembly Language

See the previous post (here) for detailed information on installing AS7 and simulating the execution of an Arduino program. As an exercise to gain familiarity with AS7, let’s make an assembly language project using the Blink code below:

• Select “File>New>Project”.

• Assembler.

• AVR Assembler Project.

• Give it a unique name.

• Select the proper device.

Remember to select the Simulator in order to run the program inside the AS7 IDE. See the previous post on how to do this. Here is a screen shot of the assembly code running in AS7:

as7-a

Here is the code for a very basic assembly language blink program:

.include "m168pdef.inc"

.org 0x0000
   rjmp start
start:
    ldi r16, 0
    out SREG, r16        ; reset system status
    ldi r16, low(RAMEND) ; init stack pointer
    out SPL, r16
    ldi r16, high(RAMEND)
    out SPH, r16

    sbi DDRB, DDB5       ;pinMode(13, OUTPUT);
_loop:
    sbi PORTB, PORTB5    ;turn LED on
    rcall _delay
    cbi PORTB, PORTB5    ;turn LED off
    rcall _delay
    rjmp _loop

_delay:
    ldi r24, 0x00        ;one second delay iteration
    ldi r23, 0xd4 
    ldi r22, 0x30 
_d1:                     ;delay ~1 second
    subi r24, 1   
    sbci r23, 0   
    sbci r22, 0
    brcc _d1
    ret
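
The three constants loaded into r22:r23:r24 form the 24-bit value 0x30D400 (3,200,000). As a rough check on the "~1 second" comment, the sketch below replays the _d1 loop in plain C on a host machine and counts cycles. The cycle costs (1 for subi and sbci, 2 for a taken brcc, 1 for a not-taken brcc) are taken from the AVR instruction set manual; the 16 MHz clock is an assumption that matches a stock Arduino Uno, and delay_loop_cycles is a name made up for this illustration.

```c
#include <stdint.h>

/* Host-side model of the _d1 loop: returns the cycles spent in the loop. */
static uint32_t delay_loop_cycles(void) {
    uint8_t r24 = 0x00, r23 = 0xd4, r22 = 0x30; /* low, middle, high byte */
    uint32_t cycles = 0;

    for (;;) {
        unsigned b1 = (r24 < 1);    /* subi r24, 1 : borrow out of low byte */
        r24 = (uint8_t)(r24 - 1);
        unsigned b2 = (r23 < b1);   /* sbci r23, 0 : subtract the borrow */
        r23 = (uint8_t)(r23 - b1);
        unsigned b3 = (r22 < b2);   /* sbci r22, 0 : final borrow = carry flag */
        r22 = (uint8_t)(r22 - b2);
        cycles += 3;                /* three single-cycle subtractions */

        if (b3) {                   /* carry set: brcc falls through */
            cycles += 1;
            break;
        }
        cycles += 2;                /* carry clear: brcc taken */
    }
    return cycles;
}
```

The model reports 16,000,004 cycles, which at 16 MHz is one second to within a fraction of a microsecond. The rcall, the three ldi instructions and the ret add only about ten cycles more.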

Now, start learning inline assembly language:

Also available as a book, with greatly expanded coverage!

BookCover
[click on the image]

Using ATMEL Studio 7 for Arduino Development

While installing ATMEL Studio 7 (AS7) is not required in order to learn inline assembly language programming, it has worthwhile advantages. The ability to compile code for the Arduino, run it inside the included Simulator and immediately debug it will greatly speed your learning process. This seamless and iterative process would take several long minutes to complete if using the Arduino IDE. In fact, the Arduino IDE alone cannot perform the disassembly and debug functions.

AS7 now features seamless, one-click importation of projects created in the Arduino development environment. Your sketch, including any libraries it references, can be imported into AS7 as a C++ project. Once imported, you can leverage many additional capabilities of AS7 to fine-tune and debug your design. For most Arduino boards, shield-adapters that expose debug connectors are available, or one could use an ATMEL-ICE debug wire interface with a standard Arduino (slight modification of the Arduino is required).

More information on DebugWire with the Arduino can be found here:
Debugging Arduino using debugWire
Debugging with the new ATMEL-ICE
Modify an Arduino for DebugWire

Best of all, AS7 is free of charge. It is also important to note that most of the functionality of AS7 is also available in the older Atmel Studio 6 IDE (AS6).

Install ATMEL Studio 7

I’m not going to go through the whole installation process (get the book). Just navigate to the ATMEL Studio 7 website download page and click the DOWNLOAD NOW link:

http://www.atmel.com/Microsite/atmel-studio/

Direct link to the download page:

http://www.atmel.com/tools/atmelstudio.aspx#download

Select the web installer unless you have a specific requirement to install off-line.

ATMEL Studio 7 Blinks

Congratulations if you have completed the installation. Let’s run the Studio for the first time.

On initial start, AS7 will ask you to select a user interface profile. I suggest the “Advanced” version, however either option is satisfactory. If desired, the profile can always be changed later.

as7-1

Be patient, it will seem like a long period of time for the program to load, but eventually you will be greeted with the IDE and the startup page populated with an ATMEL announcement internet feed. It is possible to disable this if it is unwanted (checkbox on the bottom left).

as7-2

From the file menu select NEW->Project.

as7-3

The New Project dialog will open. Make the following selections:

• Ensure “Installed” and “C/C++” are highlighted on the far left.
• Highlight “Create project from Arduino sketch”.
• Give the project an appropriate name like, “AS7_Blink” (note, spaces are not allowed in the project name).
• The default location should fill automatically.
• The solution name defaults to the project name, and this is acceptable for now.

as7-4

The next dialog will ask for the existing Arduino sketch location. Click on the button containing the ellipsis to the right of the Sketch File. Navigate to the Arduino Blink example sketch, which should be located inside your Arduino installation folder (similar location to mine on Windows):

C:\Program Files (x86)\Arduino\examples\01.Basics\Blink\Blink.ino

Continuing along:

• Ensure your Arduino IDE path is properly filled in.
• Make the appropriate selections for your board (Arduino/Genuino Uno) and device (atmega328p).
• Click “Ok”.

AS7 will take some time creating the folder(s) for the project and pulling in the Blink sketch with all of its dependencies (all core and variant include files, Arduino core source files and libraries). It will eventually finish by creating and opening an editable “sketch.cpp” file inside the IDE.

as7-5

The sketch.cpp file is primarily the code from the example Arduino Blink program, plus some additional code generated automatically by ATMEL Studio. Your solution should look like this:

as7-6

Ready, Set, Simulate!

We can now run the Blink program without an actual Arduino board using the simulator included with AS7. But first we need to inform AS7 to use the Simulator. Under the Project menu, select “Blink Properties…”.

as7-7

On the properties screen that appears, on the far left, select the “Tool” item, and under the heading of “Selected debugger/programmer”, select “Simulator” from the drop down list.

as7-8

Now, in order to run the program in the simulator, simply press “Alt+F5” or select “Debug->Start Debugging and Break” using the menu. AS7 will build the Blink project, execute the C-runtime startup code and then halt at the first line of the Arduino program. If you examine your screen, you’ll notice the debugger (on the far left) is pointing to and highlighting the line of code it has paused on:

init();

as7-9

At this point, you might think to yourself, this code wasn’t in the Blink program. What gives?

All of this is actually code that comes from the core Arduino wiring files, which silently gets inserted into every Arduino sketch as needed. If you look further down the screen you will eventually see the call to the Blink setup() function.

Hitting the “F10” key (Step Over) twice should bring the simulator to the function call to setup(). At this time, pressing the “F11” key (Step Into) will cause the simulator to jump into the Blink program and pause its execution on the first line inside the setup() function:

pinMode(led, OUTPUT);

Now things should start to look familiar. The last feature I want to demonstrate is the ability to drill down into the underlying assembly language that was created by the compiler. We do this by selecting an option from the Debug menu, “Debug>Windows>Disassembly”.

as7-10

You should now see a mixture of C and assembly code which makes up the executable Blink program. The simulator should be paused on the following line:

0000007B LDI R22,0x01 Load immediate

as7-11

I cannot overstate the importance of this feature. The ability to examine the assembly code, compare it to the source, and to step through it line-by-line is an enormous asset to the inline assembly programmer. You can, among other features:

• Set breakpoints
• Watch variables
• Alter register values
• Time sections of code.

You will want to remember how to do this.

ATMEL has uploaded several informative videos on youtube.com demonstrating the use of the Studio, how to debug and how to use the simulator. A good example is located here:

Here’s a partial look at the disassembly listing (mixed C and assembly) produced by AS7:

void setup() {                
  // initialize the digital pin as an output.
  pinMode(led, OUTPUT);     
0000007B  LDI R22,0x01		Load immediate 
0000007C  LDS R24,0x0100		Load direct from data space 
0000007E  JMP 0x000001A5		Jump 
  pinMode(led, OUTPUT);     
}

// the loop routine runs over and over again forever:
void loop() {
00000080  PUSH R28		Push register on stack 
00000081  PUSH R29		Push register on stack 
  digitalWrite(led, HIGH);   // turn the LED on (HIGH is the voltage level)
00000082  LDI R28,0x00		Load immediate 
00000083  LDI R29,0x01		Load immediate 
00000084  LDI R22,0x01		Load immediate 
00000085  LDD R24,Y+0		Load indirect with displacement 
00000086  CALL 0x000001E1		Call subroutine 
  delay(5000);               // wait for a second
00000088  LDI R22,0x88		Load immediate 
00000089  LDI R23,0x13		Load immediate 
0000008A  LDI R24,0x00		Load immediate 
0000008B  LDI R25,0x00		Load immediate 
0000008C  CALL 0x00000119		Call subroutine 
  digitalWrite(led, LOW);    // turn the LED off by making the voltage LOW
0000008E  LDI R22,0x00		Load immediate 
0000008F  LDD R24,Y+0		Load indirect with displacement 
00000090  CALL 0x000001E1		Call subroutine 
  delay(1000);               // wait for a second
00000092  LDI R22,0xE8		Load immediate 
00000093  LDI R23,0x03		Load immediate 
00000094  LDI R24,0x00		Load immediate 
00000095  LDI R25,0x00		Load immediate 
}
00000096  POP R29		Pop register from stack 
00000097  POP R28		Pop register from stack 
  delay(1000);               // wait for a second
00000098  JMP 0x00000119		Jump 
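
The four LDI instructions ahead of each delay call load the 32-bit delay() argument one byte at a time. Under the avr-gcc calling convention a 32-bit first argument travels in r22 (lowest byte) through r25 (highest byte), so the value can be reassembled by hand. A small host-side C sketch of that arithmetic follows; arg_from_regs is a hypothetical helper name, not part of any Arduino or AS7 API.

```c
#include <stdint.h>

/* Rebuild a 32-bit argument from the four registers that carry it. */
static uint32_t arg_from_regs(uint8_t r22, uint8_t r23, uint8_t r24, uint8_t r25) {
    return  (uint32_t)r22
         | ((uint32_t)r23 << 8)
         | ((uint32_t)r24 << 16)
         | ((uint32_t)r25 << 24);
}
```

Calling arg_from_regs(0x88, 0x13, 0x00, 0x00) gives 5000 and arg_from_regs(0xE8, 0x03, 0x00, 0x00) gives 1000, matching the two delay() calls in the listing.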

My Cup Overflows

When performing math (even basic addition and subtraction) with signed numbers an overflow problem sometimes arises. The Arduino microcontroller indicates the existence of an overflow error by setting the overflow flag in the SREG. Here’s a demonstration of the overflow problem with a simple addition operation:

volatile int8_t n1=0x70; //112
volatile int8_t n2=0x35; //53
volatile int8_t answer;

void setup() {
  Serial.begin(9600);
  
  asm(
    "mov %0, %1 \n" //copy n1 into the result register
    "add %0, %2 \n" //result += n2
  
    : "=&r" (answer) : "r" (n1), "r" (n2)
  );
  
  Serial.print("answer = "); Serial.println(answer);
}

The result of the above addition is “answer = -91”, or 0xA5 hexadecimal. That’s wrong! The answer is wrong because the true sum (165) is larger than a signed 8-bit number can represent.

The largest “signed 8-bit number” is +127, or 0x7f hexadecimal. However, this operation did set the Status Register Overflow Flag (V flag) to warn us that the result is erroneous. But, it’s completely up to us, the programmers, to deal with this issue.

What’s Your Sign?

In “8-bit signed number” operations, the overflow flag is set when either of the following two conditions occur:

• There is a carry from bit 6 to bit 7, but no carry out of bit 7 (C flag not set).
• There is a carry out of bit 7 (C flag set), but no carry from bit 6 to bit 7.

I bring these two cases to your attention because overflow is not limited to positive numbers: adding two negative numbers can overflow as well, and when it does the sign bit of the result is wrong. For example, when adding -2 (0xFE) and -128 (0x80), the result becomes 0x7E (+126), which again is incorrect.

When adding two numbers with different signs, the absolute value of the result is never larger than the larger absolute value of the two operands. In this case, an overflow is impossible.

Therefore, an overflow is only possible when adding two numbers with the same sign. Furthermore, when adding two “same-signed numbers”, the sign of the result must be the same. The conclusion here is, for signed number addition, if the overflow flag is set, the result is invalid, and in unsigned addition, if the carry flag is set, the result is invalid. In signed number operations, overflow is possible, and overflow corrupts the result and negates the sign bit.
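
The two overflow conditions listed earlier boil down to "carry into bit 7 differs from carry out of bit 7". Here is a minimal host-side C sketch of that rule, assuming nothing about the AVR beyond the flag definitions given above; add8 and add8_flags_t are names invented for this illustration.

```c
#include <stdint.h>

typedef struct {
    uint8_t result;   /* low 8 bits of the sum */
    uint8_t carry;    /* C flag: carry out of bit 7 */
    uint8_t overflow; /* V flag: signed overflow */
} add8_flags_t;

/* Model an 8-bit ADD and derive the C and V flags from the two carries. */
static add8_flags_t add8(uint8_t a, uint8_t b) {
    add8_flags_t f;
    unsigned sum    = (unsigned)a + (unsigned)b;
    unsigned carry6 = (((unsigned)(a & 0x7f) + (unsigned)(b & 0x7f)) >> 7) & 1u;

    f.result   = (uint8_t)sum;
    f.carry    = (uint8_t)((sum >> 8) & 1u);
    f.overflow = (uint8_t)(carry6 ^ f.carry); /* set when the carries differ */
    return f;
}
```

With this model, add8(0x70, 0x35) yields 0xA5 with V set and C clear, add8(0xFE, 0x80) yields 0x7E with both V and C set, and add8(0x70, 0xFE) (that is, 112 plus -2) yields 0x6E with V clear, since operands of different signs cannot overflow.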

See my tutorial on Arduino Inline Assembly Math here.

Arduino Inline Assembly Port & Pin Compendium

The following is a compendium of inline assembly functions dealing with ports and pins. Use these at your own risk. These functions have been trimmed of most bounds checking, so they can easily be abused. The Arduino Inline Assembly Tutorial explains most of the details starting here.

analogWrite

This inline code writes an analog value (in the form of a PWM wave) to a particular pin. After executing, the pin will generate a steady square wave of the specified duty cycle until the next call (or call to digitalRead() or digitalWrite() on the same pin). The frequency of the PWM signal on most pins is approximately 490 Hz. On the Uno and similar boards, pins 5 and 6 have a frequency of approximately 980 Hz. On Arduino boards with the ATmega168/328, this function works on pins 3, 5, 6, 9, 10, and 11. The analogWrite function has nothing to do with the analog pins or the analogRead function.

A pinMode() call is included inside this function, so there is no need to set the pin as an output before executing this code.

This no-frills version of analogWrite saves ~542 bytes over the built-in function:

//analogWrite requires a PWM pin 
//PWM pin/timer table:
//3:  (TIMER2B) PD3/TCCR2A/COM2B1/OCR2B
//5:  (TIMER0B) PD5/TCCR0A/COM0B1/OCR0B
//6:  (TIMER0A) PD6/TCCR0A/COM0A1/OCR0A
//9:  (TIMER1A) PB1/TCCR1A/COM1A1/OCR1A
//10: (TIMER1B) PB2/TCCR1A/COM1B1/OCR1B
//11: (TIMER2A) PB3/TCCR2A/COM2A1/OCR2A
//set below 6 defines per above table
#define ANALOG_PORT         PORTB
#define ANALOG_PIN          PORTB3
#define ANALOG_DDR          DDRB
#define TIMER_REG           TCCR2A
#define COMPARE_OUTPUT_MODE COM2A1
#define COMPARE_OUTPUT_REG  OCR2A

volatile uint8_t val = 128; //0-255

  asm (
    "sbi  %0, %1   \n" //DDR set to output (pinMode)

    "cpi  %6, 0    \n" //if full low (0)
    "breq _SetLow  \n"
    "cpi  %6, 0xff \n" //if full high (0xff)
    "brne _SetPWM  \n"

    "sbi  %2, %1   \n" //set high
    "rjmp _SkipPWM \n"

  "_SetLow:        \n"
    "cbi  %2, %1   \n" //set low
    "rjmp _SkipPWM \n"

  "_SetPWM:        \n"
    "ld   r24, X   \n"
    "ori  r24, %3  \n"
    "st   X, r24   \n" //connect pwm pin timer# & channel
    "st   Z, %6    \n" //set pwm duty cycle (val)

  "_SkipPWM:       \n"
    : : "I" (_SFR_IO_ADDR(ANALOG_DDR)), "I" (ANALOG_PIN),
    "I" (_SFR_IO_ADDR(ANALOG_PORT)), "M" (_BV(COMPARE_OUTPUT_MODE)),
    "x" (_SFR_MEM_ADDR(TIMER_REG)), "z" (_SFR_MEM_ADDR(COMPARE_OUTPUT_REG)),
    "d" (val) //"d": cpi only accepts r16-r31
    : "r24"
  );

analogRead

The Arduino board contains a 6 channel, 10-bit analog to digital converter which is the brains beneath the analogRead function. It maps input voltages between 0 and 5 into integer values between 0 and 1023, thus yielding a resolution between readings of: 5/1024 units or, 0.0049 volts (4.9 mV) per unit. The input range and resolution can be changed through the ANALOG_V_REF define. This code reads the value from the specified analog channel (0-7), which correspond to the analog pins (note, do NOT use A0-A7 for the channel number in this code). Further information about the underlying ADC can be found here.

While this version of analogRead (aRead) saves a few bytes (~50), it also gives the option of changing the speed via the ADC prescaler. However, don’t arbitrarily change the prescale without understanding the consequences. ATMEL advises the slowest prescale should be used (PS128). A higher speed (smaller prescale) reduces the accuracy of the AD conversion. The arduino sets the prescale to 128 during initialization, just as the code below does.

//Define various ADC prescales
#define PS2   (1<<ADPS0)                             //8000kHz ADC clock freq
#define PS4   (1<<ADPS1)                             //4000kHz
#define PS8   ((1<<ADPS0) | (1<<ADPS1))              //2000kHz
#define PS16  (1<<ADPS2)                             //1000kHz
#define PS32  ((1<<ADPS2) | (1<<ADPS0))              //500kHz
#define PS64  ((1<<ADPS2) | (1<<ADPS1))              //250kHz
#define PS128 ((1<<ADPS2) | (1<<ADPS1) | (1<<ADPS0)) //125kHz
#define ANALOG_V_REF     DEFAULT //INTERNAL, EXTERNAL, or DEFAULT
#define ADC_PRESCALE     PS128   //PS16, PS32, PS64 or PS128 (default)

uint16_t aRead(uint8_t channel) {
  uint16_t result;
  
  asm volatile ( //volatile: every call must perform a fresh conversion
    "andi %1, 0x07    \n" //force pin==0 thru 7
    "ori  %1, (%6<<6) \n" //(pin | ADC Vref)
    "sts  %2, %1      \n" //set ADMUX

    "lds  r18, %3             \n" //get ADCSRA
    "andi r18, 0xf8           \n" //clear prescale bits
    "ori  r18, ((1<<%5) | %7) \n" //(new prescale | ADSC)
    "sts  %3, r18             \n" //set ADCSRA

    "_loop:       \n" //loop until ADSC cleared
    "lds  r18, %3 \n"
    "sbrc r18, %5 \n"
    "rjmp _loop   \n"

    "lds  %A0, %4   \n" //result = ADCL 
    "lds  %B0, %4+1 \n" //ADCH

    : "=r" (result) : "d" (channel), "M" (_SFR_MEM_ADDR(ADMUX)), //"d": andi/ori only accept r16-r31
    "M" (_SFR_MEM_ADDR(ADCSRA)), "M" (_SFR_MEM_ADDR(ADCL)),
    "I" (ADSC), "I" (ANALOG_V_REF), "M" (ADC_PRESCALE)
    : "r18"
  );
  
  return result;
}
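
The resolution and prescaler figures quoted above are simple integer arithmetic. A short host-side C sketch follows, assuming a 16 MHz system clock and a 5000 mV reference (both typical for an Uno, but assumptions nonetheless); adc_clock_hz and adc_to_millivolts are hypothetical helper names.

```c
#include <stdint.h>

/* ADC clock = system clock divided by the selected prescaler. */
static uint32_t adc_clock_hz(uint32_t f_cpu_hz, uint16_t prescale) {
    return f_cpu_hz / prescale;
}

/* Convert a 10-bit reading (0-1023) to millivolts, rounding down.
   Each count is worth vref/1024, the same 5/1024 step described above. */
static uint32_t adc_to_millivolts(uint16_t counts, uint32_t vref_mv) {
    return ((uint32_t)counts * vref_mv) / 1024u;
}
```

For example, adc_clock_hz(16000000, 128) gives 125000, the 125 kHz noted beside PS128, and adc_to_millivolts(512, 5000) gives 2500.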

pinMode(OUTPUT)

The arduino pinMode function configures pin behavior. The code presented from here on has been previously explained in the Arduino Inline Tutorial Series.

asm (
  "sbi %0, %1 \n" //1=OUTPUT
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (DDB5)
);

pinMode (INPUT PULLUP)

asm (
  "cbi %0, %2 \n"
  "sbi %1, %2 \n"
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (_SFR_IO_ADDR(PORTB)), "I" (DDB5)
);

pinMode (INPUT)

asm (
  "cbi %0, %2 \n"
  "cbi %1, %2 \n"
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (_SFR_IO_ADDR(PORTB)), "I" (DDB5)
);

pinMode with Multiple Pins

#define PIN_DIRECTION 0b00101000 //PIN 3 & 5 OUTPUT
//#define PIN_DIRECTION (1<<DDB3) | (1<<DDB5)
asm (
  "out %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(DDRB)), "r" (PIN_DIRECTION)
);

digitalWrite HIGH

If a pin has been configured as an OUTPUT, its voltage will be set to the corresponding value: 5V (or 3.3V on 3.3V boards) for HIGH, 0V (ground) for LOW. However, if the pin is configured as an INPUT, digitalWrite enables (HIGH) or disables (LOW) the internal pullup on the input pin.

asm (
  "sbi %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(PORTB)),"I" (PORTB5)
);

digitalWrite LOW

asm (
  "cbi %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5) 
);

digitalWrite(output)

volatile uint8_t output = HIGH; //LOW or HIGH
asm (
  "cpi %2, 0     \n"
  "breq 1f       \n"
  "sbi %0, %1    \n"
  "rjmp 2f       \n"
  "1: cbi %0, %1 \n"
  "2:            \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5), "d" (output) //"d": cpi only accepts r16-r31
);

digitalToggle

Try to find this one in the Arduino wiring code:

//toggle pin
asm (
  "in r24, %0  \n"
  "eor r24, %1 \n"
  "out %0, r24 \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "r" ((uint8_t)_BV(PORTB5)) : "r24"
);

digitalRead

digitalRead simply reads the value from a specified digital pin, either HIGH or LOW.

volatile uint8_t status;
 
asm (
  "in __tmp_reg__, __SREG__  \n"
  "cli                       \n"                     
  "ldi %0, 1                 \n" //high 
  "sbis %1, %2               \n" //skip next if pin high
  "clr %0                    \n" //low
  "out __SREG__, __tmp_reg__ \n"
  : "=d" (status) : "I" (_SFR_IO_ADDR(PINB)), "I" (PINB5) //"=d": ldi only accepts r16-r31
);

digitalRead Alternative

This is a generic alternative, which can be called programmatically. Note it must be called using a pointer to the PIN (&PINB), otherwise the compiler emits incorrect code:

//call like so:
//uint8_t status = dRead(&PINB, PINB5);

__attribute__ ((noinline)) uint8_t dRead(volatile uint8_t *port, uint8_t pin) {
  uint8_t result, mask=1;

  asm (
    "movw  r30, %1 \n" //port reg addr in Z
  "1:              \n"
    "cpi  %2, 0    \n" //loop until pin==0
    "breq 2f       \n" //leave loop
    "lsl  %3       \n" //shift (mask) left 1 position
    "dec  %2       \n" //decrement loop counter
    "rjmp 1b       \n" //repeat
  "2:              \n"
    "in   __tmp_reg__, __SREG__ \n" //preserve sreg
    "cli           \n" //disable interrupts
    "ld   r18, Z   \n" //fetch port data
    "and  r18, %3  \n" //compare pin with mask
    "ldi  %0, 1    \n" //set return high
    "brne 3f       \n" 
    "clr  %0       \n" //set return low
  "3:              \n"
    "out  __SREG__, __tmp_reg__ \n"
    : "=&d" (result) : "r" (port), "a" (pin), "r" (mask) : "r18", "r30", "r31" //"d": ldi only accepts r16-r31
  );

  return result;
}

Example of turning off PWM for arduino digital pin #11

//digital PWM pin registers:
//3:  (TIMER2B) PD3/TCCR2A/COM2B1/OCR2B
//5:  (TIMER0B) PD5/TCCR0A/COM0B1/OCR0B
//6:  (TIMER0A) PD6/TCCR0A/COM0A1/OCR0A
//9:  (TIMER1A) PB1/TCCR1A/COM1A1/OCR1A
//10: (TIMER1B) PB2/TCCR1A/COM1B1/OCR1B
//11: (TIMER2A) PB3/TCCR2A/COM2A1/OCR2A

asm (
  "ld  r16, Z \n"
  "ldi r17, 0xff \n"
  "eor r17, %1 \n"
  "and r16, r17 \n"
  "st  Z, r16 \n"
  : : "z" (_SFR_MEM_ADDR(TCCR2A)), "d" ((uint8_t)_BV(COM2A1)) : "r16", "r17" //pass a bit mask, not the bit number
);
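
The read-modify-write above can be checked on a host machine. COM2A1 is a bit number (7 in the ATmega328P datasheet), so the value that gets inverted and ANDed must be a mask built from it. A minimal C sketch, with MODEL_COM2A1 and clear_bit as made-up names standing in for the real register definitions:

```c
#include <stdint.h>

#define MODEL_COM2A1 7 /* assumed bit position of COM2A1 in TCCR2A */

/* Mirror of the asm sequence: ldi 0xff, eor with mask, and with register. */
static uint8_t clear_bit(uint8_t reg, uint8_t bit) {
    uint8_t mask     = (uint8_t)(1u << bit);   /* _BV(bit) */
    uint8_t inverted = (uint8_t)(0xff ^ mask); /* every bit set except the target */
    return (uint8_t)(reg & inverted);
}
```

With a register value of 0xA3, clear_bit(0xA3, MODEL_COM2A1) returns 0x23: only bit 7 changes and the waveform generation bits in the low positions are left alone.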

Arduino Inline Assembly Tutorial (Examples)

As the final tutorial in this series, we present four example inline assembly functions for the arduino. Specifically, these cover the conversion of a byte to a hexadecimal string, SPI Mode 0 hardware transfer, SPI Mode 0 Bit-banging, and the C library atoi function. Do not take these functions as archetypical examples of high-quality coding practice or brilliantly efficient inline code. They are neither.

Most of the previous examples in this series were simple “snippets of code”, and as such gave a myopic view of inline assembly. The goal here is to show complete and working demonstrations of how to include inline assembly into the typical arduino program. Each example includes explanatory comments covering the key portions of code.

In addition to these examples, have a look at the Arduino Inline Assembly Blink Program.

Stringing Hexadecimals

The following code converts a byte value into a hexadecimal string. Notice at the start of the code, that the constraint #0 value (val) is temporarily saved in the r25 register. The function then converts the first nibble. When the conversion process is complete, the function loops back and converts the second nibble. Note how the code uses the SREG T-bit to flag the first vs. second nibble.

void ByteToHexStr(uint8_t val, char *str) {
  asm (
    "set           \n" //flag first nibble
    "mov r25, %0   \n" //save val
    "swap %0       \n" //swap for correct nibble order
  "1:              \n"
    "andi %0, 0xf  \n" //mask a nibble
    "cpi  %0, 0xa  \n" //>=10?
    "brcc 2f       \n" //yes
    "subi %0, 0xd0 \n" //convert numeral (0-9) 
    "rjmp 3f       \n" //skip next
  "2:              \n"
    "subi %0, 0xc9 \n" //convert letter (A-F)
  "3:              \n"
    "st Z+, %0     \n" //put into string
    "brtc 4f       \n" //upper nibble?
    "clt           \n" //clear nibble flag
    "mov %0, r25   \n" //get lower nibble
    "rjmp 1b       \n" //repeat conversion
  "4:              \n" //exit
    : : "d" (val), "z" (str) : "r25", "memory" //"d": andi/cpi/subi only accept r16-r31
  );
}
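
For comparison, here is the same nibble-by-nibble conversion written as plain C that can be run on a host machine. It is a sketch of the logic only, not a replacement for the inline version, and byte_to_hex_model is a hypothetical name. Like the assembly, it writes exactly two characters and leaves null-termination to the caller.

```c
#include <stdint.h>

/* Convert val to two uppercase hex characters, upper nibble first. */
static void byte_to_hex_model(uint8_t val, char *str) {
    uint8_t nibble = (uint8_t)(val >> 4);       /* start with the upper nibble */
    for (int i = 0; i < 2; i++) {
        if (nibble < 10)
            str[i] = (char)('0' + nibble);      /* numeral 0-9 */
        else
            str[i] = (char)('A' + nibble - 10); /* letter A-F */
        nibble = (uint8_t)(val & 0x0f);         /* lower nibble on the second pass */
    }
}
```

Given a three-byte buffer pre-filled with zeros, byte_to_hex_model(0xA5, buf) leaves the string "A5" in buf.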

I SPI With My Little Eye…

Serial Peripheral Interface (SPI) is a synchronous serial data protocol used by microcontrollers for communicating with one or more peripheral devices, or for communication between two microcontrollers. The SPI standard is loose and each device implements it a little differently, which means you must pay close attention to the device’s datasheet when implementing the protocol. Generally speaking, there are four modes of transmission, defined by the clock phase and polarity.

Here are two versions of the SPI transfer function. The first of these programs incorporates the arduino hardware SPI. The second is a bit-bang version using different pins. More information on SPI can be found here and here.

SPI Mode 0 Hardware Transfer

static __attribute__ ((noinline)) uint8_t SpiXfer(uint8_t data) {
  asm volatile (              //volatile: keep the transfer even if the result is unused
    "out  %1, %0          \n" //put data out SPDR register
    "nop                  \n" //pause
  "1:                     \n"
    "in   __tmp_reg__, %2 \n" //check xmit complete
    "sbrs __tmp_reg__, %3 \n"
    "rjmp 1b              \n"
    "in   %0, %1          \n" //get incoming data
    : "+r" (data) : "M" (_SFR_IO_ADDR(SPDR)),
    "M" (_SFR_IO_ADDR(SPSR)), "I" (SPIF)
  );

  return data;
}

SPI Bit-Bang

#define MOSI_PORT  PORTD
#define MOSI_BIT   PORTD5
#define MISO_PORT  PIND
#define MISO_BIT   PIND6
#define CLOCK_PORT PORTD
#define CLOCK_BIT  PORTD7

static __attribute__ ((noinline)) uint8_t SpiBitBang(uint8_t data) {
  register uint8_t tmp, i=8;
  
  //save and restore sreg because t-bit is utilized
  asm volatile (
    "in __tmp_reg__, __SREG__ \n"
  "1:               \n"
    "sbrs %0, 0x07  \n" //is output data bit high?
    "rjmp 2f        \n" //no
    "sbi  %3, %4    \n" //output a high bit
    "rjmp 3f        \n"
  "2:               \n"
    "cbi  %3, %4    \n" //output a low bit
  "3:               \n"
    "lsl  %0        \n" //shift to next bit
    "in   %1, %5    \n" //get input
    "tst  %1        \n" //anything here?
    "breq 4f        \n" //nope
    "bst  %1, %6    \n" //set t-bit if input bit is high
    "clr  %1        \n" //zeroize register
    "bld  %1, 0     \n" //set bit 0
    "or   %0, %1    \n" //or low bit with data for return value
  "4:               \n"
    "sbi  %7, %8    \n" //toggle clock bit high
    "nop            \n" //pause
    "cbi  %7, %8    \n" //toggle clock bit low
    "subi %2, 1     \n" //more bits?
    "brne 1b        \n" //do next bit
    "out __SREG__, __tmp_reg__ \n"
    : "+r" (data), "=&r" (tmp): "a" (i),
    "M" (_SFR_IO_ADDR(MOSI_PORT)), "I" (MOSI_BIT),
    "M" (_SFR_IO_ADDR(MISO_PORT)), "I" (MISO_BIT),
    "M" (_SFR_IO_ADDR(CLOCK_PORT)),  "I" (CLOCK_BIT)
  );

  return data;
}
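
The bit-bang loop shifts the outgoing byte out MSB first while shifting the incoming bits in at bit 0, so after eight clocks the same register holds the received byte. The host-side C model below traces that data flow without any hardware. The slave is assumed to present its byte MSB first, one bit per clock; spi_bitbang_model, slave_byte and mosi_seen are illustrative names, not part of the original code.

```c
#include <stdint.h>

/* Model of the 8-clock exchange: returns the byte received from the slave
   and reports, via mosi_seen, the bit sequence driven onto MOSI. */
static uint8_t spi_bitbang_model(uint8_t data, uint8_t slave_byte, uint8_t *mosi_seen) {
    uint8_t mosi = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t out_bit = (uint8_t)((data >> 7) & 1u);  /* sbrs: test bit 7 */
        mosi = (uint8_t)((mosi << 1) | out_bit);        /* sbi/cbi on MOSI */
        data = (uint8_t)(data << 1);                    /* lsl: next bit */

        uint8_t in_bit = (uint8_t)((slave_byte >> (7 - i)) & 1u); /* sample MISO */
        data |= in_bit;                                 /* or into bit 0 */
        /* the clock pulse (sbi, nop, cbi) would happen here */
    }
    *mosi_seen = mosi;
    return data;
}
```

Sending 0xA5 while the modeled slave answers 0x3C returns 0x3C and records 0xA5 on MOSI, confirming that both directions are MSB first.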

A Toy

Atoi is a function in the C programming language that converts a string into an integer numerical representation (atoi stands for ASCII to integer). It is included in the C standard library header file stdlib.h. It is prototyped as follows:

int atoi(const char *str);

The str argument is a string, represented by an array of characters, containing the characters of a signed integer number. The string must be null-terminated.

Here is the basic idea of the atoi function implemented in C language:

int16_t atoi(char s[]) {
  uint8_t i, sign;
  int16_t n;
  
  //skip white space
  for (i=0; s[i]!='\0' && s[i]<=' '; i++); //stop at the terminator
  
  //sign
  sign = 0;
  if (s[i] == '-') {
    sign = 1;
    i++;
  }
  
  //convert
  for (n=0; s[i]>='0' && s[i]<='9'; i++)
    n = 10*n + s[i] - '0';
  
  if (sign)
    return (-1*n);
  else
    return n;
}

Atoi Inline

Here is our implementation, which is only 64 bytes in length. By comparison, the arduino AVR libc atoi() function is 76 bytes long. This version is basically functionally equivalent, however there are a few detail differences (this function steps over all leading ASCII characters 0x2F and below, not just whitespace):

int16_t _atoi(const char *s) {
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wuninitialized"
  //c is loaded inside the inline asm code; sign must start out FALSE
  register uint8_t sign = 0, c;
#pragma GCC diagnostic pop
  //force result into return registers
  register int16_t result asm("r24"); 
  
  asm (
    "ldi  %A0, 0x00         \n" //result = 0
    "ldi  %B0, 0x00         \n"

  "1:                       \n"
    "ld   %2, Z+            \n" //fetch char
    "cpi  %2, '-'           \n" //negative sign?
    "brne 2f                \n"
    "ldi  %3, 0x01          \n" //sign = TRUE

  "2:                       \n"
    "cpi  %2, '/' + 1       \n" //step over whitespace/garbage
    "brcc 3f                \n"
    "rjmp 1b                \n"

  "3:                       \n"
    "rjmp 5f                \n"

  "4:                       \n"
    "ldi  r23, 10           \n" //result *= 10
    "mul  %B0, r23          \n"
    "mov  %B0, r0           \n"
    "mul  %A0, r23          \n"
    "mov  %A0, r0           \n"
    "add  %B0, r1           \n"
    "clr  __zero_reg__      \n" //r1 trashed by mul
    "add  %A0, %2           \n" //result += new digit
    "adc  %B0, __zero_reg__ \n"
    "ld   %2, Z+            \n" //fetch next digit char
  
  "5:                       \n"
    "subi %2, '0'           \n" //convert char to 0-9
    "cpi  %2, 10            \n" //end of string?
    "brlo 4b                \n"

    "cpi  %3, 0             \n" //negative?
    "breq 6f                \n"
    "com  %B0               \n" //negate result
    "neg  %A0               \n"
    "sbci %B0, -1           \n"
  
  "6:                       \n"
    : "+r" (result) : "z" (s), "a" (c), "a" (sign) : "r23", "memory" //r23 is used as scratch
  );

  return result;
}

Conclusion

While there are countless more topics to cover, and many more rabbit-holes to dive down, I believe I have covered enough of the basics in this series. I sure enjoyed researching and writing these tutorials. And, hopefully you gained a few insights into the funky world of arduino (AVR) inline assembly programming. Now, get inline with your programming!

[updated: 4.11.16]

Arduino Inline Assembly Tutorial (Interrupts)

Pardon The Interruption

The previous tutorial covered the basics of writing inline functions. A close relative of the function is the Interrupt Service Routine (ISR), which is the topic here. Portions of this tutorial may pertain to functions as well.

As a warning, this tutorial assumes an understanding of the basic concepts of interrupts in general, and specifically interrupt handlers on the arduino (AVR μC). Hopefully, you have already written a few arduino interrupts in C, using the internal arduino functionality. If not, you may want to study some of the links given in the reference section of this tutorial before continuing.

The Deck is Stacked

Basic knowledge of the stack is essential to understanding functions and interrupt handlers. The basic purpose of the stack is to support function calls and interrupts. Whenever a program makes a function call or whenever an interrupt occurs, the stack is used to store critical information which will be restored upon completion of the function or interrupt. Additional information on the stack can be found here and here.

First and foremost, during a function call or interrupt, the hardware places the return address on the stack. The saving and restoration of the return address is accomplished transparently by the CALL and RET instructions (RETI when returning from an interrupt). It is not necessary to perform any special instruction(s) to make this occur.

Second, if any “call-saved” registers will be “clobbered” inside the function, these registers are “pushed” onto the stack. In the case of an interrupt service routine, all of the registers used inside the ISR (and always the temporary and zero registers, r0 and r1) get pushed onto the stack. Additionally, during an ISR the SREG is saved and restored.

Finally, if the compiler deems it necessary, space is reserved for any local variables on the stack. Many times the compiler will place local variables into specific registers, and therefore doesn’t use the stack for temporary storage.

Here is an example of how the compiler uses the stack to store local variables inside of a function. This is sometimes referred to as “setting up a stack frame.” We will reserve 16 bytes for a character array (note: unrelated code has been removed for the purpose of clarity). The compiler performs all of this stack manipulation for us behind the scenes, so-to-speak:

void example(void) {
  char buffer[16]; //space will be reserved on the stack
 
  //
  //do something here. . .
  //
 
}

This results in the following assembly code:

;prologue
  PUSH r28          ;save registers on stack 
  PUSH r29 
  IN   r28, SPL     ;get stack pointer    
  IN   r29, SPH   
  SBIW r28, 16      ;reserve 16 bytes space on stack
                    ;the stack grows downward, hence the subtraction
  OUT  SPH, r29     ;update new stack pointer
  OUT  SPL, r28 
 
;
;do something here. . .
;
 
;epilogue
  ADIW r28, 16      ;remove the 16 bytes from the stack
  OUT  SPH, r29     ;restore stack pointer
  OUT  SPL, r28 
  POP  r29          ;restore registers from stack
  POP  r28 
  RET 

Upon return from the interrupt or function, all the preserved values are restored, or “popped”, from the stack. During the prologue and epilogue code, the order of the push and pop instructions is critical: values must be popped in the reverse of the order in which they were pushed.
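To make the ordering rule concrete, here is a minimal host-side sketch in plain C (it runs on a PC, not on the AVR). The Stack type and the function names are illustrative assumptions, not anything the compiler generates. It models a downward-growing stack and replays the prologue and epilogue steps shown above:

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of a downward-growing stack (all names are illustrative). */
#define RAM_SIZE 64

typedef struct {
    uint8_t  ram[RAM_SIZE];
    uint16_t sp;            /* index of the next free byte, like the AVR SP */
} Stack;

static void stack_init(Stack *s) { s->sp = RAM_SIZE - 1; }

/* PUSH: store the byte, then decrement SP. */
static void push(Stack *s, uint8_t value) { s->ram[s->sp--] = value; }

/* POP: increment SP, then load the byte. */
static uint8_t pop(Stack *s) { return s->ram[++s->sp]; }

/* Replays the prologue/epilogue: save r28/r29, reserve a 16-byte frame,
   release it, then restore the registers in reverse order. */
static void frame_demo(void)
{
    Stack s;
    uint8_t r28 = 0x11, r29 = 0x22;
    uint16_t sp_on_entry;

    stack_init(&s);
    sp_on_entry = s.sp;

    push(&s, r28);          /* prologue */
    push(&s, r29);
    s.sp -= 16;             /* reserve 16 bytes (SBIW) */

    r28 = r29 = 0;          /* the function body clobbers both registers */

    s.sp += 16;             /* epilogue: release the frame (ADIW) */
    r29 = pop(&s);          /* last pushed, first popped */
    r28 = pop(&s);

    assert(r28 == 0x11 && r29 == 0x22);
    assert(s.sp == sp_on_entry);
}
```

Swapping the two pop lines in frame_demo() makes the first assertion fail, which is exactly the kind of bug a mismatched epilogue produces.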

Interrupt Before and After

Below, I wrote a very basic interrupt routine that simply increments a byte so we can examine the prologue and epilogue code generated by the compiler:

//here is an example ISR coded in C:
volatile uint8_t a;
 
ISR(INT0_vect) {
  a++;
}
 
//this is the generated assembly code:
;prologue
0000027F 1f.92                PUSH r1       ;save r1 register
00000280 0f.92                PUSH r0       ;save r0 register
00000281 0f.b6                IN r0, SREG   ;get status register
00000282 0f.92                PUSH r0       ;save sreg 
00000283 11.24                CLR r1        ;clear zero register
00000284 8f.93                PUSH r24      ;save r24 register
;increment byte (a) here
00000285 80.91.c3.01          LDS r24, (a) 
00000287 8f.5f                SUBI r24, 0xFF     
00000288 80.93.c3.01          STS (a), r24 
;epilogue
0000028A 8f.91                POP r24       ;restore r24 register
0000028B 0f.90                POP r0        ;restore status register
0000028C 0f.be                OUT SREG, r0
0000028D 0f.90                POP r0        ;restore r0 register
0000028E 1f.90                POP r1        ;restore r1 register
0000028F 18.95                RETI          ;return from interrupt

As you can see, the meat of the ISR is only 10 bytes long. However, together the prologue and epilogue add another 24 bytes, for a total of 34. It might be possible to save a few bytes and program cycles by tightly writing your own ISR prologue and epilogue. GCC has a provision which allows writing your own prologues and epilogues, which will be covered later.

We Interrupt This Program to Blink

It is now time to write an interrupt handler, or ISR, in inline assembler. I can’t think of a better example than to adapt the basic Blink sketch to use the Timer #1 Overflow interrupt. Please note, because this code alters the Timer #1 registers, it will render any other use of the Arduino Timer #1 nonfunctional (e.g. analogWrite on pins 9 & 10, the Servo library, etc.).

Handle It

The first order of business is to write the interrupt handler for the Timer #1 Overflow. This is the routine that is called when the Timer #1 counter (TCNT1) rolls over from 0xffff to zero. Our ISR is very basic, and as always, it should be kept as short as possible. Inside the handler we perform two functions:

  • Reload the counter (TCNT1) so that the next overflow occurs one second later.
  • Toggle the LED.
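The reload value used for TCNT1 in the listings that follow (0x0bdc) can be derived with a little arithmetic. The sketch below is host-side C and assumes a 16 MHz system clock, which is typical for an Arduino Uno but is not stated in this tutorial. The function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: a 16 MHz system clock (typical Arduino Uno) and the
   divide-by-256 prescaler selected in the setup code. */
#define F_CPU_HZ   16000000UL
#define PRESCALER  256UL

/* Timer #1 counts up and overflows on the step from 0xFFFF to 0, so it must
   start 'ticks' counts below 65536 in order to overflow after 'ticks' counts. */
static uint16_t timer1_reload(uint32_t f_cpu_hz, uint32_t prescaler,
                              uint32_t period_ms)
{
    uint32_t ticks = (f_cpu_hz / prescaler) * period_ms / 1000UL;
    assert(ticks >= 1 && ticks <= 65536UL);   /* must fit the 16-bit counter */
    return (uint16_t)(65536UL - ticks);
}

static void reload_demo(void)
{
    /* 16 MHz / 256 = 62500 ticks per second; 65536 - 62500 = 3036 = 0x0BDC */
    assert(timer1_reload(F_CPU_HZ, PRESCALER, 1000UL) == 0x0BDC);
}
```

With a different clock or prescaler the same formula applies, as long as the tick count still fits in the 16-bit counter.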

An ISR can be coded using inline assembler just as in a “C Stub Function”, relying upon the compiler to insert the necessary prologue and epilogue code. I suggest you use this stub technique at first before graduating to writing the entire “naked” ISR. Here is a stub version of our ISR:

#define TCNT_BASE   0x0bdc
#define TCNT_BASE_H (((TCNT_BASE)>>8)&0xff)
#define TCNT_BASE_L ((TCNT_BASE)&0xff)

ISR(TIMER1_OVF_vect) {
  asm (
    //reload TCNT1 counter for 1sec interrupt
    //(16-bit timer registers: write the high byte first)
    "ldi r24, %4           \n"
    "std Z+1, r24          \n" //TCNT1H
    "ldi r24, %3           \n"
    "st  Z, r24            \n" //TCNT1L
    //toggle LED
    "in   __tmp_reg__, %0  \n" //read port
    "ldi  r24, %1          \n" //LED bit mask
    "eor  __tmp_reg__, r24 \n" //toggle LED bit
    "out  %0, __tmp_reg__  \n" //write port
    : : "I" (_SFR_IO_ADDR(PORTB)), "I" (_BV(PORTB5)),
    "z" (_SFR_MEM_ADDR(TCNT1)), "M" (TCNT_BASE_L), "M" (TCNT_BASE_H) : "r24"
  );
}

Having said all that, the boilerplate code the compiler inserts is not always the most efficient, and many times inadequate. For these reasons, and for the academic exercise, we will also select the “ISR_NAKED” attribute when defining the ISR. This gives us full control over all of the code inside the ISR. Full control is a good thing:

ISR(TIMER1_OVF_vect, ISR_NAKED)

Eleven instructions encompass the prologue and epilogue, which is more than the code required for the main purpose of the interrupt. Notice inside the handler, we utilize 3 registers: r24, r30 and r31. This means we need to preserve the content of these registers, since the interrupt could be triggered at any time, even precisely when these registers are in use. Additionally, we need to preserve the status register (SREG). The SREG holds critical information on the state of the program when the interrupt fired. Neglecting to preserve any of this information would probably cause the program to crash.

Don’t forget to include the terminating RETI instruction also. By comparison, this ISR_NAKED version is 10 bytes shorter than the “Stub” version:

#include "k328p.h"

#define TCNT_BASE   0x0bdc
#define TCNT_BASE_H (((TCNT_BASE)>>8)&0xff)
#define TCNT_BASE_L ((TCNT_BASE)&0xff)

ISR(TIMER1_OVF_vect, ISR_NAKED) {
  asm (
    "push r31           \n" //save r30, r31 contents
    "push r30           \n"
    "push r24           \n"
    //preserve SREG
    "in   r24, __SREG__ \n"
    "push r24           \n"

    //reload TCNT1 counter for 1sec interrupt
    //(16-bit timer registers: write the high byte first)
    "clr r31            \n"
    "ldi r30, %2        \n"
    "ldi r24, %4        \n"
    "std Z+1, r24       \n" //TCNT1H
    "ldi r24, %3        \n"
    "st  Z, r24         \n" //TCNT1L
    //toggle LED
    "in   r30, %0       \n" //read port
    "ldi  r31, %1       \n" //LED bit mask
    "eor  r30, r31      \n" //toggle LED bit
    "out  %0, r30       \n" //write port

    //restore old SREG
    "pop  r24           \n"
    "out  __SREG__, r24 \n"
    //restore r24, r30, r31
    "pop r24            \n"
    "pop  r30           \n"
    "pop  r31           \n"
    "reti               \n"
    : : "I" (kPORTB), "I" (_BV(PORTB5)), 
    "M" (kTCNT1), "M" (TCNT_BASE_L), "M" (TCNT_BASE_H)
  );
}

The initialization code required for the Timer #1 interrupt (setting the prescaler, loading the counter and enabling the overflow interrupt) is completely contained inside the setup() function. Obviously, it is not necessary to write this in inline assembly; it’s just good practice:

#include "k328p.h"

#define TCNT_BASE   0x0bdc
#define TCNT_BASE_H (((TCNT_BASE)>>8)&0xff)
#define TCNT_BASE_L ((TCNT_BASE)&0xff)

void setup() {
  uint16_t TNCTBase = TCNT_BASE;

  asm (
    "cli                  \n" //disable global interrupts 
    "sbi %0, %1           \n" //pinMode(13, OUTPUT);

    //clear TCCR1A, set 256 prescale (CS12) in TCCR1B, clear TCCR1C
    "st  Z+, __zero_reg__ \n" //zero TCCR1A
    "ldi r24, %3          \n"
    "st  Z+, r24          \n" //TCCR1B = CS12
    "st  Z, __zero_reg__  \n" //zero TCCR1C
    //load counter for 1sec interrupt (high byte first)
    "ldi r30, %4          \n"
    "std Z+1, %B5         \n" //TCNT1H
    "st  Z, %A5           \n" //TCNT1L
    //enable overflow interrupt
    "ldi r30, %6          \n"
    "ldi r24, %7          \n"
    "st  Z, r24           \n" //TIMSK1

    "sei                  \n" //enable global interrupts 
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (PORTB5),
    "z" (_SFR_MEM_ADDR(TCCR1A)), "I" (_BV(CS12)),
    "M" (kTCNT1), "r" (TNCTBase),
    "M" (kTIMSK1), "I" (_BV(TOIE1)) : "r24", "memory"
  );
}

void loop() { }

Finally, we are introducing a new header file, “k328p.h” (contents listed below), which contains all of the IO register defines in a form we can use inside our inline assembly routines. The definitions in this file use the standard ATMEL mnemonics for the IO registers with the letter ‘k’ prepended. Each value is the low byte (LSB) of the IO register address, which allows greater flexibility in inline assembler code when referring to the IO registers (for example, when loading a pointer register for the LD/ST instructions). A close examination of the above code will reveal the method of use.

Arduino IO Register Defines

//k328p.h - definitions for ATmega328P
//4.4.2016
#ifndef _k328P_H_
#define _k328P_H_ 

//standard registers 
//0-0x1f: bit addressable
//0-0x3f: IN/OUT compatible 
//0-0x3f: add 0x20 when using LD/ST
#define kPINB   0x03
#define kDDRB   0x04
#define kPORTB  0x05
#define kPINC   0x06
#define kDDRC   0x07
#define kPORTC  0x08
#define kPIND   0x09
#define kDDRD   0x0A
#define kPORTD  0x0B

#define kTIFR0  0x15
#define kTIFR1  0x16
#define kTIFR2  0x17

#define kPCIFR  0x1B
#define kEIFR   0x1C
#define kEIMSK  0x1D
#define kGPIOR0 0x1E
#define kEECR   0x1F
//end bit addressable

#define kEEDR   0x20
#define kEEAR   0x21
#define kEEARL  0x21
#define kEEARH  0x22
#define kGTCCR  0x23
#define kTCCR0A 0x24
#define kTCCR0B 0x25
#define kTCNT0  0x26
#define kOCR0A  0x27
#define kOCR0B  0x28

#define kGPIOR1 0x2A
#define kGPIOR2 0x2B
#define kSPCR   0x2C
#define kSPSR   0x2D
#define kSPDR   0x2E

#define kACSR   0x30

#define kMCUSR  0x34
#define kMCUCR  0x35

#define kSPMCSR 0x37

#define kSPL    0x3D
#define kSPH    0x3E
#define kSREG   0x3F
//end IN/OUT compatible

//extended registers begin
#define kWDTCSR 0x60
#define kCLKPR  0x61

#define kPRR    0x64

#define kOSCCAL 0x66

#define kPCICR  0x68
#define kEICRA  0x69

#define kPCMSK0 0x6B
#define kPCMSK1 0x6C
#define kPCMSK2 0x6D
#define kTIMSK0 0x6E
#define kTIMSK1 0x6F
#define kTIMSK2 0x70

#define kADC    0x78
#define kADCW   0x78
#define kADCL   0x78
#define kADCH   0x79
#define kADCSRA 0x7A
#define kADCSRB 0x7B
#define kADMUX  0x7C

#define kDIDR0  0x7E
#define kDIDR1  0x7F

#define kTCCR1A 0x80
#define kTCCR1B 0x81
#define kTCCR1C 0x82

#define kTCNT1  0x84
#define kTCNT1L 0x84
#define kTCNT1H 0x85
#define kICR1   0x86
#define kICR1L  0x86
#define kICR1H  0x87
#define kOCR1A  0x88
#define kOCR1AL 0x88
#define kOCR1AH 0x89
#define kOCR1B  0x8A
#define kOCR1BL 0x8A
#define kOCR1BH 0x8B

#define kTCCR2A 0xB0
#define kTCCR2B 0xB1
#define kTCNT2  0xB2
#define kOCR2A  0xB3
#define kOCR2B  0xB4
#define kASSR   0xB6

#define kTWBR   0xB8
#define kTWSR   0xB9
#define kTWAR   0xBA
#define kTWDR   0xBB
#define kTWCR   0xBC
#define kTWAMR  0xBD

#define kUCSR0A 0xC0
#define kUCSR0B 0xC1
#define kUCSR0C 0xC2

#define kUBRR0  0xC4
#define kUBRR0L 0xC4
#define kUBRR0H 0xC5
#define kUDR0   0xC6
//end extended registers

//registers 0-0x3f with 0x20 added, for LD/ST instructions
#define k2PINB   0x23
#define k2DDRB   0x24
#define k2PORTB  0x25
#define k2PINC   0x26
#define k2DDRC   0x27
#define k2PORTC  0x28
#define k2PIND   0x29
#define k2DDRD   0x2A
#define k2PORTD  0x2B
#define k2TIFR0  0x35
#define k2TIFR1  0x36
#define k2TIFR2  0x37
#define k2PCIFR  0x3B
#define k2EIFR   0x3C
#define k2EIMSK  0x3D
#define k2GPIOR0 0x3E
#define k2EECR   0x3F
#define k2EEDR   0x40
#define k2EEAR   0x41
#define k2EEARL  0x41
#define k2EEARH  0x42
#define k2GTCCR  0x43
#define k2TCCR0A 0x44
#define k2TCCR0B 0x45
#define k2TCNT0  0x46
#define k2OCR0A  0x47
#define k2OCR0B  0x48
#define k2GPIOR1 0x4A
#define k2GPIOR2 0x4B
#define k2SPCR   0x4C
#define k2SPSR   0x4D
#define k2SPDR   0x4E
#define k2ACSR   0x50
#define k2MCUSR  0x54
#define k2MCUCR  0x55
#define k2SPMCSR 0x57
#define k2SPL    0x5D
#define k2SPH    0x5E
#define k2SREG   0x5F

#endif //_k328P_H_
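The relationship between the ‘k’ and ‘k2’ values can be checked with a few lines of host-side C. This is only a sketch: the constants are copied from the header above, while k_to_mem() is a hypothetical helper that is not part of k328p.h:

```c
#include <assert.h>
#include <stdint.h>

/* A few values copied from k328p.h */
#define kPORTB   0x05
#define k2PORTB  0x25
#define kSREG    0x3F
#define k2SREG   0x5F
#define kTCNT1   0x84

/* Hypothetical helper: the data-space address to use with LD/ST, given a
   k-style value. Registers reachable by IN/OUT (0x00-0x3F) sit 0x20 higher
   in data space; extended registers (0x60 and up) are already data-space
   addresses and are returned unchanged. */
static uint16_t k_to_mem(uint8_t k)
{
    assert(k <= 0x3F || k >= 0x60);   /* k-style values never fall in between */
    return (k <= 0x3F) ? (uint16_t)(k + 0x20) : (uint16_t)k;
}

static void k_demo(void)
{
    assert(k_to_mem(kPORTB) == k2PORTB);
    assert(k_to_mem(kSREG)  == k2SREG);
    assert(k_to_mem(kTCNT1) == 0x84);
}
```

This is why the timer code can load kTCNT1 straight into r30 for use with ST, while a port register accessed through a pointer would need its k2 value instead.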

References

Arduino Interrupts
Newbie’s Guide to AVR Interrupts
PJRC Guide to Interrupts
AVR Libc Information on Interrupts
University of Maryland, BC, C Programming and Embedded Systems Course, Interrupt Information
AVR 8-bit Instruction Set
AVR-GCC Inline Assembler Cookbook
Extended Asm – Assembler Instructions with C Expression Operands
Mixing C and Assembly Language
ATMEL ATmega328P Datasheet


Arduino Inline Assembly Tutorial (Functions)


At first consideration, the topic of functions seems simple and trite. Just discuss how to “CALL” and “RETURN” to and from a function, right? However, there are many subtopics involved as well. For example, passing and returning parameters, prologue and epilogue code, the stack frame and mixing assembly and C are topics deserving of separate tutorials. Hopefully, we can do all of these justice, but first, the basics…

Convert Snippet Into a Function

How about a simple demonstration of turning an inline code snippet into a function? In a previous tutorial on indirect addressing, several inline pieces of code were developed to perform various string operations. One such operation determined the character length of a string. The code is below.

String Length, Sounds Like strlen

const char src[4] = "abc";
volatile uint8_t len;
 
asm (
  "_loop:               \n"
  "ld   __tmp_reg__, Z+ \n"
  "tst  __tmp_reg__     \n"
  "brne _loop           \n"
  //Z points one character past the terminating NUL
  "subi %A1, 1          \n" //subtract post-increment
  "sbci %B1, 0          \n"
  "sub  %A1, %A2        \n" //length = end - start
  "sbc  %B1, %B2        \n"
  "mov  %0, %A1         \n" //save len (uint8_t)
  : "=r" (len) : "z" (src), "x" (src)
);

While this code could easily be included “inline”, it certainly would be more useful if it was defined as a general function. This would make it much easier to use throughout a program, and also reduce overall program size by incorporating only one instance of the code. So how is this accomplished?

Stub Your Code

The official Cookbook refers to this technique as a “C Stub Function,” which is nothing more than a function definition containing only inline assembler code. Typically, in a “C Stub Function”, the function parameters and local variables define the data used in, and the value returned (if any) by, the function. This is an easy method to pass data to/from the inline function, without the need to understand the underlying details of how it’s done. Therefore, we eliminate the necessity of writing additional supporting code.

The above “string length” snippet easily becomes a full-blown function, _strlen(), using this method. Notice the transformed function below receives a string (s) as a parameter, and returns the length, which is defined as a local variable. We refer to these same variables in the input and output constraints:

inline uint8_t _strlen(const char *s) {
  uint8_t len;

  asm (
    "1:                  \n" //local label, safe if inlined more than once
    "ld  __tmp_reg__, Z+ \n"
    "tst __tmp_reg__     \n"
    "brne 1b             \n"
    //len = Z - 1 - src = (-1 - src) + Z = ~src + Z
    "com %A2             \n"
    "com %B2             \n"
    "add %A2, %A1        \n"
    "adc %B2, %B1        \n"
    "mov %0, %A2         \n" //save len (uint8_t)
    : "=r" (len) : "z" (s), "x" (s)
  );

  return len;
}

Here is a look at the core of the code generated by the above C Stub Function (notice the compiler/assembler doesn’t need to generate a lot of “stub” code):

  MOVW r30, r24
  MOVW r26, r24
loop:
  LD r0,Z+
  TST r0
  BRNE loop
  COM r26
  COM r27
  ADD r26, r30
  ADC r27, r31
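The COM/ADD/ADC sequence relies on the two's complement identity ~src = -src - 1, so ~src + Z equals Z - 1 - src. The following host-side C sketch uses illustrative function names and 16-bit integers standing in for the X and Z pointer registers. It mirrors that arithmetic, including the case where the address wraps around:

```c
#include <assert.h>
#include <stdint.h>

/* Length as computed by the assembly: 'end' is the pointer value after the
   loop (one past the NUL), 'start' is the original pointer value. */
static uint8_t len_from_pointers(uint16_t start, uint16_t end)
{
    uint16_t x = (uint16_t)~start;   /* COM r26 / COM r27 */
    x = (uint16_t)(x + end);         /* ADD / ADC         */
    return (uint8_t)x;               /* the low byte is the length */
}

/* Walks a string the way the LD Z+ loop does, with 'base' playing the role
   of the string's 16-bit address. */
static uint8_t model_strlen(const char *s, uint16_t base)
{
    uint16_t z = base;
    uint16_t i = 0;
    char c;

    do {
        c = s[i++];
        z++;                         /* post-increment of Z */
    } while (c != '\0');

    return len_from_pointers(base, z);
}

static void strlen_demo(void)
{
    assert(model_strlen("abc", 0x0100) == 3);
    assert(model_strlen("", 0x0100) == 0);
    assert(model_strlen("abc", 0xFFFE) == 3);   /* address wrap-around */
}
```

Because the result is truncated to 8 bits, strings longer than 255 characters would report a wrapped length, just as in the uint8_t assembly version.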

Placing a Call

An extension to the “C Stub Function” technique is calling another C function from inside inline assembly code. The following bit of code demonstrates the CALL instruction. This instruction “calls” a subroutine located within the program memory (if we remember to properly define the function to avoid linkage errors). The C Stub Function even handles the return (RET) for us.

An additional detail required here is the need to encapsulate the “called” function inside an extern “C” { } declaration (see the example below). The extern “C” linkage specification prevents the C++ compiler from “mangling” the function name; a mangled name would prevent the linker from locating the called function.

extern "C" {
  void foo() {
    // do something here...
  }
}

void test() {
  asm (
    "call foo \n"
  );
}

Playing Catch

Next, we present a basic example of passing and returning parameters to and from C Stub Functions. The purpose of the following code is to convert an upper case ASCII character into its lower case equivalent. We’ve created two functions here, _isupper and _tolower, which validate the input character and then perform the conversion.

Take a look at the code below.

Notice, the first thing _tolower does is call the function _isupper. Since _tolower hasn’t done anything yet, the input character (c), the parameter to _tolower, is still in place and is handed directly on to the _isupper function. Neat!

Next, _isupper checks the character to confirm it’s actually an upper case character. If so, it returns the character; otherwise it returns a zero. Upon returning to _tolower, the next instruction executed is “tst r24”, a test of the contents of register r24. If register #24 (r24) is not zero, the character is converted and the function returns.

Again, notice the use of the extern “C” { } declaration here:

extern "C" {
  unsigned char _isupper(unsigned char c) {
    //bind scratch variable to a specific register, r18
    register unsigned char ch asm("r18");
    
    asm (
      "mov  %1, %0 \n" //copy input to scratch
      "subi %1, 'A'\n" //subtract 0x41
      "brmi 2f     \n" //branch if minus (below 'A')
      "subi %1, 26 \n" //26 letters
      "brmi 3f     \n" //minus: c==upper, keep c
      "2: clr  %0  \n" //false
      "3:          \n"
      : "+r" (c), "=&d" (ch)
    );
    
    return c;
  }
}

char _tolower(unsigned char c) {
  asm (
    "call _isupper \n" //validate char
    "tst r24       \n" //0 = not an upper case char
    "breq 1f       \n" //not an upper case char
    "ori %0, 0x20  \n" //make lower
    "1:            \n"
    : "+d" (c)                 //ori needs an upper register (r16-r31)
  );
  
  return c;
}
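For comparison, here is a plain C rendering of the same logic that can be run on a PC. The ref_ names are illustrative. The sketch assumes, as the text describes, that the character stays in r24 throughout, so a failed validation leaves zero as the result:

```c
#include <assert.h>
#include <stdint.h>

/* Returns c if it is 'A'..'Z', otherwise 0 (mirrors the SUBI/branch tests). */
static uint8_t ref_isupper(uint8_t c)
{
    uint8_t t = (uint8_t)(c - 'A');   /* subi 'A' */
    if (t & 0x80) return 0;           /* minus: c is outside the letter range */
    t = (uint8_t)(t - 26);            /* subi 26 */
    if (!(t & 0x80)) return 0;        /* plus: c is past 'Z' */
    return c;
}

/* Mirrors the call/tst/ori sequence: the validated character comes back
   from ref_isupper, and bit 5 is set only if it is non-zero. */
static uint8_t ref_tolower(uint8_t c)
{
    uint8_t r = ref_isupper(c);
    if (r != 0) r |= 0x20;            /* ori 0x20 makes it lower case */
    return r;                         /* 0 when the input was not upper case */
}

static void tolower_demo(void)
{
    assert(ref_tolower('A') == 'a');
    assert(ref_tolower('Z') == 'z');
    assert(ref_tolower('a') == 0);    /* not upper case: zero comes back */
}
```

Note the design consequence: a caller that wants non-letters passed through unchanged would need to keep its own copy of the input.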

Insider Information

Why did function _tolower choose to test register #24 (r24)? The above two functions relied on “insider” information when using register r24. These routines knew that an 8-bit, byte-sized value is passed to and from a function via the r24 register. The C compiler always passes function arguments and return values in specific register locations. Knowing these locations is essential to writing efficient inline assembly code, especially when interfacing with the C language.

This is a good time to review the data type sizes: a char is 8 bits, an int is 16 bits, a long is 32 bits, a long long is 64 bits, floats are 32 bits, and pointers are 16 bits (function pointers are word addresses). Arguments are allocated left to right, starting in register r25 and descending through register r8. All arguments are aligned to start in even-numbered registers (odd-sized arguments, like char, have one free register above them). For example, a single 8-bit value is passed via the r24 register (r25 is left unused), a single 16-bit value is passed via the r25:r24 register pair, and a 32-bit value is passed via the r25:r24:r23:r22 register combination.

Return values are passed in a similar fashion. An 8-bit value is returned in r24, a 16-bit value in r25:r24, and a 32-bit value in r25:r24:r23:r22. An 8-bit return value may be zero/sign-extended to 16 bits by the called function.
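The allocation rule for arguments can be expressed as a short host-side C routine. This is a sketch only, with hypothetical names. The behaviour once registers run out (that argument and all later ones go to memory) is my understanding of the avr-gcc calling convention rather than something stated above:

```c
#include <assert.h>
#include <stddef.h>

#define ARG_IN_MEMORY (-1)

/* For each argument size (in bytes), compute the lowest register number it
   occupies: start just above r25, round each size up to an even number of
   bytes, and work downward toward r8. */
static void assign_arg_registers(const int *sizes, size_t count, int *low_reg)
{
    int next = 26;       /* one above r25, the first register used */
    int in_memory = 0;   /* assumption: once one argument spills, the rest follow */
    size_t i;

    for (i = 0; i < count; i++) {
        int size = (sizes[i] + 1) & ~1;      /* round odd sizes up to even */
        if (!in_memory && next - size >= 8) {
            next -= size;
            low_reg[i] = next;               /* the low byte lives here */
        } else {
            in_memory = 1;
            low_reg[i] = ARG_IN_MEMORY;
        }
    }
}

static void arg_demo(void)
{
    /* models: void f(char a, int b, long c) */
    const int sizes[3] = { 1, 2, 4 };
    int low[3];

    assign_arg_registers(sizes, 3, low);
    assert(low[0] == 24);   /* a in r24 (r25 unused) */
    assert(low[1] == 22);   /* b in r23:r22 */
    assert(low[2] == 18);   /* c in r21:r20:r19:r18 */
}
```

Running the same routine on a single 4-byte argument gives a low register of 22, matching the r25:r24:r23:r22 combination described above.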

What’s the Use of a Register?

Function “call-used” registers are r18-r27, and r30-r31. Any, or all of these registers may be allocated by the compiler for local data. However, we may use them freely in assembler subroutines. Calling C subroutines can clobber any of them, and the caller is responsible for saving and restoring before and after use.

Function “call-saved” registers are r2-r17, and r28-r29. They may also be allocated by the compiler for local data, but C subroutines leave them unchanged. Assembler subroutines are responsible for saving and restoring any of these registers, if changed. The Y register pair (r29:r28) is used as a frame pointer (pointing to local data placed on the stack) if necessary.

The fixed registers, r0 and r1, are never allocated by the compiler for local data. The temporary register, r0, can be clobbered by any C code (except interrupt handlers, which save it), and may be used freely. The zero register, r1, is assumed to always be zero in any C code. It may be used for other purposes within a piece of assembler code, but must then be cleared after use (clr r1). Interrupt handlers save and clear r1 on entry, and restore r1 on exit (in case it was non-zero).
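As a quick reference, the three register groups described in this section can be captured in a small lookup function. This is host-side C with illustrative names, intended only as a summary of the ranges listed above:

```c
#include <assert.h>

typedef enum { REG_FIXED, REG_CALL_USED, REG_CALL_SAVED } RegClass;

/* Classifies register number r (0-31) according to the ranges in the text. */
static RegClass classify_register(int r)
{
    assert(r >= 0 && r <= 31);
    if (r <= 1) return REG_FIXED;                       /* r0 temp, r1 zero */
    if ((r >= 18 && r <= 27) || r >= 30) return REG_CALL_USED;
    return REG_CALL_SAVED;                              /* r2-r17, r28-r29 */
}

static void regclass_demo(void)
{
    assert(classify_register(1)  == REG_FIXED);
    assert(classify_register(24) == REG_CALL_USED);     /* argument/return register */
    assert(classify_register(28) == REG_CALL_SAVED);    /* Y, the frame pointer */
    assert(classify_register(30) == REG_CALL_USED);     /* Z */
}
```

In practice this is the checklist to consult before writing an assembler routine: anything in the call-saved group must be pushed and popped if the routine changes it.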

References

AVR 8-bit Instruction Set
AVR-GCC Inline Assembler Cookbook
Extended Asm – Assembler Instructions with C Expression Operands
Mixing C and Assembly Language
