GCC Inline Assembler Cookbook

Here is a most basic example:

  asm("nop \n");

The general form of an inline assembler statement is:

asm(code : output operand list : input operand list : clobber list);

The special sequence of linefeed and tab characters helps keep the assembler listing looking nice, and is required to prevent more than one instruction appearing on a line (typically the culprit of the dreaded “garbage at end of line” error message). It may seem a bit odd at first, but that’s the way the compiler creates its own assembler code when compiling C statements.

Beyond this very basic explanation, you will need to read the official AVR-GCC Inline Assembler Cookbook, or follow along with my Arduino Inline Assembly Tutorial Series starting here.

The following is a collection of notes I have found helpful:

Keep in mind that the preprocessor doesn’t expand macros inside strings, which is why anything like the following fails:

  __asm__("in r16, PINB");

Therefore, constants defined in macros must be passed through operands/constraints.
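
To see the string problem in isolation, here is a small host-side sketch in plain C (no AVR needed). PORT_ADDR, STR and the two helper functions are made-up names for illustration only. The stringify trick is shown purely as a contrast; it is not the operand-based approach recommended above:

```c
#include <string.h>

#define PORT_ADDR 0x03 /* hypothetical constant, for illustration only */
#define STR_(x) #x     /* turns its argument into a string literal */
#define STR(x)  STR_(x) /* expands the macro first, then stringifies it */

/* The macro name sits inside the quotes, so it reaches the assembler verbatim. */
static const char *template_plain(void) {
  return "in r16, PORT_ADDR";
}

/* Adjacent string literals are joined after macro expansion. */
static const char *template_pasted(void) {
  return "in r16, " STR(PORT_ADDR);
}
```

The first string still contains the text PORT_ADDR, which the assembler knows nothing about. The second one contains the number, but passing the constant as an operand is still the cleaner route because the compiler can check it against the constraint.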

Constraint characters may be prepended by a single modifier. Constraints without a modifier specify read-only operands. Modifiers are:

= Write-only operand, usually used for all output operands.
+ Read-write operand.
& Early-clobber operand: the register should be used for output only and is never shared with an input.

The full list of constraints can be found here.

A few worthwhile remarks about AVR inline assembly found here.

If all of this talk about constraints, modifiers and operands makes your eyes roll back in your head, then why not get the book? Now with greatly expanded coverage.

For those of us that learn by observing, here follows a collection of shamelessly stolen recipes from around the internet (not really, most of these examples are my very own):

First, IO register names are defined with their “memory” address in mind. To use them as IO (as parameter of instructions in, out, sbi/cbi, sbis/sbic), 0x20 has to be subtracted, or simply use the _SFR_IO_ADDR macro:

  asm (
    "sbic %[_PIND], 6 \n\t" 
    : : [_PIND] "I" (_SFR_IO_ADDR (PIND))
  );
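
The 0x20 subtraction can be checked on any machine. The sketch below is plain C and only models the arithmetic; io_addr_from_mem_addr is a hypothetical helper (the real work is done by _SFR_IO_ADDR), and the example addresses in the usage note are my assumption for an ATmega328P:

```c
#include <stdint.h>

/* Offset between the data-memory address of an I/O register and the
   address used by in, out, sbi/cbi and sbis/sbic (0x20, as stated above). */
#define SFR_OFFSET 0x20u

/* Hypothetical helper that mirrors the subtraction. */
static uint8_t io_addr_from_mem_addr(uint16_t mem_addr) {
  return (uint8_t)(mem_addr - SFR_OFFSET);
}
```

For example, a register at memory address 0x29 (PIND on the ATmega328P, as far as I know) ends up at I/O address 0x09, which is the number sbic actually needs.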

Set a pin ( equivalent to digitalWrite(13, HIGH) ):

  asm (
    "sbi %0, %1 \n\t"
    : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5)
  );

Clear a pin ( equivalent to digitalWrite(13, LOW) ):

  asm (
    "cbi %0, %1 \n\t"
    : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5)
  );

Store Constant to SRAM

#define ANSWER_TO_LIFE 42

volatile char a;

asm volatile (
  "ldi %0, %1 \n"
  : "=d" (a) : "M" (ANSWER_TO_LIFE) //ldi only works on r16-r31, hence "d"
);

16-bit Integer Example

volatile int a = 0;

asm volatile(
  "ldi r24, lo8(32767) \n" //0xff
  "ldi r25, hi8(32767) \n" //0x7f
  "sts (a), r24        \n" //lsb
  "sts (a + 1), r25    \n" //msb
  : : : "r24", "r25"
);

Swap Variables

//C language example:
char c, b=20, a=10;

c = a; 
a = b; 
b = c;

In inline assembler:

char a = 10;
char b = 20;

asm (
  "lds r24, (a) \n"
  "lds r26, (b) \n"
  "sts (b), r24 \n" //exchanging registers
  "sts (a), r26 \n"
  : : : "r24", "r26"
);

Swap Between Two 16-bit Integers

volatile int a = 0xa1a2;
volatile int b = 0xb1b2;

asm (
  "mov %A0, %A3 \n"
  "mov %B0, %B3 \n"
  "mov %A1, %A2 \n"
  "mov %B1, %B2 \n"
  : "=&r" (a), "=&r" (b) : "r" (a), "r" (b) //& keeps the outputs out of the input registers
);

The following version might look confusing, but it’s the proper way to do this (read my book to find out more):

char a = 10;
char b = 20;

asm (
  "" : "=r" (a), "=r" (b) : "0" (b), "1" (a) 
);
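
Because the template is empty, this trick does not depend on the AVR at all, so it can be tried on a desktop GCC or Clang. The sketch below wraps it in a hypothetical helper, swap_chars, and uses the __asm__ spelling so it also builds in strict ISO mode. The matching constraint "0" asks for b to be loaded into the register chosen for output 0 (a), and "1" does the same for a and output 1 (b), so the compiler performs the swap while setting up the operands:

```c
/* Hypothetical helper, for illustration only. */
static void swap_chars(char *pa, char *pb) {
  char a = *pa;
  char b = *pb;

  /* No instructions: the operand bindings alone exchange the values. */
  __asm__ ("" : "=r" (a), "=r" (b) : "0" (b), "1" (a));

  *pa = a;
  *pb = b;
}
```

Calling swap_chars(&x, &y) with x=10 and y=20 should leave x=20 and y=10.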

The same method with integers:

volatile int a = 0xa1a2;
volatile int b = 0xb1b2;

asm volatile(
  "" : "=r" (a), "=r" (b) : "0" (b), "1" (a) 
);

int x = 10;
int y = x;
  
//is the same as-

asm(
  "ldi r24, 10    \n\t"
  "ldi r25, 0     \n\t"
  "sts (x), r24   \n\t" 
  "sts (x+1), r25 \n\t"
  "lds r24, (x)   \n\t"
  "lds r25, (x+1) \n\t"
  "sts (y), r24   \n\t"
  "sts (y+1), r25 \n\t"  
  : : : "r24", "r25"
);

  a += 1; //a is a global char
  
  //is the same as-

  asm(
    "lds r24, (a) \n\t"
    "inc r24      \n\t"
    "sts (a), r24 \n\t"
    ::: "r24"
  );

Adding a constant to a register is tricky in AVR assembler because there is no add-immediate instruction. One way to do it is to negate the constant and subtract the negative constant from the register (double negative addition):
x - (-2) = x + 2

  a += 2; //a is a global char
  
  //is the same as-

  asm(
    "lds  r24, (a) \n\t"
    "subi r24, -2  \n\t"
    "sts  (a), r24 \n\t"
    ::: "r24"
  );
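
The double negative works because registers are 8 bits wide and wrap around. Here is a minimal host-side model in plain C; add_via_subi is a made-up name and the function only imitates what the subi instruction computes:

```c
#include <stdint.h>

/* Models "subi reg, -k": the assembler encodes -k as an 8-bit immediate
   and the subtraction wraps modulo 256. */
static uint8_t add_via_subi(uint8_t reg, int8_t k) {
  uint8_t imm = (uint8_t)(-k);  /* -2 becomes 0xfe */
  return (uint8_t)(reg - imm);  /* reg - 0xfe == reg + 2 (mod 256) */
}
```

So add_via_subi(40, 2) gives 42, and add_via_subi(255, 2) wraps around to 1 just like a normal 8-bit addition would.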

  a += 10; //a is a global char
  
  //is the same as-

  asm(
    "lds r24, (a) \n\t"
    "ldi r25, 10  \n\t"
    "add r24, r25 \n\t"
    "sts (a), r24 \n\t"
    ::: "r24", "r25"
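  );

  x += 10; //x is a global int

  //is the same as-

  asm(
    "lds  r24, (x)        \n\t"
    "lds  r25, (x+1)      \n\t"
    "subi r24, lo8(-(10)) \n\t" //add and adc only accept registers,
    "sbci r25, hi8(-(10)) \n\t" //so subtract the negated constant instead
    "sts  (x), r24        \n\t"
    "sts  (x+1), r25      \n\t"
    ::: "r24", "r25"
  );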
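
For 16-bit values the negated constant is subtracted one byte at a time with a subi/sbci pair, and the borrow from the low byte has to flow into the high byte. The following plain C model is only a sketch of that byte-wise arithmetic (add16_via_subi_sbci is a hypothetical name), handy for convincing yourself that the borrow does the right thing:

```c
#include <stdint.h>

/* Models "subi lo, lo8(-k)" followed by "sbci hi, hi8(-k)". */
static uint16_t add16_via_subi_sbci(uint16_t x, uint16_t k) {
  uint16_t neg = (uint16_t)(0u - k);       /* the negated constant */
  uint8_t lo = (uint8_t)(x & 0xffu);
  uint8_t hi = (uint8_t)(x >> 8);
  uint8_t neg_lo = (uint8_t)(neg & 0xffu);
  uint8_t neg_hi = (uint8_t)(neg >> 8);

  uint8_t borrow = (uint8_t)(lo < neg_lo); /* carry flag after subi */
  lo = (uint8_t)(lo - neg_lo);             /* subi */
  hi = (uint8_t)(hi - neg_hi - borrow);    /* sbci */

  return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

For instance, add16_via_subi_sbci(0x00ff, 1) yields 0x0100, which shows the low byte rolling over into the high byte.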

analogWrite

This no-frills version of analogWrite saves ~542 bytes over the built-in function:

//analogWrite requires a PWM pin 
//PWM pin/timer table:
//3:  (TIMER2B) PD3/TCCR2A/COM2B1/OCR2B
//5:  (TIMER0B) PD5/TCCR0A/COM0B1/OCR0B
//6:  (TIMER0A) PD6/TCCR0A/COM0A1/OCR0A
//9:  (TIMER1A) PB1/TCCR1A/COM1A1/OCR1A
//10: (TIMER1B) PB2/TCCR1A/COM1B1/OCR1B
//11: (TIMER2A) PB3/TCCR2A/COM2A1/OCR2A
//set below 6 defines per above table
#define ANALOG_PORT         PORTB
#define ANALOG_PIN          PORTB3
#define ANALOG_DDR          DDRB
#define TIMER_REG           TCCR2A
#define COMPARE_OUTPUT_MODE COM2A1
#define COMPARE_OUTPUT_REG  OCR2A

volatile uint8_t val = 128; //0-255

asm (
    "sbi  %0, %1   \n" //DDR set to output (pinMode)

    "cpi  %6, 0    \n" //if full low (0)
    "breq _SetLow  \n"
    "cpi  %6, 0xff \n" //if full high (0xff)
    "brne _SetPWM  \n"

    "sbi  %2, %1   \n" //set high
    "rjmp _SkipPWM \n"

  "_SetLow:        \n"
    "cbi  %2, %1   \n" //set low
    "rjmp _SkipPWM \n"

  "_SetPWM:        \n"
    "ld   r24, X   \n"
    "ori  r24, %3  \n"
    "st   X, r24   \n" //connect pwm pin timer# & channel
    "st   Z, %6    \n" //set pwm duty cycle (val)

  "_SkipPWM:       \n"
    : : "I" (_SFR_IO_ADDR(ANALOG_DDR)), "I" (ANALOG_PIN),
    "I" (_SFR_IO_ADDR(ANALOG_PORT)), "M" (_BV(COMPARE_OUTPUT_MODE)),
    "x" (_SFR_MEM_ADDR(TIMER_REG)), "z" (_SFR_MEM_ADDR(COMPARE_OUTPUT_REG)), "d" (val) //cpi needs r16-r31
    : "r24"
);

analogRead

//Define various ADC prescales
#define PS2   (1<<ADPS0)                             //8000kHz ADC clock freq
#define PS4   (1<<ADPS1)                             //4000kHz
#define PS8   ((1<<ADPS0) | (1<<ADPS1))              //2000kHz
#define PS16  (1<<ADPS2)                             //1000kHz
#define PS32  ((1<<ADPS2) | (1<<ADPS0))              //500kHz
#define PS64  ((1<<ADPS2) | (1<<ADPS1))              //250kHz
#define PS128 ((1<<ADPS2) | (1<<ADPS1) | (1<<ADPS0)) //125kHz
#define ANALOG_V_REF     DEFAULT //INTERNAL, EXTERNAL, or DEFAULT
#define ADC_PRESCALE     PS128   //PS16, PS32, PS64 or PS128 (default)

uint16_t aRead(uint8_t channel) {
  uint16_t result;
  
  asm volatile ( //volatile: every call must really start a conversion
    "andi %1, 0x07    \n" //force pin==0 thru 7
    "ori  %1, (%6<<6) \n" //(pin | ADC Vref)
    "sts  %2, %1      \n" //set ADMUX

    "lds  r18, %3             \n" //get ADCSRA
    "andi r18, 0xf8           \n" //clear prescale bits
    "ori  r18, ((1<<%5) | %7) \n" //(new prescale | ADSC)
    "sts  %3, r18             \n" //set ADCSRA

    "_loop:       \n" //loop until ADSC cleared
    "lds  r18, %3 \n"
    "sbrc r18, %5 \n"
    "rjmp _loop   \n"

    "lds  %A0, %4   \n" //result = ADCL 
    "lds  %B0, %4+1 \n" //ADCH

    : "=r" (result), "+d" (channel) : "M" (_SFR_MEM_ADDR(ADMUX)), //channel is modified; andi/ori need r16-r31
    "M" (_SFR_MEM_ADDR(ADCSRA)), "M" (_SFR_MEM_ADDR(ADCL)),
    "I" (ADSC), "I" (ANALOG_V_REF), "M" (ADC_PRESCALE)
    : "r18"
  );
  
  return result;
}
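
The kHz figures in the prescale comments follow from dividing the CPU clock by the prescaler. The sketch below reproduces them in plain C, assuming a 16 MHz clock (my assumption, it matches the numbers in the comments) and the usual ADPS2:0 encoding where the division factor is 2 to the power of the bit value, with 0 also meaning 2. adc_clock_hz is a hypothetical helper:

```c
#include <stdint.h>

/* ADC clock for a given CPU clock and ADPS2:0 bit pattern (0-7). */
static uint32_t adc_clock_hz(uint32_t f_cpu_hz, uint8_t adps_bits) {
  uint8_t bits = (uint8_t)(adps_bits & 0x07u);
  uint32_t division = (bits == 0u) ? 2ul : (1ul << bits);
  return f_cpu_hz / division;
}
```

With f_cpu_hz = 16000000, a bit pattern of 7 (PS128) gives 125000 Hz and a pattern of 4 (PS16) gives 1000000 Hz.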

pinMode (OUTPUT)

The code presented from here on has previously been explained inside the Arduino Inline Tutorial Series.

asm (
  "sbi %0, %1 \n" //1=OUTPUT
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (DDB5)
);

pinMode (INPUT PULLUP)

asm (
  "cbi %0, %2 \n"
  "sbi %1, %2 \n"
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (_SFR_IO_ADDR(PORTB)), "I" (DDB5)
);

pinMode (INPUT)

asm (
  "cbi %0, %2 \n"
  "cbi %1, %2 \n"
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (_SFR_IO_ADDR(PORTB)), "I" (DDB5)
);

pinMode Multiple pins

#define PIN_DIRECTION 0b00101000 //PIN 3 & 5 OUTPUT
//#define PIN_DIRECTION (1<<DDB3) | (1<<DDB5)
asm (
  "out %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(DDRB)), "r" (PIN_DIRECTION)
);

digitalWrite HIGH

asm (
  "sbi %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(PORTB)),"I" (PORTB5)
);

digitalWrite LOW

asm (
  "cbi %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5) 
);

digitalWrite(output)

volatile uint8_t output = HIGH; //LOW or HIGH
asm (
  "cpi %2, 0     \n"
  "breq 1f       \n"
  "sbi %0, %1    \n"
  "rjmp 2f       \n"
  "1: cbi %0, %1 \n"
  "2:            \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5), "d" (output) //cpi needs r16-r31
);

digitalRead

volatile uint8_t status;
 
asm (
  "cli         \n"
  "ldi %0, 1   \n" 
  "sbis %1, %2 \n" //skip next if pin high
  "clr %0      \n"
  "sei         \n" 
  : "=d" (status) : "I" (_SFR_IO_ADDR(PINB)), "I" (PINB5) //ldi needs r16-r31
);

Example of turning off PWM for arduino digital pin #11

//digital PWM pin registers:
//3:  (TIMER2B) PD3/TCCR2A/COM2B1/OCR2B
//5:  (TIMER0B) PD5/TCCR0A/COM0B1/OCR0B
//6:  (TIMER0A) PD6/TCCR0A/COM0A1/OCR0A
//9:  (TIMER1A) PB1/TCCR1A/COM1A1/OCR1A
//10: (TIMER1B) PB2/TCCR1A/COM1B1/OCR1B
//11: (TIMER2A) PB3/TCCR2A/COM2A1/OCR2A
asm (
  "ld  r16, Z \n"
  "ldi r17, 0xff \n"
  "eor r17, %1 \n"
  "and r16, r17 \n"
  "st  Z, r16 \n"
  : : "z" (_SFR_MEM_ADDR(TCCR2A)), "d" (_BV(COM2A1)) : "r16", "r17" //pass the bit mask, not the bit number
);
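
The ldi/eor/and sequence is simply "clear the bits in a mask". Here is a plain C model of those three instructions; clear_bits is a made-up name, and the value handed to it has to be a bit mask (such as the result of _BV) rather than a bit number:

```c
#include <stdint.h>

/* Models: ldi r17, 0xff / eor r17, mask / and r16, r17 */
static uint8_t clear_bits(uint8_t reg, uint8_t mask) {
  uint8_t inverted = (uint8_t)(0xffu ^ mask); /* same as ~mask in 8 bits */
  return (uint8_t)(reg & inverted);
}
```

For example, clear_bits(0x83, 0x80) returns 0x03: bit 7 is cleared and everything else is left alone.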

Array Access

An array can be indexed by loading its base address into the register pair r31:r30 (called the Z register) and then adding the index. In the following example the index fits in a single byte, so the addition is extended to 16 bits by adding zero (with carry) to the high byte of the pointer; r1 is the register avr-gcc keeps at zero. A following “ld r18, Z” loads the selected element of the array, while “ld r18, Z+” also increments the address after loading:

unsigned char i;
volatile unsigned char samples[255];

asm(
    "ldi r30, lo8(samples) \n\t"
    "ldi r31, hi8(samples) \n\t"
    "lds r24, (i)          \n\t"
    "add r30, r24          \n\t"  //low byte
    "adc r31, r1           \n\t"  //high byte
    : : : "r24", "r30", "r31"
);

AVR App Note 305 Serial Transfer

//asynchronous 8n1 serial transmit byte, see AVR App Note 305
void SerialPutChar(unsigned char c) {
#define TX_PIN 6 //TX pin is bit 6 of the port used below (PORTB)
#define B 31	//38400 bps@8MHz with 0.3% error	
//#define B 66 //19200 bps@8MHz with 0.6% error
  asm(
       "ldi  r16, 10          \n\t" //1 start +8 data +1 stop bits (bit count) 
       "com  r24              \n\t" //Invert everything (r24 = byte to xmit)
       "sec                   \n\t" //Start bit
  "0:                         \n\t"
       "brcc 1f               \n\t" 
       "cbi %0, %1            \n\t" //send a '0'
       "rjmp 2f               \n\t"
  "1:                         \n\t"
       "sbi %0, %1            \n\t" //send a '1'
       "nop                   \n\t"
  "2:                         \n\t"
       "rcall UARTDelay       \n\t" //1/2 bit delay +
       "rcall UARTDelay       \n\t" //1/2 bit delay = 1bit delay
       "lsr   r24             \n\t" //Get next bit
       "dec   r16             \n\t" //all bits sent?
       "brne  0b              \n\t"
       "rjmp SerialPutCharEnd \n\t"
//half bit delay
  "UARTDelay:                 \n\t"
       "ldi  r17, %2          \n\t"
  "3:   dec  r17              \n\t"
       "brne 3b               \n\t"
       "ret                   \n\t"
  "SerialPutCharEnd:          \n\t"
    :: "I" (_SFR_IO_ADDR(PORTB)), "I" (TX_PIN), "M" (B) //"M" so that B=66 fits too
    :"r16", "r17", "r24"
  );
}
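
The com/sec/lsr trick is easier to follow when modeled on the desktop. This plain C sketch (frame_8n1 is a hypothetical name) walks through the same ten iterations and records the line level for each bit slot, with bit 0 of the result being the first slot sent:

```c
#include <stdint.h>

/* Returns the ten line levels of one 8n1 frame, first slot in bit 0. */
static uint16_t frame_8n1(uint8_t c) {
  uint8_t shift = (uint8_t)~c; /* com r24 */
  unsigned carry = 1u;         /* sec: produces the start bit */
  uint16_t levels = 0u;

  for (unsigned slot = 0u; slot < 10u; slot++) { /* ldi r16, 10 */
    if (!carry) {
      levels |= (uint16_t)(1u << slot); /* sbi: line high */
    }                                   /* else cbi: line low */
    carry = shift & 1u;                 /* lsr r24 moves bit 0 into carry */
    shift >>= 1;                        /* zeros shift in from the top */
  }
  return levels;
}
```

The result is always a low start bit, the data bits LSB first, and a high stop bit. For 0x55 that is 0x2AA. The stop bit comes for free because lsr shifts zeros into the inverted byte.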

Delays

Delay example lifted from the gcc avr library. This also demonstrates a C-wrapper or stub function surrounding the assembler routine:

void delay_loop(uint16_t count) {
  //this loop executes 4 cycles per iteration, not including the overhead
  //@16MHz that is 0.25us per iteration, ~16.4ms maximum (count=0 runs 65536 times)
  asm volatile (
    "1: sbiw %0, 1 \n\t"
    "brne 1b       \n\t"
    : "=w" (count)
    : "0" (count)
  );
}

#define CLOCK_MHZ       16UL
#define DELAY_LENGTH_MS 1000UL
#define DELAY_VALUE     (uint32_t)((CLOCK_MHZ * 1000UL * DELAY_LENGTH_MS) / 5UL)

uint32_t n = DELAY_VALUE; //counter, modified by the loop

asm volatile (
"1:             \n" //5 cycles per iteration
  "subi %A0, 1  \n"
  "sbci %B0, 0  \n"
  "sbci %C0, 0  \n" //note: 1 byte short of full 32-bits
  "brcc 1b      \n"
  : "+d" (n) //subi/sbci need r16-r31
);
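
The loop counts for both delays come from the same bit of arithmetic: clock cycles available divided by cycles per iteration (4 for the sbiw/brne loop, 5 for the subi/sbci/sbci/brcc loop). A small plain C sketch, with loop_count being a made-up helper name:

```c
#include <stdint.h>

/* Iterations needed for a delay, ignoring call and setup overhead. */
static uint32_t loop_count(uint32_t clock_mhz, uint32_t delay_ms,
                           uint32_t cycles_per_iteration) {
  return (clock_mhz * 1000ul * delay_ms) / cycles_per_iteration;
}
```

loop_count(16, 1000, 5) gives 3200000, the same value DELAY_VALUE works out to, and loop_count(16, 10, 4) gives 40000, which still fits the 16-bit counter of delay_loop.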

Bit Manipulation Macros

#define bit_is_clear(port, bit) ({                \
  uint8_t t;                                      \
  asm volatile (                                  \
    "clr %0 \n\t"                                 \
    "sbis %1, %2 \n\t"                            \
    "inc %0"                                      \
    : "=r" (t)                                    \
    : "I" ((uint8_t)(port)), "I" ((uint8_t)(bit)) \
  );                                              \
  t;                                              \
})

#define bit_is_set(port, bit) ({                  \
  uint8_t t;                                      \
  asm volatile (                                  \
    "clr %0 \n\t"                                 \
    "sbic %1, %2 \n\t"                            \
    "inc %0"                                      \
    : "=r" (t)                                    \
    : "I" ((uint8_t)(port)), "I" ((uint8_t)(bit)) \
  );                                              \
  t;                                              \
})

#define loop_until_bit_is_set(port, bit)          \
  asm volatile (                                  \
    "L_%=: " "sbis %0, %1 \n\t"                   \
    "rjmp L_%="                                   \
    : /* no outputs */                            \
    : "I" ((uint8_t)(port)), "I" ((uint8_t)(bit)) \
  )

#define loop_until_bit_is_clear(port, bit)        \
  asm volatile (                                  \
    "L_%=: " "sbic %0, %1 \n\t"                   \
    "rjmp L_%="                                   \
    : /* no outputs */                            \
    : "I" ((uint8_t)(port)), "I" ((uint8_t)(bit)) \
  )

String Length: strlen

inline uint8_t _strlen(const char *s) {
  uint16_t len = (uint16_t)s; //start address, turned into the length below

  asm (
  "1:                  \n"
  "ld  __tmp_reg__, Z+ \n"
  "tst __tmp_reg__     \n"
  "brne 1b             \n"
  //len = Z - 1 - src = (-1 - src) + Z = ~src + Z
  "com %A0             \n"
  "com %B0             \n"
  "add %A0, %A1        \n"
  "adc %B0, %B1        \n"
  : "+r" (len), "+z" (s) : : "memory"
  );
  return len;
}
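
The comment inside the loop packs a little identity: after the loop Z points one past the terminating zero, so the length is Z - 1 - src, and in two's complement -1 - src is the same as ~src. The following plain C sketch only models that 16-bit pointer arithmetic (len_from_pointers is a hypothetical name):

```c
#include <stdint.h>

/* start   = address of the string
   z_after = Z once the terminating zero has been read (start + len + 1) */
static uint16_t len_from_pointers(uint16_t start, uint16_t z_after) {
  uint16_t inverted = (uint16_t)~start;  /* com: equals -1 - start (mod 65536) */
  return (uint16_t)(inverted + z_after); /* add/adc */
}
```

A 5-character string at 0x0100 leaves Z at 0x0106, and len_from_pointers(0x0100, 0x0106) gives 5. The identity also holds when the address wraps past 0xffff.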

String Copy: strcpy

 
inline void _strcpy(const char *src, char *dst) {
  asm (
    "_copy:               \n"
    "ld   __tmp_reg__, Z+ \n"
    "st   X+, __tmp_reg__ \n"
    "tst  __tmp_reg__     \n"
    "brne _copy           \n"
    : "+x" (dst), "+z" (src) : : "memory" //both pointers are advanced by the loop
  );
}

String Compare: strcmp

char s1[4] = "abc";
char s2[4] = "xyz";

inline int16_t _strcmp(const char *s1, const char *s2) {
  int16_t result;

  asm (
    "_compare:                     \n"
    "ld   %A0, X+                  \n"
    "ld   __tmp_reg__, Z+          \n"
    "sub  %A0, __tmp_reg__         \n"
    "cpse __tmp_reg__, __zero_reg__\n"
    "breq _compare                 \n"
    "sbc  %B0, %B0                 \n"
    : "=&r" (result), "+x" (s1), "+z" (s2) : : "memory"
  );
  return result;
}
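
The compare loop is compact enough to be cryptic, so here is my reading of it as a plain C model (strcmp_model is a made-up name). The low byte of the result is the difference of the first mismatching characters, and the final sbc turns the borrow of that subtraction into a high byte of 0x00 or 0xff, which sign-extends the result:

```c
#include <stdint.h>

static int16_t strcmp_model(const char *s1, const char *s2) {
  uint8_t diff;
  uint8_t c2;
  unsigned borrow;

  do {
    uint8_t c1 = (uint8_t)*s1++; /* ld %A0, X+ */
    c2 = (uint8_t)*s2++;         /* ld __tmp_reg__, Z+ */
    borrow = (c1 < c2);          /* carry flag after sub */
    diff = (uint8_t)(c1 - c2);   /* sub %A0, __tmp_reg__ */
  } while (c2 != 0u && diff == 0u); /* cpse skips the breq at end of string */

  /* sbc %B0, %B0 leaves 0xff in the high byte when there was a borrow */
  return borrow ? (int16_t)((int)diff - 256) : (int16_t)diff;
}
```

With the two strings from the example, strcmp_model("abc", "xyz") returns 'a' - 'x', which is -23.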

String Concatenate: strcat

inline void _strcat(const char *src, char *dst) {
  asm (
    "_dst:                \n" //find end of destination
    "ld   __tmp_reg__, X+ \n"
    "tst  __tmp_reg__     \n"
    "brne _dst            \n"
    "sbiw %A0, 1          \n" //undo post-increment
    "_src:                \n" //copy src to dst
    "ld   __tmp_reg__, Z+ \n"
    "st   X+, __tmp_reg__ \n"
    "tst  __tmp_reg__     \n"
    "brne _src            \n"
    : "+x" (dst), "+z" (src) : : "memory" //both pointers are advanced by the loops
  );
}

isspace

uint8_t _isspace(unsigned char c) {
  uint8_t result;

  asm (
    "cpi  %1, ' '  \n"
    "breq 1f       \n" //branch if equal
    "clr  %0       \n" //false
    "rjmp 2f       \n"
    "1: ldi  %0, 1 \n" //true
    "2:            \n" //exit
    : "=d" (result) : "d" (c) //cpi and ldi need r16-r31
  );

  return result;
}

isdigit

uint8_t _isdigit(unsigned char c) {
  uint8_t result;

  asm (
    "subi %1, 0x30 \n"
    "brmi 2f       \n" //branch if minus
    "subi %1, 10   \n"
    "brpl 2f       \n" //branch if plus
    "ldi  %0, 1    \n" //true
    "rjmp 3f       \n"
    "2: clr  %0    \n" //false
    "3:            \n" //exit
    : "=d" (result), "+d" (c) //c is modified; subi and ldi need r16-r31
  );

  return result;
}

isalpha

uint8_t _isalpha(unsigned char c) {
  uint8_t result;
 
  asm (
    "sbrs %1, 6     \n" //check bit 6
    "rjmp 1f        \n" //bit 6 is clear, cannot be alpha
    "andi %1, ~0x60 \n" //clear bit 5&6
    "breq 1f        \n" //0 cannot be alpha
    "subi %1, 27    \n" //26 letters 
    "brpl 1f        \n" //>z cannot be alpha
    "ldi %0, 1      \n" //true
    "rjmp 2f        \n"
    "1: clr  %0     \n" //false
    "2:             \n" //exit
    : "=d" (result), "+d" (c) //c is modified; andi, subi and ldi need r16-r31
  );
 
  return result;
}
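
The bit tricks in isalpha rely on the ASCII layout: letters all have bit 6 set, and clearing bits 5 and 6 maps both 'A'-'Z' and 'a'-'z' onto 1-26. Below is a plain C model of the same steps, meant for 7-bit ASCII input only; isalpha_model is a hypothetical name:

```c
#include <stdint.h>

static uint8_t isalpha_model(uint8_t c) {
  if ((c & 0x40u) == 0u) return 0; /* sbrs %1, 6 / rjmp 1f */
  c &= (uint8_t)~0x60u;            /* andi: clear bits 5 and 6 */
  if (c == 0u) return 0;           /* breq 1f: '@' and '`' land on 0 */
  c = (uint8_t)(c - 27u);          /* subi %1, 27 */
  if ((c & 0x80u) == 0u) return 0; /* brpl 1f: 27 and up is past the letters */
  return 1;                        /* 1-26 went negative: a letter */
}
```

'A' and 'z' come out as 1, while neighbours such as '@', '[' and '{' come out as 0.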

Sine Table

#include <avr/pgmspace.h>

//max error ~0.017452 [91*4=364 bytes]
static const float PROGMEM SineTable[91] = {
  0.0, 0.017452, 0.034899, 0.052336, 0.069756, 0.087156, 
  0.104528, 0.121869, 0.139173, 0.156434, 0.173648, 0.190809, 
  0.207912, 0.224951, 0.241922, 0.258819, 0.275637, 0.292372, 
  0.309017, 0.325568, 0.34202, 0.358368, 0.374607, 0.390731, 
  0.406737, 0.422618, 0.438371, 0.45399, 0.469472, 0.48481, 
  0.5, 0.515038, 0.529919, 0.544639, 0.559193, 0.573576, 
  0.587785, 0.601815, 0.615661, 0.62932, 0.642788, 0.656059, 
  0.669131, 0.681998, 0.694658, 0.707107, 0.71934, 0.731354, 
  0.743145, 0.75471, 0.766044, 0.777146, 0.788011, 0.798636, 
  0.809017, 0.819152, 0.829038, 0.838671, 0.848048, 0.857167, 
  0.866025, 0.87462, 0.882948, 0.891007, 0.898794, 0.906308, 
  0.913545, 0.920505, 0.927184, 0.93358, 0.939693, 0.945519, 
  0.951057, 0.956305, 0.961262, 0.965926, 0.970296, 0.97437, 
  0.978148, 0.981627, 0.984808, 0.987688, 0.990268, 0.992546, 
  0.994522, 0.996195, 0.997564, 0.99863, 0.999391, 0.999848, 1.0
};

float _Sine(uint16_t angle) {
  float tmp;
  const float *p = SineTable; //Z pointer into the table

  asm (
    //validate angle >= 0 && angle <= 90
    "cpi  %A1, 90+1 \n" 
    "cpc  %B1, __zero_reg__ \n"
    "brcc 1f        \n" //out of range

     //calculate table index
    "lsl  %A1       \n" //float is 4 bytes wide
    "rol  %B1       \n" //index = angle * 4
    "lsl  %A1       \n"
    "rol  %B1       \n"

    //add index to start of SineTable
    "add  r30, %A1  \n" 
    "adc  r31, %B1  \n"

    //get sine value (4-bytes)
    "lpm  %A0, Z+   \n" 
    "lpm  %B0, Z+   \n"
    "lpm  %C0, Z+   \n"
    "lpm  %D0, Z    \n"
    "rjmp 2f        \n" //exit (a bare ret here would skip the function epilogue)
    
    //return NAN
    "1:                         \n" 
    "ldi  %A0, lo8(0x7fc00000)  \n" //NAN = 0x7fc00000
    "ldi  %B0, hi8(0x7fc00000)  \n"
    "ldi  %C0, hlo8(0x7fc00000) \n"
    "ldi  %D0, hhi8(0x7fc00000) \n"
    "2:                         \n"
    : "=&d" (tmp), "+d" (angle), "+z" (p) //cpi and ldi need r16-r31
  );
  return tmp;
}

isupper & tolower

extern "C" {
  unsigned char _isupper(unsigned char c) {
    //bind variable to a specific register r18
    register unsigned char ch asm("r18");
    
    asm (
      "mov  %1, %0 \n" //save input
      "subi %1, 'A'\n" //subtract 0x41
      "brmi 2f     \n" //branch if minus
      "subi %1, 26 \n" //26 letters
      "brpl 2f     \n" //branch if plus
      "rjmp 3f     \n" //c==upper, keep c (a bare ret would skip the epilogue)
      "2: clr  %0  \n" //false
      "3:          \n"
      : "+r" (c), "=&r" (ch) //ch is scratch, written before c is finished with
    );
    
    return c;
  }
}

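char _tolower(unsigned char c) {
  //r24 carries the argument to, and the result from, _isupper
  register unsigned char up asm("r24") = c;

  asm volatile (
    "call _isupper \n" //validate char
    "tst  %1       \n" //0 = not an upper case char
    "breq 1f       \n" //leave c alone
    "ori  %0, 0x20 \n" //make lower
    "1:            \n"
    : "+d" (c), "+r" (up)
    : : "r18", "r19", "r20", "r21", "r22", "r23", "r25", "r26", "r27", "r30", "r31" //call-clobbered
  );
  
  return c;
}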

Blink with Interrupt

#include "k328p.h"

#define TCNT_BASE   0x0bdc
#define TCNT_BASE_H (((TCNT_BASE)>>8)&0xff)
#define TCNT_BASE_L ((TCNT_BASE)&0xff)

ISR(TIMER1_OVF_vect, ISR_NAKED) {
  asm (
    "push r31           \n" //save r30, r31 contents
    "push r30           \n"
    "push r24           \n"
    //preserve SREG
    "in   r24, __SREG__ \n"
    "push r24           \n"

    //reload TCNT1 counter for 1sec interrupt
    //(16-bit timer registers: the high byte must be written first)
    "clr r31            \n"
    "ldi r30, %2 + 1    \n" //Z = address of TCNT1H
    "ldi r24, %4        \n"
    "st  Z, r24         \n" //TCNT1H
    "ldi r24, %3        \n"
    "st  -Z, r24        \n" //TCNT1L
    //toggle LED
    "in   r30, %0       \n" //read port
    "ldi  r31, %1       \n" //LED bit mask
    "eor  r30, r31      \n" //toggle LED bit
    "out  %0, r30       \n" //write port

    //restore old SREG
    "pop  r24           \n"
    "out  __SREG__, r24 \n"
    //restore r30, r31
    "pop r24            \n"
    "pop  r30           \n"
    "pop  r31           \n"
    "reti               \n"
    : : "I" (kPORTB), "I" (_BV(PORTB5)), 
    "M" (kTCNT1), "M" (TCNT_BASE_L), "M" (TCNT_BASE_H)
  );
}

void setup() {
  uint16_t TNCTBase = TCNT_BASE;

  asm (
    "cli                  \n" //disable global interrupts 
    "sbi %0, %1           \n" //pinMode(13, OUTPUT);

    //set 256 prescale (CS12)
    "st  Z+, __zero_reg__ \n" //zero TCCR1A
    "ldi r24, %3          \n"
    "st  Z+, r24          \n" //TCCR1B = CS12
    "st  Z, __zero_reg__  \n" //zero TCCR1C
    //load counter for 1sec interrupt (high byte first)
    "ldi r30, %4 + 1      \n"
    "st  Z, %B5           \n" //TCNT1H
    "st  -Z, %A5          \n" //TCNT1L
    //enable overflow interrupt
    "ldi r30, %6          \n"
    "ldi r24, %7          \n"
    "st  Z, r24           \n" //TIMSK1

    "sei                  \n" //enable global interrupts 
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (PORTB5),
    "z" (_SFR_MEM_ADDR(TCCR1A)), "I" (_BV(CS12)),
    "M" (kTCNT1), "r" (TNCTBase),
    "M" (kTIMSK1), "I" (_BV(TOIE1)) : "r24", "memory"
  );
}

void loop() { }
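
Where does 0x0bdc come from? The timer overflows when it rolls over from 65535 to 0, so the preload is 65536 minus the number of timer ticks in the wanted period. The sketch below redoes that calculation in plain C, assuming a 16 MHz clock (my assumption, typical for an Arduino Uno) and the 256 prescaler selected by CS12. timer1_preload is a hypothetical helper and only valid while the tick count fits in 16 bits:

```c
#include <stdint.h>

/* Preload value so that a 16-bit timer overflows after period_ms. */
static uint16_t timer1_preload(uint32_t f_cpu_hz, uint32_t prescale,
                               uint32_t period_ms) {
  uint32_t ticks = (f_cpu_hz / prescale) * period_ms / 1000ul;
  return (uint16_t)(65536ul - ticks);
}
```

16 MHz divided by 256 is 62500 ticks per second, and 65536 - 62500 = 3036, which is 0x0bdc.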

SPI Mode 0 Hardware Transfer

static __attribute__ ((noinline)) uint8_t SpiXfer(uint8_t data) {
  asm volatile (
    "out  %1, %0          \n" //put data out SPDR register
    "nop                  \n" //pause
  "1:                     \n"
    "in   __tmp_reg__, %2 \n" //check xmit complete
    "sbrs __tmp_reg__, %3 \n"
    "rjmp 1b              \n"
    "in   %0, %1          \n" //get incoming data
    : "+r" (data) : "M" (_SFR_IO_ADDR(SPDR)),
    "M" (_SFR_IO_ADDR(SPSR)), "I" (SPIF)
  );

  return data;
}

SPI Bit-Bang

#define MOSI_PORT  PORTD
#define MOSI_BIT   PORTD5
#define MISO_PORT  PIND
#define MISO_BIT   PIND6
#define CLOCK_PORT PORTD
#define CLOCK_BIT  PORTD7

static __attribute__ ((noinline)) uint8_t SpiBitBang(uint8_t data) {
  register uint8_t tmp, i=8;
  
  //save and restore sreg because t-bit is utilized
  asm (
    "in __tmp_reg__, __SREG__ \n"
  "1:               \n"
    "sbrs %0, 0x07  \n" //is output data bit high?
    "rjmp 2f        \n" //no
    "sbi  %3, %4    \n" //output a high bit
    "rjmp 3f        \n"
  "2:               \n"
    "cbi  %3, %4    \n" //output a low bit
  "3:               \n"
    "lsl  %0        \n" //shift to next bit
    "in   %1, %5    \n" //get input
    "tst  %1        \n" //anything here?
    "breq 4f        \n" //nope
    "bst  %1, %6    \n" //set t-bit if input bit is high
    "clr  %1        \n" //zeroize register
    "bld  %1, 0     \n" //set bit 0
    "or   %0, %1    \n" //or low bit with data for return value
  "4:               \n"
    "sbi  %7, %8    \n" //toggle clock bit high
    "nop            \n" //pause
    "cbi  %7, %8    \n" //toggle clock bit low
    "subi %2, 1     \n" //more bits?
    "brne 1b        \n" //do next bit
    "out __SREG__, __tmp_reg__ \n"
    : "+r" (data), "=&r" (tmp), "+a" (i) : //i is counted down by subi
    "M" (_SFR_IO_ADDR(MOSI_PORT)), "I" (MOSI_BIT),
    "M" (_SFR_IO_ADDR(MISO_PORT)), "I" (MISO_BIT),
    "M" (_SFR_IO_ADDR(CLOCK_PORT)),  "I" (CLOCK_BIT)
  );

  return data;
}

atoi

int16_t _atoi(const char *s) {
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wuninitialized"
  //sign & c are initialized inside inline asm code
  register uint8_t sign, c;
#pragma GCC diagnostic pop
  //force result into return registers
  register int16_t result asm("r24"); 
  
  asm (
    "ldi  %A0, 0x00         \n" //result = 0
    "ldi  %B0, 0x00         \n"
    "clr  %3                \n" //sign = FALSE

  "1:                       \n"
    "ld   %2, Z+            \n" //fetch char
    "cpi  %2, '-'           \n" //negative sign?
    "brne 2f                \n"
    "ldi  %3, 0x01          \n" //sign = TRUE

  "2:                       \n"
    "cpi  %2, '/' + 1       \n" //step over whitespace/garbage
    "brcc 3f                \n"
    "rjmp 1b                \n"

  "3:                       \n"
    "rjmp 5f                \n"

  "4:                       \n"
    "ldi  r23, 10           \n" //result *= 10
    "mul  %B0, r23          \n"
    "mov  %B0, r0           \n"
    "mul  %A0, r23          \n"
    "mov  %A0, r0           \n"
    "add  %B0, r1           \n"
    "clr  __zero_reg__      \n" //r1 trashed by mul
    "add  %A0, %2           \n" //result += new digit
    "adc  %B0, __zero_reg__ \n"
    "ld   %2, Z+            \n" //fetch next digit char
  
  "5:                       \n"
    "subi %2, '0'           \n" //convert char to 0-9
    "cpi  %2, 10            \n" //end of string?
    "brlo 4b                \n"

    "cpi  %3, 0             \n" //negative?
    "breq 6f                \n"
    "com  %B0               \n" //negate result
    "neg  %A0               \n"
    "sbci %B0, -1           \n"
  
  "6:                       \n"
    : "=&r" (result), "+z" (s), "=&a" (c), "=&a" (sign) : : "r23", "memory"
  );

  return result;
}
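
The "result *= 10" step is worth a closer look because the AVR only multiplies 8 bits by 8 bits. The high byte keeps just the low half of its product, the low byte produces a full 16-bit product, and the upper half of that is added into the high byte. Here is the same thing as a plain C model (times10_bytewise is a made-up name):

```c
#include <stdint.h>

static uint16_t times10_bytewise(uint16_t value) {
  uint8_t lo = (uint8_t)(value & 0xffu);
  uint8_t hi = (uint8_t)(value >> 8);

  uint8_t new_hi = (uint8_t)(hi * 10u);         /* mul %B0, r23 / mov %B0, r0 */
  uint16_t prod_lo = (uint16_t)(lo * 10u);      /* mul %A0, r23 -> r1:r0 */
  uint8_t new_lo = (uint8_t)(prod_lo & 0xffu);  /* mov %A0, r0 */
  new_hi = (uint8_t)(new_hi + (prod_lo >> 8));  /* add %B0, r1 */

  return (uint16_t)(((uint16_t)new_hi << 8) | new_lo);
}
```

times10_bytewise(1234) returns 12340. Anything that overflows 16 bits is silently truncated, just as in the assembler version.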

OSCCAL calibration

#ifndef OSCCAL
// program memory location of the internal oscillator "calibration byte"
#define OSCCAL 1024 //default to last byte of program memory
#endif

#define _osccal(addr)                    \
  asm volatile (                         \
    "lpm" "\n\t"                         \
    "out 0x31,r0" /* OSCCAL register */  \
    : /* no outputs */                   \
    : "z" ((uint16_t)(addr))             \
    : "r0" /* clobbers */                \
  )
#define osccal() _osccal(OSCCAL) /* calibrate internal RC oscillator */

About Jim Eli

µC experimenter

3 Responses to GCC Inline Assembler Cookbook

  1. Immanuel V says:

    OMG this is a resume of lot of pages, thanks a lot!! I’m tryin’ to program arduino uno with assembler using atmel visual 6.2

  2. Emin says:

    The rules mentioned above don’t work in avr studio 4 C code.

    • Jim Eli says:

      AVR Studio 4 is an unsupported product that has been superseded by several generations (5, 6 and 7). In fact, AVR Studio 4 will not even run on my Windows 10 OS. But thanks for noticing.