My Cup Overflows

When performing math (even basic addition and subtraction) with signed numbers, an overflow problem sometimes arises. The Arduino microcontroller indicates an overflow by setting the overflow flag (V) in the status register (SREG). Here’s a demonstration of the overflow problem with a simple addition operation:

volatile int8_t n1=0x70; //112
volatile int8_t n2=0x35; //53
volatile int8_t answer;

void setup() {
  Serial.begin(9600);
  
  asm(
    "add %0, %2 \n" //answer = n1 + n2 (n1 is loaded into answer's register)

    : "=r" (answer) : "0" (n1), "r" (n2)
  );
  
  Serial.print("answer = "); Serial.println(answer);
}

void loop() { }

The result of the above addition is “answer = -91”, or 0xA5 hexadecimal. That’s wrong! The answer is wrong because the true sum, 165, is larger than a signed 8-bit register can hold.

The largest “signed 8-bit number” is +127, or 0x7f hexadecimal. However, this operation did set the Status Register Overflow Flag (V flag) to warn us that the result is erroneous. But it’s completely up to us, the programmers, to deal with this issue.

What’s Your Sign?

In “8-bit signed number” operations, the overflow flag is set when either of the following two conditions occur:

• There is a carry from bit 6 to bit 7, but no carry out of bit 7 (C flag not set).
• There is a carry out of bit 7 (C flag set), but no carry from bit 6 to bit 7.

I bring these two cases to your attention because adding two negative numbers can overflow as well, leaving a result with the wrong sign. For example, when adding -2 (0xFE) and -128 (0x80), the result becomes 0x7E (+126), which again is incorrect.

When adding two numbers with different signs, the absolute value of the result is never larger than the larger absolute value of the two operands. In this case, an overflow is impossible.

Therefore, an overflow is only possible when adding two numbers with the same sign. Furthermore, when adding two “same-signed numbers”, the sign of a valid result must be the same as the sign of the operands. The conclusion here is: for signed addition, if the overflow flag is set, the result is invalid, and for unsigned addition, if the carry flag is set, the result is invalid. When a signed operation overflows, the result is corrupted and its sign bit is inverted.
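To make the two carry rules concrete, here is a small sketch in plain C that models an 8-bit addition on a PC rather than on the AVR. The names add8_t and add8 are my own inventions for illustration; the real hardware computes these flags for you in the SREG.

```c
#include <stdint.h>
#include <stdbool.h>

/* Result of a simulated 8-bit ADD plus the two flags discussed above. */
typedef struct {
    uint8_t result; /* low 8 bits of the sum */
    bool    c;      /* carry out of bit 7 (C flag) */
    bool    v;      /* signed overflow (V flag) */
} add8_t;

static add8_t add8(uint8_t a, uint8_t b)
{
    add8_t r;
    uint16_t sum = (uint16_t)a + b;
    /* carry from bit 6 into bit 7: add only the low 7 bits of each operand */
    bool c6 = (((a & 0x7F) + (b & 0x7F)) & 0x80) != 0;

    r.result = (uint8_t)sum;
    r.c = (sum & 0x100) != 0;
    r.v = (c6 != r.c); /* V is set when exactly one of the two carries occurs */
    return r;
}
```

Feeding it the two examples from the text, add8(0x70, 0x35) and add8(0x80, 0xFE), sets the V flag in both cases, while a mixed-sign addition such as add8(0x70, 0xF0) leaves it clear.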

See my tutorial on Arduino Inline Assembly Math here.

Posted in arduino, assembly language, avr, avr inline assembly

Arduino Inline Assembly Port & Pin Compendium

The following is a compendium of inline assembly functions dealing with ports and pins. Use these at your own risk. These functions have been trimmed of most bounds checking, so they can easily be abused. The Arduino Inline Assembly Tutorial explains most of the details starting here.

analogWrite

This inline code writes an analog value (in the form of a PWM wave) to a particular pin. After executing, the pin will generate a steady square wave of the specified duty cycle until the next call (or call to digitalRead() or digitalWrite() on the same pin). The frequency of the PWM signal on most pins is approximately 490 Hz. On the Uno and similar boards, pins 5 and 6 have a frequency of approximately 980 Hz. On Arduino boards with the ATmega168/328, this function works on pins 3, 5, 6, 9, 10, and 11. The analogWrite function has nothing to do with the analog pins or the analogRead function.

A pinMode() call is included inside this function, so there is no need to set the pin as an output before executing this code.
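The frequencies quoted above can be reproduced with a little arithmetic. This is only a sketch under assumptions that are not stated in this post: a 16 MHz clock, a timer prescaler of 64, fast PWM on the timer driving pins 5 and 6, and phase-correct PWM on the others. All names below are hypothetical.

```c
#include <stdint.h>

#define F_CPU_HZ     16000000UL /* assumed 16 MHz system clock */
#define PWM_PRESCALE 64UL       /* assumed timer prescaler */

/* Fast PWM counts 0..255 once per period: 256 timer ticks. */
static unsigned long fast_pwm_hz(void)
{
    return F_CPU_HZ / (PWM_PRESCALE * 256UL);
}

/* Phase-correct PWM counts up and back down: 510 timer ticks. */
static unsigned long phase_correct_pwm_hz(void)
{
    return F_CPU_HZ / (PWM_PRESCALE * 510UL);
}

/* Duty cycle of an 8-bit PWM value, in percent, rounded to nearest. */
static unsigned duty_percent(uint8_t val)
{
    return (val * 100u + 127u) / 255u;
}
```

Under those assumptions the two functions come out at roughly 976 Hz and 490 Hz, and a value of 128 corresponds to a duty cycle of about 50%.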

This no-frills version of analogWrite saves ~542 bytes over the built-in function:

//analogWrite requires a PWM pin 
//PWM pin/timer table:
//3:  (TIMER2B) PD3/TCCR2A/COM2B1/OCR2B
//5:  (TIMER0B) PD5/TCCR0A/COM0B1/OCR0B
//6:  (TIMER0A) PD6/TCCR0A/COM0A1/OCR0A
//9:  (TIMER1A) PB1/TCCR1A/COM1A1/OCR1A
//10: (TIMER1B) PB2/TCCR1A/COM1B1/OCR1B
//11: (TIMER2A) PB3/TCCR2A/COM2A1/OCR2A
//set below 6 defines per above table
#define ANALOG_PORT         PORTB
#define ANALOG_PIN          PORTB3
#define ANALOG_DDR          DDRB
#define TIMER_REG           TCCR2A
#define COMPARE_OUTPUT_MODE COM2A1
#define COMPARE_OUTPUT_REG  OCR2A

volatile uint8_t val = 128; //0-255

  asm (
    "sbi  %0, %1   \n" //DDR set to output (pinMode)

    "cpi  %6, 0    \n" //if full low (0)
    "breq _SetLow  \n"
    "cpi  %6, 0xff \n" //if full high (0xff)
    "brne _SetPWM  \n"

    "sbi  %2, %1   \n" //set high
    "rjmp _SkipPWM \n"

  "_SetLow:        \n"
    "cbi  %2, %1   \n" //set low
    "rjmp _SkipPWM \n"

  "_SetPWM:        \n"
    "ld   r24, X   \n"
    "ori  r24, %3  \n"
    "st   X, r24   \n" //connect pwm pin timer# & channel
    "st   Z, %6    \n" //set pwm duty cycle (val)

  "_SkipPWM:       \n"
    : : "I" (_SFR_IO_ADDR(ANALOG_DDR)), "I" (ANALOG_PIN),
    "I" (_SFR_IO_ADDR(ANALOG_PORT)), "M" (_BV(COMPARE_OUTPUT_MODE)),
    "x" (_SFR_MEM_ADDR(TIMER_REG)), "z" (_SFR_MEM_ADDR(COMPARE_OUTPUT_REG)), "d" (val) //"d": cpi needs r16-r31
    : "r24"
  );

analogRead

The Arduino board contains a 6-channel, 10-bit analog-to-digital converter, which is the brains beneath the analogRead function. It maps input voltages between 0 and 5 volts into integer values between 0 and 1023, yielding a resolution of 5 volts / 1024 units, or about 0.0049 volts (4.9 mV) per unit. The input range and resolution can be changed through the ANALOG_V_REF define. This code reads the value from the specified analog channel (0-7), which corresponds to the analog pin of the same number (note, do NOT use A0-A7 for the channel number in this code). Further information about the underlying ADC can be found here.
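As a quick illustration of that resolution figure, here is a hedged sketch that converts a raw reading back into millivolts. It assumes the default 5 V reference; the name adc_to_millivolts and the VREF_MV define are mine, not part of the Arduino core.

```c
#include <stdint.h>

#define VREF_MV 5000UL /* assumed 5 V reference, in millivolts */

/* Convert a 10-bit ADC reading (0-1023) to millivolts, rounding down. */
static uint16_t adc_to_millivolts(uint16_t counts)
{
    return (uint16_t)((counts * VREF_MV) / 1024UL);
}
```

A mid-scale reading of 512 maps to 2500 mV, and a single count is worth a little under 5 mV.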

While this version of analogRead (aRead) saves only a few bytes (~50), it also gives the option of changing the conversion speed via the ADC prescaler. However, don’t arbitrarily change the prescale without understanding the consequences. ATMEL advises that the slowest prescale (PS128) should be used. A higher speed (smaller prescale) reduces the accuracy of the A/D conversion. The arduino sets the prescale to 128 during initialization, just as the code below does.

//Define various ADC prescales
#define PS2   (1<<ADPS0)                             //8000kHz ADC clock freq
#define PS4   (1<<ADPS1)                             //4000kHz
#define PS8   ((1<<ADPS0) | (1<<ADPS1))              //2000kHz
#define PS16  (1<<ADPS2)                             //1000kHz
#define PS32  ((1<<ADPS2) | (1<<ADPS0))              //500kHz
#define PS64  ((1<<ADPS2) | (1<<ADPS1))              //250kHz
#define PS128 ((1<<ADPS2) | (1<<ADPS1) | (1<<ADPS0)) //125kHz
#define ANALOG_V_REF     DEFAULT //INTERNAL, EXTERNAL, or DEFAULT
#define ADC_PRESCALE     PS128   //PS16, PS32, PS64 or PS128 (default)

uint16_t aRead(uint8_t channel) {
  uint16_t result;
  
  asm (
    "andi %1, 0x07    \n" //force pin==0 thru 7
    "ori  %1, (%6<<6) \n" //(pin | ADC Vref)
    "sts  %2, %1      \n" //set ADMUX

    "lds  r18, %3             \n" //get ADCSRA
    "andi r18, 0xf8           \n" //clear prescale bits
    "ori  r18, ((1<<%5) | %7) \n" //(new prescale | ADSC)
    "sts  %3, r18             \n" //set ADCSRA

    "_loop:       \n" //loop until ADSC cleared
    "lds  r18, %3 \n"
    "sbrc r18, %5 \n"
    "rjmp _loop   \n"

    "lds  %A0, %4   \n" //result = ADCL 
    "lds  %B0, %4+1 \n" //ADCH

    : "=r" (result), "+d" (channel) : "M" (_SFR_MEM_ADDR(ADMUX)), //channel is modified; andi/ori need r16-r31
    "M" (_SFR_MEM_ADDR(ADCSRA)), "M" (_SFR_MEM_ADDR(ADCL)),
    "I" (ADSC), "I" (ANALOG_V_REF), "M" (ADC_PRESCALE)
    : "r18"
  );
  
  return result;
}
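The clock frequencies in the prescale comments above follow directly from the system clock. The sketch below assumes a 16 MHz board and a normal conversion length of 13 ADC clock cycles; the function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define CPU_CLOCK_KHZ 16000UL /* assumed 16 MHz system clock */

/* ADC clock in kHz for a prescale divisor of 2, 4, 8, ... 128. */
static unsigned long adc_clock_khz(unsigned divisor)
{
    return CPU_CLOCK_KHZ / divisor;
}

/* Approximate time of one normal conversion in microseconds,
   assuming 13 ADC clock cycles per conversion. */
static unsigned long adc_conversion_us(unsigned divisor)
{
    return (13UL * 1000UL) / adc_clock_khz(divisor);
}
```

With the default divisor of 128 this gives a 125 kHz ADC clock and about 104 microseconds per reading, which is the price paid for the best accuracy.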

pinMode(OUTPUT)

The arduino pinMode function configures pin behavior. The code presented from here on has been previously explained in the Arduino Inline Tutorial Series.

asm (
  "sbi %0, %1 \n" //1=OUTPUT
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (DDB5)
);

pinMode (INPUT PULLUP)

asm (
  "cbi %0, %2 \n"
  "sbi %1, %2 \n"
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (_SFR_IO_ADDR(PORTB)), "I" (DDB5)
);

pinMode (INPUT)

asm (
  "cbi %0, %2 \n"
  "cbi %1, %2 \n"
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (_SFR_IO_ADDR(PORTB)), "I" (DDB5)
);

pinMode with Multiple Pins

#define PIN_DIRECTION 0b00101000 //PIN 3 & 5 OUTPUT
//#define PIN_DIRECTION (1<<DDB3) | (1<<DDB5)
asm (
  "out %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(DDRB)), "r" (PIN_DIRECTION)
);

digitalWrite HIGH

If a pin has been configured as an OUTPUT, its voltage will be set to the corresponding value: 5V (or 3.3V on 3.3V boards) for HIGH, 0V (ground) for LOW. However, if the pin is configured as an INPUT, digitalWrite enables (HIGH) or disables (LOW) the internal pullup on the input pin.

asm (
  "sbi %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(PORTB)),"I" (PORTB5)
);

digitalWrite LOW

asm (
  "cbi %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5) 
);

digitalWrite(output)

volatile uint8_t output = HIGH; //LOW or HIGH
asm (
  "cpi %2, 0     \n"
  "breq 1f       \n"
  "sbi %0, %1    \n"
  "rjmp 2f       \n"
  "1: cbi %0, %1 \n"
  "2:            \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5), "d" (output) //"d": cpi needs r16-r31
);

digitalToggle

Try to find this one in the Arduino wiring code:

//toggle pin
asm (
  "in r24, %0  \n"
  "eor r24, %1 \n"
  "out %0, r24 \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "r" ((uint8_t)_BV(PORTB5)) : "r24"
);
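For readers who want to see what the pin write, clear and toggle snippets above do to the port register, here is a plain C model of the same bit operations. It is a sketch that runs against an ordinary variable standing in for PORTB; BIT, pin_set, pin_clear and pin_toggle are names I made up for this example.

```c
#include <stdint.h>

#define BIT(n) ((uint8_t)(1u << (n))) /* same idea as avr-libc's _BV() */

/* Set one bit (what sbi does to an I/O register). */
static void pin_set(volatile uint8_t *reg, uint8_t bit)
{
    *reg |= BIT(bit);
}

/* Clear one bit (what cbi does). */
static void pin_clear(volatile uint8_t *reg, uint8_t bit)
{
    *reg &= (uint8_t)~BIT(bit);
}

/* Flip one bit (the in/eor/out sequence). */
static void pin_toggle(volatile uint8_t *reg, uint8_t bit)
{
    *reg ^= BIT(bit);
}
```

Note that the C versions are read-modify-write sequences, whereas sbi and cbi change a single bit in one instruction.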

digitalRead

digitalRead simply reads the value from a specified digital pin, either HIGH or LOW.

volatile uint8_t status;
 
asm (
  "in __tmp_reg__, __SREG__  \n"
  "cli                       \n"                     
  "ldi %0, 1                 \n" //high 
  "sbis %1, %2               \n" //skip next if pin high
  "clr %0                    \n" //low
  "out __SREG__, __tmp_reg__ \n"
  : "=d" (status) : "I" (_SFR_IO_ADDR(PINB)), "I" (PINB5) //"=d": ldi needs r16-r31
);

digitalRead Alternative

This is a generic alternative, which can be called programmatically. Note it must be called using a pointer to the PIN (&PINB), otherwise the compiler emits incorrect code:

//call like so:
//uint8_t status = dRead(&PINB, PINB5);

__attribute__ ((noinline)) uint8_t dRead(volatile uint8_t *port, uint8_t pin) {
  uint8_t result, mask=1;

  asm (
    "movw  r30, %1 \n" //port reg addr in Z
  "1:              \n"
    "cpi  %2, 0    \n" //loop until pin==0
    "breq 2f       \n" //leave loop
    "lsl  %3       \n" //shift (mask) left 1 position
    "dec  %2       \n" //decrement loop counter
    "rjmp 1b       \n" //repeat
  "2:              \n"
    "in   __tmp_reg__, __SREG__ \n" //preserve sreg
    "cli           \n" //disable interrupts
    "ld   r18, Z   \n" //fetch port data
    "and  r18, %3  \n" //compare pin with mask
    "ldi  %0, 1    \n" //set return high
    "brne 3f       \n" 
    "clr  %0       \n" //set return low
  "3:              \n"
    "out  __SREG__, __tmp_reg__ \n"
    : "=&d" (result) : "r" (port), "a" (pin), "r" (mask) : "r18", "r30", "r31" //"=&d": ldi needs r16-r31
  );

  return result;
}

Example of turning off PWM for arduino digital pin #11

//digital PWM pin registers:
//3:  (TIMER2B) PD3/TCCR2A/COM2B1/OCR2B
//5:  (TIMER0B) PD5/TCCR0A/COM0B1/OCR0B
//6:  (TIMER0A) PD6/TCCR0A/COM0A1/OCR0A
//9:  (TIMER1A) PB1/TCCR1A/COM1A1/OCR1A
//10: (TIMER1B) PB2/TCCR1A/COM1B1/OCR1B
//11: (TIMER2A) PB3/TCCR2A/COM2A1/OCR2A

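asm (
  "ld  r16, Z    \n" //fetch TCCR2A
  "ldi r17, 0xff \n"
  "eor r17, %1   \n" //invert the bit mask
  "and r16, r17  \n" //clear the COM2A1 bit
  "st  Z, r16    \n" //disconnect the pin from the timer
  : : "z" (_SFR_MEM_ADDR(TCCR2A)), "d" ((uint8_t)_BV(COM2A1)) : "r16", "r17"
);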

Also available as a book, with greatly expanded coverage!



Arduino Inline Assembly Tutorial (Examples)

As the final tutorial in this series, we present four example inline assembly functions for the arduino. Specifically, these cover the conversion of a byte to a hexadecimal string, SPI Mode 0 hardware transfer, SPI Mode 0 Bit-banging, and the C library atoi function. Do not take these functions as archetypical examples of high-quality coding practice or brilliantly efficient inline code. They are neither.

Most of the previous examples in this series were simple “snippets of code”, and as such gave a myopic view of inline assembly. The goal here is to show complete and working demonstrations of how to include inline assembly into the typical arduino program. Each example includes explanatory comments covering the key portions of code.

In addition to these examples, have a look at the Arduino Inline Assembly Blink Program.

Stringing Hexadecimals

The following code converts a byte value into a hexadecimal string. Notice at the start of the code, that the constraint #0 value (val) is temporarily saved in the r25 register. The function then converts the first nibble. When the conversion process is complete, the function loops back and converts the second nibble. Note how the code uses the SREG T-bit to flag the first vs. second nibble.

void ByteToHexStr(uint8_t val, char *str) {
  asm (
    "set           \n" //flag first nibble
    "mov r25, %0   \n" //save val
    "swap %0       \n" //swap for correct nibble order
  "1:              \n"
    "andi %0, 0xf  \n" //mask a nibble
    "cpi  %0, 0xa  \n" //>=10?
    "brcc 2f       \n" //yes
    "subi %0, 0xd0 \n" //convert numeral (0-9) 
    "rjmp 3f       \n" //skip next
  "2:              \n"
    "subi %0, 0xc9 \n" //convert letter (A-F)
  "3:              \n"
    "st Z+, %0     \n" //put into string
    "brtc 4f       \n" //second nibble done?
    "clt           \n" //clear nibble flag
    "mov %0, r25   \n" //get lower nibble
    "rjmp 1b       \n" //repeat conversion
  "4:              \n" //exit
    : : "d" (val), "z" (str) : "r25", "memory" //"d": andi/cpi/subi need r16-r31
  );
}
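For comparison, the same conversion can be sketched in plain C. This is not the code the inline version was derived from, just an equivalent I wrote for illustration; nibble_to_hex and byte_to_hex_str are hypothetical names. Unlike the inline routine above, which stores only the two characters, this sketch also writes a terminating null.

```c
#include <stdint.h>

/* Convert the low nibble of n to an uppercase hexadecimal character. */
static char nibble_to_hex(uint8_t n)
{
    n &= 0x0F;
    return (char)(n < 10 ? '0' + n : 'A' + (n - 10));
}

/* Write val as two hex characters; str must have room for 3 bytes. */
static void byte_to_hex_str(uint8_t val, char *str)
{
    str[0] = nibble_to_hex(val >> 4); /* upper nibble first */
    str[1] = nibble_to_hex(val);      /* then the lower nibble */
    str[2] = '\0';                    /* terminator (assumption: caller wants a C string) */
}
```

Calling byte_to_hex_str(0xA5, buf) leaves the string "A5" in buf.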

I SPI With My Little Eye…

Serial Peripheral Interface (SPI) is a synchronous serial data protocol used by microcontrollers for communicating with one or more peripheral devices, or for communication between two microcontrollers. The SPI standard is loose and each device implements it a little differently, which means you must pay close attention to the device’s datasheet when implementing the protocol. Generally speaking, there are four modes of transmission, defined by the clock phase and polarity.

Here are two versions of the SPI transfer function. The first of these programs incorporates the arduino hardware SPI. The second is a bit-bang version using different pins. More information on SPI can be found here and here.

SPI Mode 0 Hardware Transfer

static __attribute__ ((noinline)) uint8_t SpiXfer(uint8_t data) {
  asm (
    "out  %1, %0          \n" //put data out SPDR register
    "nop                  \n" //pause
  "1:                     \n"
    "in   __tmp_reg__, %2 \n" //check xmit complete
    "sbrs __tmp_reg__, %3 \n"
    "rjmp 1b              \n"
    "in   %0, %1          \n" //get incoming data
    : "+r" (data) : "M" (_SFR_IO_ADDR(SPDR)),
    "M" (_SFR_IO_ADDR(SPSR)), "I" (SPIF)
  );

  return data;
}

SPI Bit-Bang

#define MOSI_PORT  PORTD
#define MOSI_BIT   PORTD5
#define MISO_PORT  PIND
#define MISO_BIT   PIND6
#define CLOCK_PORT PORTD
#define CLOCK_BIT  PORTD7

static __attribute__ ((noinline)) uint8_t SpiBitBang(uint8_t data) {
  register uint8_t tmp, i=8;
  
  //save and restore sreg because t-bit is utilized
  asm (
    "in __tmp_reg__, __SREG__ \n"
  "1:               \n"
    "sbrs %0, 0x07  \n" //is output data bit high?
    "rjmp 2f        \n" //no
    "sbi  %3, %4    \n" //output a high bit
    "rjmp 3f        \n"
  "2:               \n"
    "cbi  %3, %4    \n" //output a low bit
  "3:               \n"
    "lsl  %0        \n" //shift to next bit
    "in   %1, %5    \n" //get input
    "tst  %1        \n" //anything here?
    "breq 4f        \n" //nope
    "bst  %1, %6    \n" //set t-bit if input bit is high
    "clr  %1        \n" //zeroize register
    "bld  %1, 0     \n" //set bit 0
    "or   %0, %1    \n" //or low bit with data for return value
  "4:               \n"
    "sbi  %7, %8    \n" //toggle clock bit high
    "nop            \n" //pause
    "cbi  %7, %8    \n" //toggle clock bit low
    "subi %2, 1     \n" //more bits?
    "brne 1b        \n" //do next bit
    "out __SREG__, __tmp_reg__ \n"
    : "+r" (data), "=&r" (tmp): "a" (i),
    "M" (_SFR_IO_ADDR(MOSI_PORT)), "I" (MOSI_BIT),
    "M" (_SFR_IO_ADDR(MISO_PORT)), "I" (MISO_BIT),
    "M" (_SFR_IO_ADDR(CLOCK_PORT)),  "I" (CLOCK_BIT)
  );

  return data;
}
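If you have no SPI device at hand, the shifting logic of a Mode 0 transfer can be modelled on a PC. The sketch below is a simulation only: sim_slave_t and spi_mode0_xfer are hypothetical names, the "slave" is just a struct holding a shift register, and no real pins are involved. It assumes MSB-first transfers, as in the bit-bang routine above.

```c
#include <stdint.h>

/* Hypothetical simulated slave: a shift register clocked by the master. */
typedef struct {
    uint8_t shift_reg; /* byte the slave presents on MISO, MSB first */
    uint8_t received;  /* bits the slave captured from MOSI */
} sim_slave_t;

/* Exchange one byte with the simulated slave and return the byte read. */
static uint8_t spi_mode0_xfer(sim_slave_t *slave, uint8_t data)
{
    for (uint8_t i = 0; i < 8; i++) {
        uint8_t mosi = (data & 0x80) ? 1 : 0;             /* master drives its MSB */
        uint8_t miso = (slave->shift_reg & 0x80) ? 1 : 0; /* slave drives its MSB */

        /* rising clock edge: both sides sample the other's bit */
        data = (uint8_t)((data << 1) | miso);
        slave->received = (uint8_t)((slave->received << 1) | mosi);

        /* falling clock edge: slave moves on to its next bit */
        slave->shift_reg = (uint8_t)(slave->shift_reg << 1);
    }
    return data;
}
```

After eight clocks the two bytes have simply traded places, which is the essence of every SPI transfer regardless of mode.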

A Toy

Atoi is a function in the C programming language that converts a string into an integer numerical representation (atoi stands for ASCII to integer). It is included in the C standard library header file stdlib.h. It is prototyped as follows:

int atoi(const char *str);

The str argument is a string, represented by an array of characters, containing the characters of a signed integer number. The string must be null-terminated.

Here is the basic idea of the atoi function implemented in C language:

int16_t atoi(char s[]) {
  uint8_t i, sign;
  int16_t n;
  
  //skip white space
  for (i=0; s[i]!='\0' && s[i]<=' '; i++); //stop at the terminator
  
  //sign
  sign = 0;
  if (s[i] == '-') {
    sign = 1;
    i++;
  }
  
  //convert
  for (n=0; s[i]>='0' && s[i]<='9'; i++)
    n = 10*n + s[i] - '0';
  
  if (sign)
    return (-1*n);
  else
    return n;
}

Atoi Inline

Here is our implementation, which is only 64 bytes in length. By comparison, the arduino AVR libc atoi() function is 76 bytes long. This version is functionally equivalent for the most part; however, there are a few differences in detail (this function steps over all leading ASCII characters 0x2F and below, not just whitespace):

int16_t _atoi(const char *s) {
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wuninitialized"
  //sign & c are initialized inside inline asm code
  register uint8_t sign, c;
#pragma GCC diagnostic pop
  //force result into return registers
  register int16_t result asm("r24"); 
  
  asm (
    "ldi  %A0, 0x00         \n" //result = 0
    "ldi  %B0, 0x00         \n"

  "1:                       \n"
    "ld   %2, Z+            \n" //fetch char
    "cpi  %2, '-'           \n" //negative sign?
    "brne 2f                \n"
    "ldi  %3, 0x01          \n" //sign = TRUE

  "2:                       \n"
    "cpi  %2, '/' + 1       \n" //step over whitespace/garbage
    "brcc 3f                \n"
    "rjmp 1b                \n"

  "3:                       \n"
    "rjmp 5f                \n"

  "4:                       \n"
    "ldi  r23, 10           \n" //result *= 10
    "mul  %B0, r23          \n"
    "mov  %B0, r0           \n"
    "mul  %A0, r23          \n"
    "mov  %A0, r0           \n"
    "add  %B0, r1           \n"
    "clr  __zero_reg__      \n" //r1 trashed by mul
    "add  %A0, %2           \n" //result += new digit
    "adc  %B0, __zero_reg__ \n"
    "ld   %2, Z+            \n" //fetch next digit char
  
  "5:                       \n"
    "subi %2, '0'           \n" //convert char to 0-9
    "cpi  %2, 10            \n" //end of string?
    "brlo 4b                \n"

    "cpi  %3, 0             \n" //negative?
    "breq 6f                \n"
    "com  %B0               \n" //negate result
    "neg  %A0               \n"
    "sbci %B0, -1           \n"
  
  "6:                       \n"
    : "+r" (result) : "z" (s), "a" (c), "a" (sign) : "r23", "memory" //r23 is used as scratch
  );

  return result;
}

Conclusion

While there are countless more topics to cover, and many more rabbit-holes to dive down, I believe I have covered enough of the basics in this series. I sure enjoyed researching and writing these tutorials. And, hopefully you gained a few insights into the funky world of arduino (AVR) inline assembly programming. Now, get inline with your programming!

[updated: 4.11.16]



Towards a More General digitalRead

The arduino digitalRead function is a nice bit of code. However, it takes more than a cursory glance to determine exactly how it performs (see Yak Shaving). It also compiles into approximately 222 bytes of code, and it’s slow in comparison to a simplified inline routine:

void setup() {
  digitalRead(13);
}

void loop() { }

Versus:

volatile uint8_t status;
 
void setup() {
  asm (
    "in __tmp_reg__, __SREG__ \n"
    "cli         \n"                     
    "ldi %0, 1   \n" 
    "sbis %1, %2 \n" //skip next if pin high
    "clr %0      \n"
    "out __SREG__, __tmp_reg__ \n"
    : "=d" (status) : "I" (_SFR_IO_ADDR(PINB)), "I" (PINB5) //"=d": ldi needs r16-r31
  );
}

void loop() { }

The simplified function occupies only 16 bytes. However, since this is inline code vs. a function, every time a digital read is required in your program, it will consume another 16 bytes. Yet, it would take about 13 inline routines to consume about the same amount of code the standard arduino function uses.

I doubt you’ll write a program that uses over 13 read operations. However, you might.

Another minor factor with the inline routine is that the port and pin must be known at compile time. The port and pin are “hard coded” into the assembly instruction, like so:

000002A1 1d.9b  SBIS 0x03, 5	

In the above disassembly, 0x03 is the port (PINB), and 5 is the pin bit (PINB5). Not really a big issue, unless you want to address the pin and port location programmatically. In that regard, you can’t use the simplified routine, or any of the C language MACROs floating around the Internet similar to:

#define bit_get(p,m) ((p) & (m))

But here is a generic alternative, which occupies approximately 34 bytes. Note it must be called using a pointer to the PIN (&PINB), otherwise the compiler will emit incorrect code:

//call like so:
//uint8_t status = dRead(&PINB, PINB5);

__attribute__ ((noinline)) uint8_t dRead(volatile uint8_t *port, uint8_t pin) {
  uint8_t result, mask=1;

  asm (
    "movw  r30, %1 \n" //port reg addr in Z
  "1:              \n"
    "cpi  %2, 0    \n" //loop until pin==0
    "breq 2f       \n" //leave loop
    "lsl  %3       \n" //shift (mask) left 1 position
    "dec  %2       \n" //decrement loop counter
    "rjmp 1b       \n" //repeat
  "2:              \n"
    "in   __tmp_reg__, __SREG__ \n" //preserve sreg
    "cli           \n" //disable interrupts
    "ld   r18, Z   \n" //fetch port data
    "and  r18, %3  \n" //compare pin with mask
    "ldi  %0, 1    \n" //set return high
    "brne 3f       \n" 
    "clr  %0       \n" //set return low
  "3:              \n"
    "out  __SREG__, __tmp_reg__ \n"
    : "=&d" (result) : "r" (port), "a" (pin), "r" (mask) : "r18", "r30", "r31" //"=&d": ldi needs r16-r31
  );

  return result;
}
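The idea behind the generic routine, taking the register by address so that the pin can be chosen at run time, can also be written in a few lines of plain C. This is a sketch for comparison, not a measured replacement: pin_read is a hypothetical name, and I have not checked how many bytes it compiles to on the AVR.

```c
#include <stdint.h>

/* Return 1 if the given bit of the register is set, otherwise 0.
   On the AVR, reg would be something like &PINB. */
static uint8_t pin_read(const volatile uint8_t *reg, uint8_t bit)
{
    return (uint8_t)((*reg >> bit) & 1u);
}
```

On a PC you can try it against an ordinary variable standing in for the PIN register, which is also a convenient way to unit test pin logic without any hardware.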



Arduino Inline Assembly Tutorial #4 (Constraints)

Introduction

I have a confession to make. My previous examples were not very efficient assembly code. That might seem like an odd comment, especially since my typical example used just 2-4 lines of code. But these examples were coded as one would write pure assembly, which is not necessarily the way inline should be written. The sneaky compiler silently inserts some extra code into our programs.

Writing code for the inline assembly requires a paradigm shift. Starting with this tutorial, we’ll begin to cover the odd method of coding input and output operands for the asm statement. Using them will enable us to produce more efficient inline code.

Extending Inline

Recall the general form of an extended inline assembler statement is:

asm("code" : output operand list : input operand list : clobber list);

This statement is divided by colons into (up to) four parts. While the code part is required, the others are optional:

  • Code: the assembler instructions, defined as a single string constant.
  • A list of output operands, separated by commas.
  • A list of input operands, separated by commas.
  • A list of “clobbered” or “accessed” registers.

We previously covered the code portion and the clobber list. We will continue to introduce new assembler instructions with each installment. But now, let’s discuss the input and output operands.

Constraints

Each input or output operand is described by a constraint string followed by a C expression in parentheses. Constraints are primarily letters but can be numbers too. The selection of the proper constraint depends on the range of registers or constants acceptable to the AVR instruction they are used with. Here’s an example:

"=r" (status) : "I" (_SFR_IO_ADDR(PINB)), "I" (PINB5)

Remember, the C compiler doesn’t check your assembly code. But it does check the constraint against your C expression. If you specify the wrong constraints, the compiler may silently pass wrong code to the assembler, which would cause the assembler to fail. And if this happens, everything abruptly terminates, giving a very cryptic error message.

For example, if you specify the constraint “r” and you are using this register with an “ORI” instruction in your code, the compiler may select any register. This would fail if the compiler selects r2 to r15. That’s why the correct constraint in that case is “d”. But, we’re getting ahead of ourselves.

Constraining Constraints

The compiler is free to select any register for storage of the value, as long as it meets your constraint. Interestingly, the compiler might not explicitly load or store your value, and it may even decide not to include your assembler code at all! All these decisions are part of the optimization strategy. For example, if you never use the variable in the remaining part of the C program, the compiler will most likely remove your code unless you switched off optimization.

Modified Constraints

A modifier sometimes precedes the constraint. Let’s demonstrate with an inline assembler routine using a C char variable (a), and an 8-bit constant value (ANSWER_TO_LIFE). Our inline program will simply save the constant to our variable.

Here is our output constraint string:

"=d" (a)

The equal sign is the modifier; ‘=’ means that this operand is written to by this instruction, and the previous value is discarded and replaced by new data. The ‘d’ is the constraint, and instructs the compiler to place our value into one of the upper registers (r16 to r31), which is what the LDI instruction requires. The more general ‘r’ constraint means “any general register”, but LDI cannot load r0 to r15.

The input constraint string is even simpler:

"M" (ANSWER_TO_LIFE)

The ‘M’ defines an 8-bit integer constant in the range of 0-255. Inside the parentheses we place our C MACRO defined value, “ANSWER_TO_LIFE”.

Remember, we use the colon character, ‘:’ to separate code, outputs, inputs and clobbers.

Percentage Zero

Take notice of the ‘%0’ and ‘%1’ characters in our inline code below. These represent the substitution locations for the operand values from the constraint strings. The constrained values are substituted into the code in the order they appear at the bottom of the inline routine, 0 first, then 1, 2, etc. The first value (a) as constrained, is substituted where ‘%0’ appears, and the number ‘42’ is substituted for ‘%1’. If there were additional constraints, the next one in line would be substituted for ‘%2’, and sequentially onward.

Here is our full code:

#define ANSWER_TO_LIFE 42

volatile uint8_t a;

void setup() {
  Serial.begin(9600);

  asm (
    "ldi %0, %1 \n"
    : "=d" (a) : "M" (ANSWER_TO_LIFE)
  );
  
  Serial.print("a = "); Serial.println(a);
}

void loop() { }

This is where it gets a little confusing. The compiler doesn’t simply replace the ‘%0’ with the expression in parentheses. For our example, that would result in an assembler instruction looking something like:

ldi a, ANSWER_TO_LIFE

That’s invalid syntax for the LDI instruction. Instead, it replaces ‘%0’ with the register it selected to satisfy the constraint. That’s an important difference. As such, it creates a valid assembler instruction like this:

ldi r24, 42

And that works.

Furthermore, notice we haven’t included any code to store the output. Rather, we instructed the compiler to do this for us through the equal sign ‘=’, in our “modified constraint”. This may seem like an odd way of writing assembler code, but this is how we do it. It seems natural to want to add a line something like the following after the LDI command:

"sts %0, %1 \n"

You could. But it’s just not necessary.
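To picture the substitution step, here is a toy model in plain C. It is emphatically not how GCC is implemented; it is only a sketch, with a made-up function expand_template, that replaces %0 through %9 in a template with strings that stand for whatever the compiler chose for each operand.

```c
#include <stddef.h>

/* Toy model: expand %0..%9 in tmpl using ops[], writing at most
   outsz-1 characters plus a terminator into out. */
static void expand_template(const char *tmpl, const char *const ops[],
                            size_t nops, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return;
    for (const char *p = tmpl; *p != '\0' && o + 1 < outsz; p++) {
        if (p[0] == '%' && p[1] >= '0' && p[1] <= '9' &&
            (size_t)(p[1] - '0') < nops) {
            const char *s = ops[p[1] - '0']; /* text chosen for this operand */
            while (*s != '\0' && o + 1 < outsz)
                out[o++] = *s++;
            p++; /* skip the digit */
        } else {
            out[o++] = *p;
        }
    }
    out[o] = '\0';
}
```

Given the template "ldi %0, %1" and the operand strings "r24" and "42", the toy produces "ldi r24, 42", the same instruction shown above.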

Getting To Specifics

Output operands must be write-only and the C expression result must be an lvalue, which means that the operands must be valid on the left side of assignments. Note, that the compiler does not check if the operands are a reasonable type for the kind of operation used in the assembler instructions.

Input operands are read-only. An operand that is both read and written cannot be listed as a plain input; it must be declared as an output with the ‘+’ modifier, or tied to an output with a digit constraint, which we cover below under the heading of “Straight Ahead”.

When the compiler selects the registers to use which represent the input or output operands, it does not use any of the registers listed in the “clobbered” section. As a result, clobbered registers are available for any use in the assembler code.

Be forewarned that accessing data from C programs without using input/output operands (such as by using global symbols directly from the assembler template) may not work as expected. Since the compiler does not parse the inline code, it has no visibility of any symbols it references. This may result in those symbols being discarded as unreferenced unless they are also listed as input or output operands. The moral of this story is, “USE INPUT AND OUTPUT OPERANDS.”

The Percentage of A and B

Here is another example of performing a simple swap between two 16-bit integers:

volatile int a = 0xa1a2;
volatile int b = 0xb1b2;

void setup() {
  Serial.begin(9600);

  asm (
    "mov %A0, %A3 \n"
    "mov %B0, %B3 \n"
    "mov %A1, %A2 \n"
    "mov %B1, %B2 \n"
    : "=r" (a), "=r" (b) : "r" (a), "r" (b)
  );

  Serial.print("a = "); Serial.println(a, HEX);
  Serial.print("b = "); Serial.println(b, HEX);
}

void loop() { }

First, notice the letters A and B, as in %A0 and %B0. They refer to two different 8-bit registers, each containing a part of the 2-byte value of %0. Recall that the Arduino is a little-endian microcontroller, meaning the LSB is stored in the lower memory address and the MSB is stored in the higher address. Therefore, ‘A’ refers to the LSB, while ‘B’ addresses the MSB. If we were dealing with a 4-byte, 32-bit value, we would use the letters A through D.

Second, we address the two variables (a) and (b) separately as both input and output operands. When the compiler fixes up the operands to satisfy the constraints, it needs to know which operands are read by the instructions and which are written by it. Again, ‘=’ identifies an operand which is only written (‘+’ identifies an operand that is both read and written, and all other operands are assumed to be read only).
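The little-endian layout mentioned above is easy to check with a few lines of portable C. This sketch uses a hypothetical helper, store_le16, that places a 16-bit value into a 2-byte array the way the AVR stores it in memory.

```c
#include <stdint.h>

/* Store v into mem[] in little-endian order:
   least significant byte at the lower address. */
static void store_le16(uint16_t v, uint8_t mem[2])
{
    mem[0] = (uint8_t)(v & 0xFF); /* LSB at the lower address */
    mem[1] = (uint8_t)(v >> 8);   /* MSB at the higher address */
}
```

For the value 0xa1a2 used in the example, mem[0] ends up holding 0xa2 and mem[1] holds 0xa1.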

Nix Name Your Operands

Although I sometimes find this an additional layer of confusion placed on top of a topic already layered with confusion, operands can be given names. The name is pre-pended in brackets to the constraints in the operand list, and references to the named operand use the bracketed name instead of a number after the % sign. Thus, the above example on the “meaning of life” becomes something like this:

asm (
  "ldi %[varA], %[Answer] \n"
  : [varA] "=d" (a) : [Answer] "M" (ANSWER_TO_LIFE)
);

I will leave you to yourself to determine if this makes the inline code easier to understand. Take note, that throughout this tutorial series we never use this feature.

Move Over

Last, we introduce the MOV instruction. Did you guess that MOV is the mnemonic for MOVe? The MOV instruction makes a copy of one register into another. The source register is left unchanged.

Let’s examine the code the compiler produces for this example. We should take note that this code is not very efficient:

LDS R24, 0x0102 //a
LDS R25, 0x0103
LDS R18, 0x0100 //b
LDS R19, 0x0101
MOV R18, R18
MOV R19, R19
MOV R24, R24
MOV R25, R25
STS 0x0103, R19
STS 0x0102, R18
STS 0x0101, R25
STS 0x0100, R24

Straight Ahead

Let’s make it efficient. Using bytes instead of integers, here is the straightforward inline method for performing a swap. Again, notice we define both input and output operands. But, for the input operands it is possible to use a single digit in the constraint string. Using a digit “n” tells the compiler to use the same register as the ‘n-th’ output operand (they start at zero).

Next, hopefully you noticed, in a sneaky fashion we switched the order of the inputs and outputs. Finally, you probably noticed that we don’t write any code at all. Because our constraints do it for us!

uint8_t a = 10;
uint8_t b = 20;

void setup() {
  Serial.begin(9600);

  asm (
    "" : "=r" (a), "=r" (b) : "0" (b), "1" (a) 
  );

  Serial.print("a = "); Serial.println(a);
  Serial.print("b = "); Serial.println(b);
}

void loop(void) { }
 

The same method works with integers:

volatile int a = 0xa1a2;
volatile int b = 0xb1b2;

void setup() {
  Serial.begin(9600);

  asm volatile(
    "" : "=r" (a), "=r" (b) : "0" (b), "1" (a) 
  );

  Serial.print("a = "); Serial.println(a, HEX);
  Serial.print("b = "); Serial.println(b, HEX);
}

void loop() { }
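
If we want to reuse the trick, one option is to wrap it in a small function. This is only a sketch: the name swapBytes is made up for illustration, and it assumes a GCC-compatible compiler (which is what the Arduino toolchain uses). Because the asm string is empty, nothing in it is specific to AVR.

```cpp
#include <stdint.h>

// Hypothetical helper wrapping the empty-asm swap.
static inline void swapBytes(uint8_t &a, uint8_t &b) {
  asm (
    ""                      // no instructions: the constraints do the work
    : "=r" (a), "=r" (b)    // output 0 is a, output 1 is b
    : "0" (b), "1" (a)      // b is loaded into output 0's register, a into output 1's
  );
}
```

Inside setup() we would call it as swapBytes(a, b); with two plain uint8_t variables and then print them as before.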

One More Clobber

There is a special type of clobber called “memory” which informs the compiler that the inline assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). This is a “clobber” that is easily missed, and I admit to omitting it often.

To ensure memory contains correct values, the compiler may need to flush specific register values to memory before executing the inline code. Further, the compiler does not assume that any values read from memory before the inline code remain unchanged after that code (it reloads them as needed). Using the “memory” clobber effectively forms a read/write memory barrier for the compiler.
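
Here is a minimal sketch of a case where the “memory” clobber is needed: the inline code writes through a pointer, so it changes a byte that is not listed as an output operand. The helper name storeByte is made up for illustration. The "e" constraint (a pointer register pair) and the %a0 operand form follow the AVR-GCC Inline Assembler Cookbook, and the non-AVR branch is an assumption added so the snippet can be tried off-target.

```cpp
#include <stdint.h>

// Hypothetical helper: writes `value` to the byte that `dest` points at.
static inline void storeByte(uint8_t *dest, uint8_t value) {
#if defined(__AVR__)
  asm volatile (
    "st %a0, %1 \n"            // store %1 through the pointer pair (X, Y or Z)
    : /* no outputs */
    : "e" (dest), "r" (value)  // both operands are inputs only
    : "memory"                 // we modify memory the compiler cannot see in the operand lists
  );
#else
  *dest = value;                     // fallback when not compiling for AVR
  asm volatile ("" : : : "memory");  // an empty asm can still act as a compiler barrier
#endif
}
```

Without the clobber, the compiler could legitimately keep an older copy of *dest in a register and never notice that the byte changed.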

Wrap Up

We covered a lot of material about constraints in this post, and we’ve only just begun. The proper use of constraints is critical to writing correct and efficient inline assembly code. It took me hours of studying the constraint list to become proficient with them, and at times, I still get frustrated. But as we continue in this series, it will get easier and clearer. With practice comes proficiency.

AVR family Specific Constraints

constraints

The x register is r27:r26, the y register is r29:r28, and the z register is r31:r30

Modifier Characters

‘=’
Means that this operand is written to by this instruction: the previous value is discarded and replaced by new data.

‘+’
Means that this operand is both read and written by the instruction.
When the compiler fixes up the operands to satisfy the constraints, it needs to know which operands are read by the instruction and which are written by it. ‘=’ identifies an operand which is only written; ‘+’ identifies an operand that is both read and written; all other operands are assumed to only be read.
If you specify ‘=’ or ‘+’ in a constraint, you put it in the first character of the constraint string.

‘&’
Means (in a particular alternative) that this operand is an earlyclobber operand, which is written before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is read by the instruction or as part of any memory address.
‘&’ applies only to the alternative in which it is written. In constraints with multiple alternatives, sometimes one alternative requires ‘&’ while others do not.
An operand which is read by the instruction can be tied to an earlyclobber operand if its only use as an input occurs before the early result is written. Adding alternatives of this form often allows GCC to produce better code when only some of the read operands can be affected by the earlyclobber.
Furthermore, if the earlyclobber operand is also a read/write operand, then that operand is written only after it’s used.
‘&’ does not obviate the need to write ‘=’ or ‘+’. As earlyclobber operands are always written, a read-only earlyclobber operand is ill-formed and will be rejected by the compiler.
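
As an illustration of why ‘&’ matters, here is a minimal two-instruction sketch. The helper name addBytes is made up for this example, and the non-AVR branch is an assumption added so the snippet can be tried off-target.

```cpp
#include <stdint.h>

// Hypothetical helper: 8-bit sum built from two instructions.
uint8_t addBytes(uint8_t x, uint8_t y) {
  uint8_t sum;
#if defined(__AVR__)
  asm (
    "mov %0, %1 \n"     // %0 is written here ...
    "add %0, %2 \n"     // ... while %2 still has to be read here
    : "=&r" (sum)       // '&' keeps sum out of the registers holding x and y
    : "r" (x), "r" (y)
  );
#else
  sum = (uint8_t)(x + y);  // fallback when not compiling for AVR
#endif
  return sum;
}
```

Without the ‘&’, the compiler may place sum in the same register as y, on the assumption that all inputs are consumed before any output is written. The MOV would then overwrite y, and the ADD would compute x + x.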

‘%’
Declares the instruction to be commutative for this operand and the following operand. This means that the compiler may interchange the two operands if that is the cheapest way to make all operands fit the constraints. ‘%’ applies to all alternatives and must appear as the first character in the constraint. Only read-only operands can use ‘%’.
GCC can only handle one commutative pair in an asm; if you use more, the compiler may fail. Note that you need not use the modifier if the two alternatives are strictly identical; this would only waste time in the reload pass.

Reference

AVR 8-bit Instruction Set
AVR-GCC Inline Assembler Cookbook
Extended Asm – Assembler Instructions with C Expression Operands
Simple Constraints
Machine Specific Constraints
Constraint Modifiers

Also available as a book, with greatly expanded coverage!


Arduino Inline Assembly Tutorial #3 (Clobbers)

Clobbered

Guess what? Our previous tutorial example (Tutorial #2) has a problem. Here is the inline portion of that code:

asm (
  "ldi r26, 42  \n"
  "sts (a), r26 \n"
);

Notice in our example, we use register #26, or r26. Even though we only used this register temporarily, we have trashed (or “clobbered”) any value that was previously stored there. The compiler may have been using register r26 somewhere else in this program, and we’ve inadvertently replaced whatever value was inside r26 with our value of 42. This may have introduced a bug into our program, or worse, it could have caused the program to crash.

Remember, the compiler simply passes our assembly code on to the avr-as assembler. It really has no idea what we are doing. Because of this, we need a method to inform the compiler of the registers we use, hence the clobber list.

If you recall the general form of the extended inline assembler statement:

asm("code" : output operand list : input operand list : clobber list);

The fourth part is a list of “clobbered” or “accessed” registers. The format is simple: list each register we clobber inside quotation marks. Like so:

"r26"

Our inline code should have looked like this:

asm (
  "ldi r26, 42  \n" 
  "sts (a), r26 \n" 
  : : : "r26"
);

Don’t forget the clobber list is the fourth part of the asm statement, and we separate the parts with colons. If we clobbered additional registers, we would simply add them to the list, separating them with commas, like so:

"r16", "r17", "r25", "r26"

The Chicken or the Egg

Let’s introduce a minor addition to our previous inline tutorial program. Instead of dealing with an 8-bit byte value, let’s use a 16-bit (2-byte) integer value. Obviously, an integer value requires two byte-sized memory locations to store it completely. This introduces a conundrum: which byte comes first?

Lows, Highs and Endians

Endianness refers to the order of the bytes in computer memory. An integer may be represented in big-endian or little-endian format. The arduino uses little-endian, which means the least significant byte (LSB) is stored in a lower memory address while the most significant byte is stored at a higher memory address.

Before we get to our inline assembly program, here’s a diversionary program for the arduino which demonstrates endianness:

//program demonstrating arduino endianness [little endian]
char text[32];

void setup() {
  uint16_t n16 = 0x1234;     //declare & initialize 16-bit number
  uint32_t n32 = 0x12345678; //declare & initialize 32-bit number

  Serial.begin(9600);

  uint8_t* pn16 = (uint8_t *)&n16; //declare uint8_t pointer to 1st byte of 16-bit number
  
  Serial.println(n16, HEX);
  for (uint8_t i=0; i<2; i++) {
    //iterate through both bytes of n16, noting order of digits
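    sprintf(text, "%p: %02x \n", (void *)pn16, *pn16); //print address and byte value
    pn16++; //advance separately: reading and incrementing pn16 in one call is undefined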
    Serial.print(text);
  }
  Serial.println();

  uint8_t* pn32 = (uint8_t *)&n32; //declare uint8_t pointer to 1st byte of 32-bit number

  Serial.println(n32, HEX);
  for (uint8_t i=0; i<4; i++) {
    //iterate through all 4-bytes of n32, noting order of digits
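    sprintf(text, "%p: %02x \n", (void *)pn32, *pn32); //print address and byte value
    pn32++; //advance separately: reading and incrementing pn32 in one call is undefined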
    Serial.print(text);
  }

  Serial.println();

}

void loop(void) { }

It is helpful to use a couple of assembly operators which easily determine the LSB and MSB of a 16-bit integer:

  • lo8() Takes the least significant 8 bits of a 16-bit integer
  • hi8() Takes the most significant 8 bits of a 16-bit integer

Lucky for us, when using these operators, we don’t need to perform the math to determine that the LSB of 32,767 is 255 (0xff in hexadecimal), and the MSB is 127 (0x7f). The lo8 and hi8 operators do this for us. Armed with this new information let’s store the value of 32,767 into our integer variable (a):

16-bit Integer Example

volatile int a = 0;

void setup() {
  Serial.begin(9600);

  asm (
    "ldi r24, lo8(32767) \n" //0xff
    "ldi r25, hi8(32767) \n" //0x7f
    "sts (a), r24        \n" //lsb
    "sts (a + 1), r25    \n" //msb
    : : : "r24", "r25"
  );

  Serial.print("a = "); Serial.println(a);
}

void loop(void) { }

First, notice how we address the 2-byte memory location representing (a), using the notation (a + 1) for the MSB, while just (a) equates to the LSB. It’s vitally important to keep the correct order, or endianness. Had we swapped the two bytes, our number would have become 0xff7f in hexadecimal, which is 65,407 as an unsigned integer, or -129 as a signed integer. If all of this sounds foreign to you, you might want to study up on hexadecimal notation and signed vs. unsigned integers.

Second, we didn’t forget to include the “clobber” list this time.
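
For anyone who wants to check the byte arithmetic from the first point, here is a small sketch in plain C++. The helper names lo8Of, hi8Of and joinBytes are made up for illustration (they only mirror what the assembler’s lo8() and hi8() operators compute), and nothing here is Arduino-specific.

```cpp
#include <stdint.h>

// Hypothetical helpers mirroring the assembler's lo8() and hi8() operators.
constexpr uint8_t lo8Of(uint16_t v) { return (uint8_t)(v & 0xFF); }
constexpr uint8_t hi8Of(uint16_t v) { return (uint8_t)(v >> 8); }

// Rebuilds a 16-bit value from the byte stored at (a) and the byte stored at (a + 1).
constexpr uint16_t joinBytes(uint8_t atA, uint8_t atAPlus1) {
  return (uint16_t)(((uint16_t)atAPlus1 << 8) | atA);
}
```

With these, lo8Of(32767) is 0xff and hi8Of(32767) is 0x7f. Storing them in the right order, joinBytes(0xff, 0x7f), gives back 32,767, while the reversed order, joinBytes(0x7f, 0xff), gives 0xff7f.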

Final Answer

Our final example is just a simple adaptation of our very first inline program. Here we are again dealing with byte values, and we are just going to perform a simple variable swap. In C, we would code this something like:

byte c, b=20, a=10;

c = a; 
a = b; 
b = c;

In inline assembler:

volatile byte a = 10;
volatile byte b = 20;

void setup() {
  Serial.begin(9600);

  asm (
    "lds r24, (a) \n"
    "lds r26, (b) \n"
    "sts (b), r24 \n" //exchange registers
    "sts (a), r26 \n"
    : : : "r24", "r26"
  );

  Serial.print("a = "); Serial.println(a);
  Serial.print("b = "); Serial.println(b);
}

void loop(void) { }

Notice, instead of loading an immediate value with the LDI instruction, we use LDS. LDS is the mnemonic for “Load Direct from data Space”. LDS loads one byte from the data space (SRAM) into a register.

In our program, in the process of loading and then storing, we simply exchange the registers (r24 for r26) in order to perform the swap. Notice we don’t need to burden ourselves with the actual addresses of the variables a and b. In both the LDS and STS instructions, the assembler inserts the SRAM memory addressing for us. Furthermore, we correctly identify the two registers used, as “Clobbered”.

Spoiler Alert

In our next tutorial, we will reduce the previous byte swap program into the following rather odd-looking inline assembler code. You might even be tempted to exclaim, “What code?” Believe it or not, this works. Get ready to travel the winding path of input and output operands!

asm (
  "" : "=r" (a), "=r" (b) : "0" (b), "1" (a) 
);

Reference

AVR 8-bit Instruction Set
AVR-GCC Inline Assembler Cookbook
Extended Asm – Assembler Instructions with C Expression Operands


Arduino Inline Assembly Tutorial #2 (Extending asm)

The Extended asm Statement

The first tutorial in this series can be found here and covers a great amount of necessary background material.

The general form of an extended inline assembler statement is:

asm("code" : output operand list : input operand list : clobber list);

This statement is divided by colons into (up to) four parts. While the code part is required, the others are optional:

  • Code: the assembler instructions, defined as a single string constant.
  • A list of output operands, separated by commas.
  • A list of input operands, separated by commas.
  • A list of “clobbered” or “accessed” registers.

For now, we are going to ignore parts 2 through 4 and concentrate on the code part of the statement.

Our First Inline Assembly Program

This is a basic example of storing a value in memory.

volatile byte a=0;

void setup() {
  Serial.begin(9600);

  asm (
    "ldi r26, 42  \n" //load register r26 with 42
    "sts (a), r26 \n" //store r26 into memory location of variable a
  );

  Serial.print("a = "); Serial.println(a);
}

void loop() { }

To fully understand what we are doing here, we first need to cover some background.

Memory

The arduino has 3 basic types of memory: flash, SRAM and EEPROM. Flash is the memory where our program is stored. Flash is non-volatile storage, so its contents survive a power cycle (power turned off and then back on). When you upload an arduino program, it gets loaded into flash memory.

EEPROM is a form of non-volatile storage used for variable data that we want to maintain between operations of our program.

SRAM is the memory used to store variable information and data. SRAM is volatile storage, and anything placed here is immediately lost when power is removed. In our program above, variable a is stored in SRAM. For now, we are not concerned about EEPROM or Flash memory, and will concentrate on SRAM.

Registers Are Memory Too

Registers are special memory locations. The arduino has 32 general purpose registers (labeled r0 to r31). Each register is 8 bits, or 1 byte, in size. On the classic AVR chips used in boards such as the Uno, the registers are mapped into the start of the data address space: r0 occupies address 0, with each register following incrementally through address 31. These 32 general purpose registers are important because the arduino cannot operate (change, do math, compare values, etc.) directly upon memory. Values stored in memory must first be loaded into one or more registers.

As you can imagine, the arduino has more than 32 registers. Registers also have lots more specific details about them, but for now, these first 32 are more than enough.

Assembly Instructions

Our inline assembly consisted of just two instructions, LDI and STS:

  "ldi r26, 42  \n"
  "sts (a), r26 \n"

The LDI instruction is a mnemonic for “LoaD Immediate”. This instruction simply loads an 8-bit constant value directly into a register. The register must be between r16 and r31. For example, trying to use register #1 (r1) with the LDI instruction would cause an error.

In our program, we load the value “42” into register #26, r26 (pedantically we take note, r26 is actually the 27th register, since numbering starts at zero). We could have chosen r18, r19 or even r24 for that matter. Later, our register selection will become crucial, but for now #26 seems like a good choice.

The LDI instruction is followed by an STS instruction. STS is a mnemonic for “STore direct to data Space”. STS stores one byte from a Register into SRAM, the data space. In our inline code, we place the contents of register #26, r26 into the memory location of variable a. Quietly, behind the scenes, the assembler replaces “a” with the memory location of a. Neat.

Our program finishes by printing the contents of variable location a through the Serial Terminal. Hopefully this produces the output: “a = 42”.

A Few Caveats About Our Program

The variable must be global in scope. This causes the compiler to give the variable a fixed location in SRAM. If we declared the variable a inside the setup() function, it might have been stored temporarily on the stack or inside a register. If that were the case, the STS instruction would have caused an error, because STS needs a fixed SRAM address and the name a would not refer to one. The stack is a subject for a different tutorial.

Furthermore, since the default arduino compilation uses the -Os optimization level, we must declare the variable “volatile” so the optimizer doesn’t eliminate it. The optimizer is very good at what it does: when it sees code that appears unnecessary, it artfully removes it. In our case, we don’t want this code to be removed.

More To Come

With these two beginning tutorials we have covered the fundamentals of the inline assembler, basic syntax and a few assembler instructions. We will continue looking at the extended inline assembler, introducing new instructions as we go. Next, we’ll look at the “clobber list” then dive into the bizarre world of input and output operands.

Reference

AVR 8-bit Instruction Set
AVR-GCC Inline Assembler Cookbook
Basic Asm — Assembler Instructions Without Operands
Extended Asm – Assembler Instructions with C Expression Operands

[updated: 4/8/16]
