Arduino Inline Assembly Tutorial #4 (Constraints)

Introduction

I have a confession to make. My previous examples were not very efficient assembly code. That might seem like an odd comment, especially since my typical example used just 2-4 lines of code. But, these examples were coded as one would write pure assembly, which is not necessarily the way inline should be written. The sneaky compiler silently inserts some extra code into our programs.

Writing inline assembly requires a paradigm shift. Starting with this tutorial, we’ll begin to cover the odd method of coding input and output operands for the asm statement. Using them will enable us to produce more efficient inline code.

Extending Inline

Recall the general form of an extended inline assembler statement is:

asm("code" : output operand list : input operand list : clobber list);

This statement is divided by colons into (up to) four parts. While the code part is required, the others are optional:

  • Code: the assembler instructions, defined as a single string constant.
  • A list of output operands, separated by commas.
  • A list of input operands, separated by commas.
  • A list of “clobbered” or “accessed” registers.

We previously covered the code portion and the clobber list. We will continue to introduce new assembler instructions with each installment. But now, let’s discuss the input and output operands.

Constraints

Each input or output operand is described by a constraint string followed by a C expression in parentheses. Constraints are primarily letters but can be numbers too. The selection of the proper constraint depends on the range of registers or constants acceptable to the AVR instruction they are used with. Here’s an example:

"=r" (status) : "I" (_SFR_IO_ADDR(PINB)), "I" (PINB5)

Remember, the C compiler doesn’t check your assembly code. But it does check the constraint against your C expression. If you specify the wrong constraints, the compiler may silently pass wrong code to the assembler, which would cause it to fail. And if this happens, everything abruptly terminates, giving a very cryptic error message.

For example, if you specify the constraint “r” and you are using this register with an “ORI” instruction in your code, the compiler may select any register. This would fail if the compiler selects r2 to r15, because ORI only works with r16 to r31. That’s why the correct constraint in that case is “d”. But, we’re getting ahead of ourselves.
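
To make the ORI case concrete, here is a minimal sketch. The function name set_low_nibble and the mask 0x0F are made up for illustration, and the plain C branch is only there so the snippet also builds on a non-AVR machine. The ‘+’ modifier it uses is described in the Modifier Characters list at the end of this post.

```cpp
#include <stdint.h>

// Hypothetical helper: sets the low four bits of a byte.
static inline uint8_t set_low_nibble(uint8_t value) {
#if defined(__AVR__)
    // 'd' restricts the register choice to r16..r31, which ORI requires.
    // '+' marks the operand as both read and written.
    // ORI changes the status flags, so "cc" is listed as a clobber.
    asm ("ori %0, %1" : "+d" (value) : "M" (0x0F) : "cc");
#else
    // Assumed host fallback, equivalent in effect to the ORI above.
    value |= 0x0F;
#endif
    return value;
}
```

With “r” in place of “d” the same statement might still assemble, but only when the compiler happens to choose one of the upper registers.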

Constraining Constraints

The compiler is free to select any register that meets your constraint for storage of the value. Interestingly, the compiler might not explicitly load or store your value, and it may even decide not to include your assembler code at all! All these decisions are part of the optimization strategy. For example, if you never use the variable in the remaining part of the C program, the compiler will most likely remove your code unless you switched off optimization.
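
When the inline code must stay in the program even if its result is never used, the usual remedy is to add the volatile qualifier to the asm statement. Here is a minimal sketch, assuming a hypothetical helper named copy_byte; the plain C branch is only a stand-in so the snippet builds off-target.

```cpp
#include <stdint.h>

// Hypothetical helper: copies a byte through a register-to-register MOV.
static inline uint8_t copy_byte(uint8_t value) {
    uint8_t copy;
#if defined(__AVR__)
    // This statement has an output operand, so without 'volatile' the
    // optimizer could drop it whenever 'copy' is never read.
    asm volatile ("mov %0, %1" : "=r" (copy) : "r" (value));
#else
    // Assumed host fallback with the same result.
    copy = value;
#endif
    return copy;
}
```

GCC already treats an asm statement with no output operands as volatile, so the qualifier matters most for statements like this one that do produce an output.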

Modified Constraints

A modifier sometimes precedes the constraint. Let’s demonstrate with an inline assembler routine using a C char variable (a), and an 8-bit constant value (ANSWER_TO_LIFE). Our inline program will simply save the constant to our variable.

Here is our output constraint string:

"=r" (a)

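The equal sign is the modifier; ‘=’ means that this operand is written to by this instruction, and the previous value is discarded and replaced by new data. The ‘r’ is the constraint, and instructs the compiler to place our value into “any general register”. (Since LDI only works with r16 to r31, the stricter ‘d’ constraint mentioned earlier is the safer choice for this instruction; with ‘r’ we rely on the compiler picking an upper register.)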

The input constraint string is even simpler:

"M" (ANSWER_TO_LIFE)

The ‘M’ defines an 8-bit integer constant in the range of 0-255. Inside the parentheses we place our C macro-defined value, “ANSWER_TO_LIFE”.

Remember, we use the colon character, ‘:’ to separate code, outputs, inputs and clobbers.

Percentage Zero

Take notice of the ‘%0’ and ‘%1’ characters in our inline code below. These represent the substitution locations for the operand values from the constraint strings. The constrained values are substituted into the code in the order they appear at the bottom of the inline routine, 0 first, then 1, 2, etc. The first value (a) as constrained, is substituted where ‘%0’ appears, and the number ‘42’ is substituted for ‘%1’. If there were additional constraints, the next one in line would be substituted for ‘%2’, and sequentially onward.

Here is our full code:

#define ANSWER_TO_LIFE 42

volatile uint8_t a;

void setup() {
  Serial.begin(9600);

  asm (
    "ldi %0, %1 \n"
    : "=r" (a) : "M" (ANSWER_TO_LIFE)
  );
  
  Serial.print("a = "); Serial.println(a);
}

void loop() { }

This is where it gets a little confusing. The compiler doesn’t simply replace the ‘%0’ with the value in parentheses. For our example, that would result in an assembler instruction looking something like:

ldi a, ANSWER_TO_LIFE

That’s invalid syntax for the LDI instruction. Instead, the compiler replaces ‘%0’ with the register it chose to satisfy the constraint. That’s an important difference. As such, it creates a valid assembler instruction like this:

ldi r24, 42

And that works.

Furthermore, notice we haven’t included any code to store the output. Rather, we instructed the compiler to do this for us through the equal sign ‘=’, in our “modified constraint”. This may seem like an odd way of writing assembler code, but this is how we do it. It seems natural to want to add a line something like the following after the LDI command:

"sts %0, %1 \n"

You could. But it’s just not necessary.
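
As a rough sketch of the “no store needed” point, the snippet below writes the constant straight into a global through an output operand. The names answer and store_answer are hypothetical, the ‘d’ constraint is used because LDI only accepts r16 to r31, and the plain C branch is just a stand-in for building off-target.

```cpp
#include <stdint.h>

#define ANSWER_TO_LIFE 42

uint8_t answer;  // hypothetical global that receives the constant

// Hypothetical helper: loads the constant and lets the compiler store it.
static inline void store_answer(void) {
#if defined(__AVR__)
    // Only the LDI is written by hand. Because 'answer' is an output
    // operand, the compiler adds whatever store is needed to update it.
    asm ("ldi %0, %1" : "=d" (answer) : "M" (ANSWER_TO_LIFE));
#else
    // Assumed host fallback with the same visible effect.
    answer = ANSWER_TO_LIFE;
#endif
}
```

After store_answer() returns, reading answer from C gives 42, even though the template contains no STS.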

Getting To Specifics

Output operands must be write-only and the C expression result must be an lvalue, which means that the operands must be valid on the left side of assignments. Note that the compiler does not check if the operands are a reasonable type for the kind of operation used in the assembler instructions.

Input operands are read-only. For a value that is both read and written, use the ‘+’ modifier, or tie an input to an output with a digit constraint as we cover below under the heading of “Straight Ahead”.

When the compiler selects the registers that represent the input or output operands, it does not use any of the registers listed in the “clobbered” section. As a result, clobbered registers are available for any use in the assembler code.

Be forewarned, that accessing data from C programs without using input/output operands (such as by using global symbols directly from the assembler template) may not work as expected. Since the compiler does not parse the inline code, it has no visibility of any symbols it references. This may result in those symbols being discarded as unreferenced unless they are also listed as input or output operands. The moral of this story is, “USE INPUT AND OUTPUT OPERANDS.”
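
A small sketch of that moral, assuming a hypothetical global called tick_count: rather than naming the symbol inside the template, we hand it to the asm statement as an operand, so the compiler knows it is used and does the loading and storing. The plain C branch is only there so the snippet builds on a non-AVR machine.

```cpp
#include <stdint.h>

uint8_t tick_count = 0;  // hypothetical global counter

// Hypothetical helper: increments the counter by one.
static inline void bump_tick(void) {
#if defined(__AVR__)
    // The global is passed as a read-write operand ('+'), so the compiler
    // loads it into a register before the INC and stores it afterwards.
    // INC changes the status flags, so "cc" is listed as a clobber.
    asm ("inc %0" : "+r" (tick_count) : : "cc");
#else
    // Assumed host fallback with the same effect.
    tick_count = (uint8_t)(tick_count + 1);
#endif
}
```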

The Percentage of A and B

Here is another example of performing a simple swap between two 16-bit integers:

volatile int a = 0xa1a2;
volatile int b = 0xb1b2;

void setup() {
  Serial.begin(9600);

  asm (
    "mov %A0, %A3 \n"
    "mov %B0, %B3 \n"
    "mov %A1, %A2 \n"
    "mov %B1, %B2 \n"
    : "=r" (a), "=r" (b) : "r" (a), "r" (b)
  );

  Serial.print("a = "); Serial.println(a, HEX);
  Serial.print("b = "); Serial.println(b, HEX);
}

void loop() { }

First, notice the letters A and B, as in %A0 and %B0. They refer to two different 8-bit registers, each containing a part of the 2-byte value of %0. Recall that the AVR is little-endian, meaning the LSB is stored in the lower memory address (and the lower-numbered register) and the MSB is stored in the higher one. Therefore, ‘A’ refers to the LSB, while ‘B’ addresses the MSB. If we were dealing with a 4-byte, 32-bit value, we would use the letters A through D.

Second, we address the two variables (a) and (b) separately as both input and output operands. When the compiler fixes up the operands to satisfy the constraints, it needs to know which operands are read by the instructions and which are written by it. Again, ‘=’ identifies an operand which is only written (‘+’ identifies an operand that is both read and written, and all other operands are assumed to be read only).
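
Here is a minimal sketch of the byte letters in isolation, using two hypothetical helpers, low_byte and high_byte. Each picks one 8-bit half of a 16-bit input operand: %A1 names the register holding the low byte and %B1 the register holding the high byte. The plain C branches are only stand-ins for building off-target.

```cpp
#include <stdint.h>

// Hypothetical helper: returns the least significant byte of a 16-bit value.
static inline uint8_t low_byte(uint16_t value) {
    uint8_t lo;
#if defined(__AVR__)
    asm ("mov %0, %A1" : "=r" (lo) : "r" (value));  // %A1 = low register
#else
    lo = (uint8_t)(value & 0xFF);                   // assumed host fallback
#endif
    return lo;
}

// Hypothetical helper: returns the most significant byte of a 16-bit value.
static inline uint8_t high_byte(uint16_t value) {
    uint8_t hi;
#if defined(__AVR__)
    asm ("mov %0, %B1" : "=r" (hi) : "r" (value));  // %B1 = high register
#else
    hi = (uint8_t)(value >> 8);                     // assumed host fallback
#endif
    return hi;
}
```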

Nix Name Your Operands

Although I sometimes find this an additional layer of confusion placed on top of a topic already layered with confusion, operands can be given names. The name is prepended in brackets to the constraints in the operand list, and references to the named operand use the bracketed name instead of a number after the % sign. Thus, the above example on the “meaning of life” becomes something like this:

asm (
  "ldi %[varA], %[Answer] \n"
  : [varA] "=r" (a) : [Answer] "M" (ANSWER_TO_LIFE)
);

I will leave it to you to decide if this makes the inline code easier to understand. Take note that throughout this tutorial series we never use this feature.

Move Over

Last, we introduce the MOV instruction. Did you guess that MOV is the mnemonic for MOVe? The MOV instruction makes a copy of one register into another. The source register is left unchanged.

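Let’s examine the code the compiler produces for the 16-bit swap example above. We should take note that this code is not very efficient. Every MOV copies a register onto itself; the swap is actually done by the loads and stores the compiler wrapped around our code: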

LDS R24, 0x0102 //a
LDS R25, 0x0103
LDS R18, 0x0100 //b
LDS R19, 0x0101
MOV R18, R18
MOV R19, R19
MOV R24, R24
MOV R25, R25
STS 0x0103, R19
STS 0x0102, R18
STS 0x0101, R25
STS 0x0100, R24

Straight Ahead

Let’s make it efficient. Using bytes instead of integers, here is the straightforward inline method for performing a swap. Again, notice we define both input and output operands. But, for the input operands it is possible to use a single digit in the constraint string. Using a digit “n” tells the compiler to use the same register as the ‘n-th’ output operand (they start at zero).

Next, hopefully you noticed, in a sneaky fashion we switched the order of the inputs and outputs. Finally, you probably noticed that we don’t write any code at all. Because our constraints do it for us!

uint8_t a = 10;
uint8_t b = 20;

void setup() {
  Serial.begin(9600);

  asm (
    "" : "=r" (a), "=r" (b) : "0" (b), "1" (a) 
  );

  Serial.print("a = "); Serial.println(a);
  Serial.print("b = "); Serial.println(b);
}

void loop(void) { }
 

The same method works with integers:

volatile int a = 0xa1a2;
volatile int b = 0xb1b2;

void setup() {
  Serial.begin(9600);

  asm volatile(
    "" : "=r" (a), "=r" (b) : "0" (b), "1" (a) 
  );

  Serial.print("a = "); Serial.println(a, HEX);
  Serial.print("b = "); Serial.println(b, HEX);
}

void loop() { }
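
Digit constraints are also a handy way to get a read-write operand when there is real code in the template. As a minimal sketch, the hypothetical helper swap_nibbles below ties its input to output 0, so the AVR SWAP instruction (which exchanges the two 4-bit halves of a register) works in place. The plain C branch is only a stand-in for building off-target.

```cpp
#include <stdint.h>

// Hypothetical helper: exchanges the high and low nibbles of a byte.
static inline uint8_t swap_nibbles(uint8_t value) {
#if defined(__AVR__)
    // "0" tells the compiler to place the input in the same register
    // as output operand 0, so SWAP reads and writes a single register.
    asm ("swap %0" : "=r" (value) : "0" (value));
#else
    // Assumed host fallback with the same result.
    value = (uint8_t)((value << 4) | (value >> 4));
#endif
    return value;
}
```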

One More Clobber

There is a special type of clobber called “memory” which informs the compiler that the inline assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). This is a “clobber” that is easily missed, and I admit to omitting it often.

To ensure memory contains correct values, the compiler may need to flush specific register values to memory before executing the inline code. Further, the compiler does not assume that any values read from memory before the inline code remain unchanged after that code (it reloads them as needed). Using the “memory” clobber effectively forms a read/write memory barrier for the compiler.
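
Here is a rough sketch of the case described above, where the template reads memory through a pointer passed as an input. The helper name read_via_pointer is hypothetical. The ‘e’ constraint asks for one of the pointer register pairs and %a1 prints it as X, Y or Z. The non-AVR branch is an assumed stand-in that keeps only the barrier and does the read in C.

```cpp
#include <stdint.h>

// Hypothetical helper: reads one byte through a pointer.
static inline uint8_t read_via_pointer(const uint8_t *ptr) {
    uint8_t value;
#if defined(__AVR__)
    // The byte at *ptr is not listed as an operand, only the pointer is,
    // so "memory" tells the compiler the template touches memory.
    asm volatile ("ld %0, %a1" : "=r" (value) : "e" (ptr) : "memory");
#else
    // Assumed host fallback: an empty template with the same clobber
    // acts purely as a compiler-level barrier before the C read.
    asm volatile ("" : : : "memory");
    value = *ptr;
#endif
    return value;
}
```

Without the clobber, the compiler would be entitled to keep a pending write to the pointed-to byte in a register until after the LD, and the template would read a stale value.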

Wrap Up

We covered a lot of material about constraints in this post, and we’ve only just begun. The proper use of constraints is critical to writing correct and efficient inline assembly code. It took me hours of studying the constraint list to become proficient with them, and at times, I still get frustrated. But as we continue in this series, it will get easier and clearer. With practice comes proficiency.

AVR family Specific Constraints

[Image: table of AVR-specific constraints]

The x register is r27:r26, the y register is r29:r28, and the z register is r31:r30

Modifier Characters

‘=’
Means that this operand is written to by this instruction: the previous value is discarded and replaced by new data.

‘+’
Means that this operand is both read and written by the instruction.
When the compiler fixes up the operands to satisfy the constraints, it needs to know which operands are read by the instruction and which are written by it. ‘=’ identifies an operand which is only written; ‘+’ identifies an operand that is both read and written; all other operands are assumed to only be read.
If you specify ‘=’ or ‘+’ in a constraint, you put it in the first character of the constraint string.

‘&’
Means (in a particular alternative) that this operand is an earlyclobber operand, which is written before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is read by the instruction or as part of any memory address.
‘&’ applies only to the alternative in which it is written. In constraints with multiple alternatives, sometimes one alternative requires ‘&’ while others do not.
An operand which is read by the instruction can be tied to an earlyclobber operand if its only use as an input occurs before the early result is written. Adding alternatives of this form often allows GCC to produce better code when only some of the read operands can be affected by the earlyclobber.
Furthermore, if the earlyclobber operand is also a read/write operand, then that operand is written only after it’s used.
‘&’ does not obviate the need to write ‘=’ or ‘+’. As earlyclobber operands are always written, a read-only earlyclobber operand is ill-formed and will be rejected by the compiler.

‘%’
Declares the instruction to be commutative for this operand and the following operand. This means that the compiler may interchange the two operands if that is the cheapest way to make all operands fit the constraints. ‘%’ applies to all alternatives and must appear as the first character in the constraint. Only read-only operands can use ‘%’.
GCC can only handle one commutative pair in an asm; if you use more, the compiler may fail. Note that you need not use the modifier if the two alternatives are strictly identical; this would only waste time in the reload pass.
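
To illustrate the ‘&’ modifier from the list above, here is a minimal sketch of a 16-bit swap written with explicit MOV instructions. The function name swap16 and its pointer parameters are hypothetical. Because the template writes its outputs before it has finished reading its inputs, the outputs are marked earlyclobber so the compiler keeps them in registers separate from the inputs. The plain C branch is only a stand-in for building off-target.

```cpp
#include <stdint.h>

// Hypothetical helper: exchanges two 16-bit values.
static inline void swap16(uint16_t *a, uint16_t *b) {
    uint16_t new_a, new_b;
#if defined(__AVR__)
    // "=&r" keeps each output out of every input register, so the
    // first two MOVs cannot overwrite %2 before it is read.
    asm ("mov %A0, %A3 \n\t"   // new_a low  = old b low
         "mov %B0, %B3 \n\t"   // new_a high = old b high
         "mov %A1, %A2 \n\t"   // new_b low  = old a low
         "mov %B1, %B2 \n\t"   // new_b high = old a high
         : "=&r" (new_a), "=&r" (new_b)
         : "r" (*a), "r" (*b));
#else
    // Assumed host fallback with the same result.
    new_a = *b;
    new_b = *a;
#endif
    *a = new_a;
    *b = new_b;
}
```

Without the ‘&’, the compiler may legally place an output in the same register as an input, and a multi-instruction template like this one then works only when the register allocation happens to cooperate.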

Reference

AVR 8-bit Instruction Set
AVR-GCC Inline Assembler Cookbook
Extended Asm – Assembler Instructions with C Expression Operands
Simple Constraints
Machine Specific Constraints
Constraint Modifiers

Also available as a book, with greatly expanded coverage!


About Jim Eli

µC experimenter

12 Responses to Arduino Inline Assembly Tutorial #4 (Constraints)

  1. hoda says:

    does avr have registers specific for floating point numbers? how are supposed to deal with that?

    • Jim Eli says:

      No. Floating point is implemented in software.

      • hoda says:

        Thank you for responding. I am also having another problem. I’m having a hard time implementing mult with these methods. could you perhaps provide an example for that?

      • Jim Eli says:

        hoda,
        That’s not surprising! Multiplication and division on 8-bit AVR is difficult. Look for ATMEL’s AVR200 Application Note (and software) which lists subroutines for multiplication and division of 8 and 16-bit signed and unsigned numbers. Also, study the GCC AVR source code for the built-in function __mulsi3. These should get you started. 32-bit routines are about the limit for these processors. You can find reliable 64-bit routines floating around the internet, but they start to become cumbersome and slow. My advice is if you need to do big multiplies/divides, you’re better off using a different chip, or possibly try incorporating something like an uM-FPU64, 64-bit Floating Point Coprocessor. Good luck.

  2. hoda says:

    Thank you. I actually got it to work finally!

  3. David Buezas says:

    Great content, I bought your book 🙂
    One small missing piece is the fact that when constraint “e” (any pointer register pair) is used, then the correct operand is %aN (where N is a number or a label) e.g.:
    void toggle() {
    asm ("st %a0, %1\n\t" : : "e"(port), "r"(mask));
    }

  4. Wiel Schrijen says:

    After reading a lot of confusing assembler stuff I found your great tutorial. Thank you very much!
    But I have a problem with passing a single parameter from a function call from loop() to an
    assembly program what I need to transmit very fast bytes to an SPI DAC (MAX550).
    I got this error message:
    ================================================================================
    Compiling ‘FastestSPI’ for ‘ATmega328P (Old Bootloader) (Arduino Nano)’
    ccUM3aVB.ltrans0.ltrans.o * : In function fastSpiTransmit1
    Error linking for board ATmega328P(Old Bootloader) (Arduino Nano)
    FastestSPI.ino : 52 : undefined reference to r28
    Build failed for project ‘FastestSPI’
    collect2.exe * : error : ld returned 1 exit status
    ================================================================================
    After many hours of trying to find the error, I want to ask you to see what I’m doing wrong.
    Below is my program FastSPI.ino .

    Thanks in advance!!

    /* WS 2 March 2021
    * Arduino Inline Assembly Tutorial:

    Arduino Inline Assembly Tutorial #4 (Constraints)


    *
    Based on FastestSPI (c)2015 Josh Levine [http://josh.com]
    * Demo showing a technique to improve the maximum SPI thoughput
    * by cycle counting and blindly sending new data into the SPI hardware
    * without checking status bit.
    * Full article: http://wp.josh.com/2015/09/29/bare-metal-fast-spi-on-avr/
    */

    // Faster macros then digitalWrite()– > macro ca. 125nsec versus digitalWrite() ca. 4usec !!!
    // From https://www.best-microcontroller-projects.com/arduino-digitalwrite.html
    #define setPin(b) ( (b)<8 ? PORTD |=(1<<(b)) : PORTB |=(1<<(b-8)) )
    #define clrPin(b) ( (b)<8 ? PORTD &=~(1<<(b)) : PORTB &=~(1<<(b-8)) )
    #define DAC_CHIPSELECT 10 // chip select DAC, active low

    #include <SPI.h>

    inline static void fastSpiTransmit1(uint8_t ToSend)
    {
    asm volatile
    ( // asm("code" : output operand list : input operand list : clobber list);
    // Cycles
    // ======
    "ldi r22, %0 \n\t" // 2 - load 1e byte

    "out %[spdr],r22 \n\t" // 1 - transmit byte
    "rjmp .+0 \n\t" // 2
    "rjmp .+0 \n\t" // 2
    "rjmp .+0 \n\t" // 2
    "rjmp .+0 \n\t" // 2
    "rjmp .+0 \n\t" // 2
    "rjmp .+0 \n\t" // 2
    "nop \n\t" // 1
    "rjmp .+0 \n\t" // 2
    "rjmp .+0 \n\t" // 2

    : // Outputs ————–>>

    : // Inputs <<—————-
    "r" (ToSend),

    [spdr] "I" (_SFR_IO_ADDR(SPDR)) // SPI data register
    : // Clobbers
    "cc" // special name that indicates that flags (cc: condition code??) may have been clobbered
    );
    }

    void setup() {
    SPI.setClockDivider(SPI_CLOCK_DIV2); // Fastest possible speed
    SPI.begin();
    }

    void loop() {
    clrPin(DAC_CHIPSELECT); // Assembler translation is cbi 0x05, 2 ; 5
    fastSpiTransmit1(0x12);
    setPin(DAC_CHIPSELECT); // Assembler translation is sbi 0x05, 2 ; 5
    }

    • Jim Eli says:

      You need to think outside the box with gcc inline. Try something like this, however, I’m not sure the tight timing you’re aiming for is still correct.

      inline static void fastSpiTransmit1(uint8_t ToSend) {
      asm volatile (
      "out %1, %0 \n\t" // 1 - transmit byte
      "rjmp .+0 \n\t" // 2
      "rjmp .+0 \n\t" // 2
      "rjmp .+0 \n\t" // 2
      "rjmp .+0 \n\t" // 2
      "rjmp .+0 \n\t" // 2
      "rjmp .+0 \n\t" // 2
      "nop \n\t" // 1
      "rjmp .+0 \n\t" // 2
      "rjmp .+0 \n\t" // 2
      : : "r" (ToSend), "I" (_SFR_IO_ADDR(SPDR)) : "cc"
      );
      }

  5. Wiel Schrijen says:

    Hallo Jim,
    Thank you very much for your very quick response!
    It is indeed thinking outside the box.
    You said something about the input parameter earlier. I then started rewriting the code and you were right (of course).
    Now my code is exactly your (updated) solution!

    The idea about the tight timing is from Josh Levine http://wp.josh.com/2015/09/29/bare-metal-fast-spi-on-avr/.
    I can even remove three rjmp’s. The decode SPI info still looks fine on my scope … but I am now going to test with the MAX550 whether it also understands it ;).
    MAX550 requires CS low followed by a command byte followed by a data byte and CS high, so I have to send two bytes for one DAC update.
    The transfer time for 2 bytes (2 function calls) is now 2.76usec. With two times SPI.transfer() this time is 2.87usec. That’s only a little gain.
    With your nice tutorial I have now a function for passing an integer (command+data) so
    I have one function call and the transfer time is 2.25usec, a nice 0.62usec gain.

    Again, many thanks I have learned a lot of you!!

  6. Wiel S. says:

    Hello Jim,
    The version with the MAX550 works!
    If you are interested, I would like to send the code to you.
    I now want to change the code, a lot of time is lost retrieving the integers from the data array. I am now trying to write this part which is written in C, also in assembly … again another challenge.
