Arduino Blink Using GCC Inline Assembly

At under 490 bytes, this version is less than half the size of the Arduino Blink example sketch, and it could certainly be made smaller still.

#define CLOCK_MHZ       16UL
#define DELAY_LENGTH_MS 1000UL
#define DELAY_VALUE     (uint32_t)((CLOCK_MHZ * 1000UL * DELAY_LENGTH_MS) / 5UL) //one delay-loop pass takes 5 cycles
 
void setup() {
  asm volatile (
    "sbi %0, %1 \n" //pinMode(13, OUTPUT);
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (DDB5)
  );
}
 
void loop() {
  asm volatile (
    "mov r18, %A2 \n" //copy the 24-bit delay count into scratch registers
    "mov r19, %B2 \n" //(an input-only operand must not be modified)
    "mov r20, %C2 \n"
 
    "sbi %0, %1   \n" //turn LED on
 
  "1:             \n" //delay ~1 second
    "subi r18, 1  \n"
    "sbci r19, 0  \n"
    "sbci r20, 0  \n"
    "brcc 1b      \n"
   
    "mov r18, %A2 \n" //reload the count for the second delay
    "mov r19, %B2 \n"
    "mov r20, %C2 \n"
   
    "cbi %0, %1   \n" //turn LED off
   
  "2:             \n" //delay ~1 second
    "subi r18, 1  \n"
    "sbci r19, 0  \n"
    "sbci r20, 0  \n"
    "brcc 2b      \n"
   
    : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5), "r" (DELAY_VALUE) : "r18", "r19", "r20"
  );
}

Note: Updated 3.16.2016 to conform with Arduino Inline Assembly Tutorials.

Also available as a book, with greatly expanded coverage:


About Jim Eli

µC experimenter

18 Responses to Arduino Blink Using GCC Inline Assembly

  1. Arip says:

    How about using pure C? It’s just 214 bytes. Here it is: 😀

    #include <avr/io.h>
    #include <util/delay_basic.h>

    int main (void)
    {
      unsigned char counter;
      /* set all PORTB pins as outputs */
      DDRB = 0xFF;

      while (1)
      {
        /* set all PORTB pins high (the Uno LED is on PB5) */
        PORTB = 0xFF;

        /* wait (50 * 120000) cycles = 6000000 cycles */
        counter = 0;
        while (counter != 50)
        {
          /* wait (30000 x 4) cycles = 120000 cycles */
          _delay_loop_2(30000);
          counter++;
        }

        /* set all PORTB pins low */
        PORTB = 0x00;

        /* wait (50 * 120000) cycles = 6000000 cycles */
        counter = 0;
        while (counter != 50)
        {
          /* wait (30000 x 4) cycles = 120000 cycles */
          _delay_loop_2(30000);
          counter++;
        }
      }

      return 1;
    }

    • Jim Eli says:

      Nice; however, you somewhat cheated: your code is not running inside the Arduino IDE. If placed inside the IDE, your code compiles to 508 bytes vs. 520 bytes for my version. However, if my code is compiled with AVR Studio, it is only 180 bytes. But again, this could be condensed even further. Thanks for the code.

      #include <avr/io.h>

      int main(void) {
        asm volatile (
          "sbi %0, %1 \n\t" //pinMode(13, OUTPUT);
          :: "I" (_SFR_IO_ADDR(DDRB)), "I" (DDB5)
        );

        asm volatile (
          "4: \n\t" //loop
          "sbi %0, %1 \n\t" //LED on
          "call OneSecondDelay \n\t" //delay
          "cbi %0, %1 \n\t" //LED off
          "call OneSecondDelay \n\t" //delay
          "rjmp 4b \n\t" //repeat forever

          "OneSecondDelay: \n\t"
          "ldi r18, 0 \n\t" //delay 1 second
          "ldi r20, 0 \n\t"
          "ldi r21, 0 \n\t"

          "1: ldi r24, lo8(400) \n\t"
          "ldi r25, hi8(400) \n\t"
          "2: sbiw r24, 1 \n\t" //4 cycles x 400 = 100us; 10x around this loop = 1ms
          "brne 2b \n\t"
          "inc r18 \n\t"
          "cpi r18, 10 \n\t"
          "brne 1b \n\t"

          "subi r20, 0xff \n\t" //1000 x 1ms = 1 second
          "sbci r21, 0xff \n\t"
          "ldi r24, hi8(1000) \n\t"
          "cpi r20, lo8(1000) \n\t"
          "cpc r21, r24 \n\t"
          "breq 3f \n\t"

          "ldi r18, 0 \n\t"
          "rjmp 1b \n\t"

          "3: \n\t"
          "ret \n\t"

          :: "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5)
          : "r18", "r20", "r21", "r24", "r25"
        );

        return 1;
      }

  2. Bill Rees says:

    Really useful example… Thanks, Jim

  3. nicohood says:

    Reblogged this on NicoHood.

  4. lb says:

    int led=12;

    void setup()
    {
    pinMode(led,OUTPUT);

    }
    void loop()
    {
    digitalWrite(led,HIGH);
    delay(1000);
    digitalWrite(led,LOW);
    delay(1000);
    }
    //11 bytes

    • Jim Eli says:

      ???
      Sketch uses 1,068 bytes (7%) of program storage space. Maximum is 14,336 bytes.
      Global variables use 11 bytes (1%) of dynamic memory, leaving 1,013 bytes for local variables. Maximum is 1,024 bytes.

      • greez says:

        try this:

        #include <avr/io.h>
        #include <avr/interrupt.h>

        /* LED on pin 13 */
        /* IDE pin 13 is port B bit 5 (PB5) */

        int main () {

          cli();
          TCNT1 = 0;
          TCCR1A = 0;
          TCCR1B = 0;
          OCR1A = 31250; // 16MHz/256/2Hz
          TCCR1B |= _BV(WGM12); // CTC mode
          TCCR1B |= _BV(CS12); // clock_io/256
          TIMSK1 |= _BV(OCIE1A); // enable interrupt
          SMCR = _BV(SE); // sleep mode IDLE
          DDRB |= _BV(DDB5); // LED pin mode OUTPUT
          sei();

          while (true)
            __asm__ ("sleep" ::);

          return 0;
        }

        ISR (TIMER1_COMPA_vect, ISR_NAKED) {
          /* toggle LED pin: writing 1 to PINB5 toggles PORTB5 */
          __asm__ ("sbi %0, %1 \n\t"
                   "reti \n\t"
                   :: "I" (_SFR_IO_ADDR(PINB)), "I" (PINB5) );
        }

      • Jim Eli says:

        I compile your code into 208 bytes. I like it, and a very good job of thinking outside the box.

  5. Mont pierce says:

    Just curious as to why you’re embedding “\n\t” in the asm strings. The \n makes sense, as it represents the newline character, but \t represents the tab character.

    Usually, \n represents the end of line in the non-MS-DOS world, whereas \r\n represents the end of line in the MS-DOS/Windows world.

    Just curious. Thanks. 🙂

    • Jim Eli says:

      Mont,

      From the AVR-GCC Inline Assembler Cookbook:

      The linefeed and tab characters will make the assembler listing generated by the compiler more readable. It may look a bit odd for the first time, but that’s the way the compiler creates its own assembler code.

      Inside the .s file (produced by the compiler) use of the trailing tab formats the code properly. Here’s an example (with and without the use of tabs):

        asm volatile (
          "L1:            \n\t"
          "  mov %A0, %A2 \n"
          "  mov %B0, %B2 \n"
          "L_dl2%=:       \n"
          "  sbiw %A0, 1  \n\t"
          "  brne L_dl2%= \n\t"
          "  dec %1       \n\t"
          "  brne L1      \n\t"
          : "=&w" (cnt) : "r" (ms), "r" (delay_count)
        );
      

      Notice the funky output when the tab escape sequence is omitted:

      	
      	L1:       
      	  mov r30, r18 
        mov r31, r19 
      L_dl29:       
        sbiw r30, 1  
      	  brne L_dl29 
      	  dec r24       
      	  brne L1
      
  6. Johnny Quest says:

    Or 32 bytes in pure assembly:

    .equ CLOCK_MHZ = 16 ;clock speed in MHz
    .equ DELAY_LENGTH_MS = 1000
    .equ DELAY_VALUE = ((CLOCK_MHZ * 1000 * DELAY_LENGTH_MS) / 5)

    .nolist
    .include "m328Pdef.inc"
    .list
    .list

    .def temp = r16
    .def Dly0 = r17
    .def Dly1 = r18
    .def Dly2 = r19

    ;============================================
    .org 0
    rjmp RESET ;jump to reset

    ;============================================
    .org 0x40 ;place code behind IRQ vector table
    RESET:
    ldi temp,0b00100000
    out DDRB,temp

    LoopStart:
    sbi PORTB,PINB5 ;turn on LED
    rcall Delay1S ;delay 1 second
    cbi PORTB,PINB5 ;turn off LED
    rcall Delay1S ;delay 1 second
    rjmp LoopStart ;loop forever

    Delay1S:
    ldi Dly0,LOW(DELAY_VALUE)
    ldi Dly1,HIGH(DELAY_VALUE)
    ldi Dly2,BYTE3(DELAY_VALUE)

    Delay1Sa:
    subi Dly0,1
    sbci Dly1,0
    sbci Dly2,0
    brcc Delay1Sa ;loop till zero
    ret ;return to caller

    • greez says:

      Not quite: actually ‘.org 0x40’ + 15 instructions = 94 bytes.
      By the way, why a 64-byte offset? Metric system? On the ATmega328 the interrupt table has 26 vectors, which is 52 bytes.
      Anyway… subroutine calls without stack pointer initialization, does that work?!
      Append 8 bytes for this: SPL = low(RAMEND), SPH = high(RAMEND), and you get 102 bytes of code, or 90 without the gap between the interrupt table and the code.
      Of course, in this program we can’t do anything but blink the LED (eventually, who knows what we confess ))), so why do we need the interrupt table at all? And given the missing stack pointer initialization, and in order to reduce code size, subroutines are a luxury :))
      So here is my offer:

      .equ DELAY_VALUE = (16 * 1000 * 1000 / 5)
      .org 0x0000
      cli ; no interrupts
      sbi DDRB, DDB5 ; one output
      loop:
      sbi PINB, PINB5 ; toggle pin here
      ; ldi r24, LOW(DELAY_VALUE) ; redundant,
      ldi r25, HIGH(DELAY_VALUE) -1 ; we just subtract low byte from here
      ldi r16, BYTE3(DELAY_VALUE)
      sbiw r24, 1 ; hell yeah! 16bit ftw!!!
      sbci r16, 0
      brcc .-6
      rjmp loop

      and if we don’t need precision, we can completely omit counter initialization

      .org 0x0000
      cli
      sbi DDRB, DDB5
      sbi PINB, PINB5
      sbiw r24, 10 ; 10 for about 525 ms
      sbci r16, 0
      brcc .-6
      rjmp .-10

      7 instructions (14 bytes) total.
      It is also possible to make this code 2 bytes smaller by using the internal 128 kHz oscillator as the clock source instead of the default 8 MHz (16 MHz external on the Arduino board), but the Arduino IDE doesn’t offer this feature (or I don’t know where the button is).

      Regards 🙂

      • Eduardo says:

        Hi Greez,
        You seem to know your stuff on Arduino assembly. Could you recommend a good resource to start with? I programmed in assembly a long time ago and am looking into getting back into it, but this time on microcontrollers.
        I mostly use macOS and Arduino, but would be open to suggestions.
        Thanks

        Ed

      • Jim Eli says:

        Ed,

        While it doesn’t specifically cover the Arduino family of AVR microcontrollers, it’s still one of the best places to start: Programming Microcontrollers using Assembly Language

        Of course, if you want to use inline assembly language, or something more appropriate to the Arduino family, I must plug my epub: Arduino Inline Assembly.

  7. #define CLOCK_MHZ 16UL
    #define DELAY_LENGTH_MS 1000UL
    #define DELAY_VALUE (uint32_t)((CLOCK_MHZ * 1000UL * DELAY_LENGTH_MS) / 5UL)

    int main(void)
    {

    asm volatile (
    "sbi %0, %1 \n" //pinMode(13, OUTPUT);
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (DDB5)
    );

    start:

    asm volatile (
    "mov r18, %D2 \n" //save for second delay iteration
    "mov r20, %C2 \n"
    "mov r21, %B2 \n"

    "sbi %0, %1 \n" //turn LED on

    "1: \n" //delay ~1 second
    "subi %A2, 1 \n"
    "sbci %B2, 0 \n"
    "sbci %C2, 0 \n"
    "brcc 1b \n"

    "cbi %0, %1 \n" //turn LED off

    "2: \n" //delay ~1 second
    "subi r18, 1 \n"
    "sbci r19, 0 \n"
    "sbci r20, 0 \n"
    "brcc 2b \n"

    : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5), "r" (DELAY_VALUE) : "r18", "r19", "r20"
    );

    goto start;

    return 0;
    }

    There’s no need to use setup() and loop(); the IDE cheats a bit by adding its own main() when you compile.
    https://pl.wikipedia.org/wiki/Arduino

  8. Jiri Bruna says:

    Grzegorz, good ideas, but many errors: the order of some registers is reversed, and the logic for restoring the initial values is wrong. I had to make a lot of changes, but finally success! Thanks anyway for the inspiration.
    Jirka.

  9. Bonhomme says:

    Although I understand the inline asm code, what is the meaning of this line?

    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (DDB5)

    And this one? (I guess it returns the values r18, r19, r20 somehow…)
    : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5), "r" (DELAY_VALUE) : "r18", "r19", "r20"
