Blynking an IoT Yunshan ESP8266 250V 10A AC/DC WIFI Network Relay Module


I purchased a few of these Yunshan Wifi Relays through eBay for approximately $7.50US. The device should be perfect for use in simple IoT projects which require controlling household AC power. The onboard JQC-3FF relay is rated to 250VAC or 30VDC at up to 10A. There are routed slots between the high voltage PCB traces for circuit isolation and arc-over protection. Transient voltage suppression is incorporated on both the board power supply and the photocoupler input line (see the description below).

The device requires a power supply between 7 and 30V DC. I tried, without success, to run it from an inexpensive 5V, 2A wall-wart, even though the onboard MP2303 buck converter is rated down to 4.8V. It did operate successfully from a 9VDC wall-wart.

The device contains an integrated ESP8266-12E, but appears to use only the GPIO4 and GPIO5 pins. That was a disheartening discovery, because it discards a significant amount of the functionality inside the ESP8266 WIFI module. For example, using the ESP8266 low-power, wake-from-sleep provisions (where GPIO16 and RESET need to be linked together) would require some skillful soldering to the module’s exposed pins.

The good news is that programming the module is very easy, as I discuss later. I also found the overall build quality of my device to be above the typical level found on eBay-sourced Chinese electronics.

The eBay listing contained a link to a zip file, entitled U4648-datasheet, which contained example programs, schematics, and a Chinese manual. I translated the manual with Google's translation service, but there’s little reason to do so, as it doesn't contain much. More can be learned from a quick study of the schematic and the board itself.

Module Description
The Chinese manual presents the following limited module description:


1 – The power input terminals.
2 – The relay output terminals.
3 – IO input terminal.
4 – Input status indicator: blue light, lit when the IO input is high.
5 – Relay output status indicator: red light, lit when the relay is turned on.
6 – TTL serial output.
7 – Boot mode selection jumper.

Board Connectors

Here are the connections on my board:


A: 7-30V+ DC power supply
B: Power supply ground
C: Normally closed (NC) relay contact
D: Common (COM) relay contact
E: Normally open (NO) relay contact
F: 5V+ out
G: ESP8266 GPIO5 Optocoupler Input
H: Ground (isolated optocoupler input)

AP MODE Webpage

I was easily able to connect a 9V power supply to the A-B connector (see the picture and connector description above) and control the device via WIFI. To do this, connect your computer or phone to the yunshan_wifi_xx_xx_xx network (the xx appear to be hexadecimal digits pulled from the ESP8266 MAC address). My device accepted the supplied password of yunshan123456789. Once a connection was established, I entered the device's IP address into my browser and was greeted by a Chinese web page, the translation of which appears below. From this webpage, I was able to open and close the relay. The status of the GPIO5 optocoupler input is also displayed on this webpage.


Since I have big IOT home automation plans for these devices, my next task was to attempt a re-program of the onboard ESP8266 module. For a quick test, I uploaded the traditional Arduino IDE ESP8266 blink program, and was rewarded with a 1Hz blinking blue LED on the ESP8266 module.

Program Upload

On the lower left portion of the PCB is a section that grants access to the ESP8266 pins for programming (see the above photo). These same pins are also useful for TTL serial output (debugging, etc.). Separate 2-pin and 3-pin headers will need to be soldered into these connector holes (labeled P5 and P6). The ESP8266 GPIO4 controls the relay through a 2N3904 transistor. Setting GPIO4 high energizes the relay, which closes the NO contact to Common and opens the NC contact. Additionally, taking connector “G” high pulls GPIO5 low through a PC817 photocoupler, which isolates the input. On my board the blue LED is connected to GPIO2, and is illuminated by pulling the pin low.

To program the ESP8266 module, I connected the TX, RX and ground pins of connector P6 to a SparkFun USB FTDI programmer, and jumpered the two pins of connector P5 together when I was ready to upload. Connector P5 grounds GPIO0 and GPIO15, sending the device into bootloader mode. If you have trouble programming the ESP8266, as I did on my first attempt, make sure your FTDI device is also grounded through the P6 connector.

A very good introduction to the ESP8266 module can be found here. Excellent programming information for the individual ESP8266 modules is also widely available (two examples: ESP8266-01 and ESP8266-12e).

Board Schematic


Blynk Relay Control Application

/*
 * Title: Simple ESP-8266 blynk/yunshan wifi relay control
 * File: esp8266_yunshan_relay.ino
 * Author: James Eli
 * Date: 12/25/2016
 *
 * This program controls a Yunshan wifi relay module communicating through 
 * the onboard esp-8266-12e module. The module is controlled from the
 * internet via the Blynk cloud app. 
 *
 * Notes:
 *  (1) Requires the following arduino libraries:
 *      ESP8266
 *      Blynk
 *  (2) Compiled with arduino ide 1.6.12
 *  (3) Uses three Blynk app widgets:
 *       V0: button configured as a switch.
 *       V1: led.
 *       V2: led.
 *
 * Change Log:
 *   12/25/2016: Initial release. JME
 *   12/31/2016: Added input pin status. JME
 *   01/15/2017: Added volatile. JME
 */
#include <ESP8266WiFi.h>
#include <BlynkSimpleEsp8266.h>

// Esp8266 pins.
#define ESP8266_GPIO2    2 // Blue LED.
#define ESP8266_GPIO4    4 // Relay control.
#define ESP8266_GPIO5    5 // Optocoupler input.
#define LED_PIN          ESP8266_GPIO2
// Blynk app authentication code.
char auth[] = "***";
// Wifi SSID.
const char ssid[] = "***";
// Wifi password.
const char password[] = "***";    
// Flag for sync on re-connection.
bool isFirstConnect = true; 
volatile int relayState = LOW;    // Blynk app pushbutton status.
volatile int inputState = LOW;    // Input pin state.

void setup() {
  pinMode( ESP8266_GPIO4, OUTPUT );       // Relay control pin.
  pinMode( ESP8266_GPIO5, INPUT_PULLUP ); // Input pin.
  pinMode( LED_PIN, OUTPUT );             // ESP8266 module blue LED.
  digitalWrite( LED_PIN, LOW );           // Turn on LED.
  Blynk.begin( auth, ssid, password );    // Initiate Blynk connection.
  digitalWrite( LED_PIN, HIGH );          // Turn off LED.
}

// This function runs every time Blynk connection is established.
// Assumption: the handler bodies below are restored around the surviving
// lines using the Blynk library's standard macros.
BLYNK_CONNECTED() {
  if ( isFirstConnect ) {
    Blynk.syncAll();                      // Re-send the app's widget states.
    isFirstConnect = false;
  }
}

// Sync input LED.
BLYNK_READ( V2 ) {
  CheckInput();
}

// Blynk app relay command.
BLYNK_WRITE( V0 ) {
  if ( param.asInt() != relayState ) {
    relayState = !relayState;                  // Toggle state.
    digitalWrite( ESP8266_GPIO4, relayState ); // Relay control pin.
    Blynk.virtualWrite( V1, relayState*255 );  // Set Blynk app LED.
  }
}

// Debounce input pin.
int DebouncePin( void ) {
  // Read input pin.
  if ( digitalRead( ESP8266_GPIO5 ) == HIGH ) {
    // Debounce input.
    delay( 25 );
    if ( digitalRead( ESP8266_GPIO5 ) == HIGH )
      return HIGH;
  }
  return LOW;
}

// Set LED based upon state of input pin.
void CheckInput( void ) {
  if ( DebouncePin() != inputState ) {
    // inputState still holds the previous pin level here, so the app LED
    // shows the inverse of GPIO5 (the pin is pulled low when the input is driven).
    Blynk.virtualWrite( V2, inputState*255 );
    inputState = !inputState;
  }
}

// Main program loop.
void loop() {
  Blynk.run();
  CheckInput();
  //yield(); //Updated: 3/8/2017
}

TCP Client Demo

Here is a basic server which responds to TCP client HTTP GET commands (added 1/8/17):

#include <ESP8266WiFi.h>

// Esp8266 pinouts
#define ESP8266_GPIO2    2  // Blue LED.
#define ESP8266_GPIO4    4  // Relay control. 
#define ESP8266_GPIO5    5  // Optocoupler input.
#define LED_PIN          ESP8266_GPIO2
// WiFi Definitions.
const char ssid[] = "***";
const char pswd[] = "***";
WiFiServer server( 80 );
volatile int relayState = 0;      // Relay state.

// Assumption: setup() was empty in the listing; it is filled in here with
// the two helper functions defined below and the server start.
void setup() {
  initHardware();
  connectWiFi();
  server.begin();
}

void GetClient( WiFiClient client ) {
  // Read the first line of the request.
  String req = client.readStringUntil( '\r' );
  Serial.println( req );

  String s = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<!DOCTYPE HTML>\r\n<html>\r\n";

  if ( req.indexOf( "OPTIONS" ) != -1 ) {
    s += "Allows: GET, OPTIONS";

  } else if ( req.indexOf( "GET" ) != -1 ) {
    if ( req.indexOf( "open" ) != -1 ) {
      // relay on!
      s += "relay on!";
      relayState = 1;
      digitalWrite( ESP8266_GPIO4, 1 ); // Relay control pin.
    } else if ( req.indexOf( "close" ) != -1 ) {
      // relay off!
      s += "relay off!";
      relayState = 0;
      digitalWrite( ESP8266_GPIO4, 0 ); // Relay control pin.
    } else if ( req.indexOf( "relay" ) != -1 ) {
      if ( relayState == 0 )
        // relay off!
        s += "relay off!";
      else
        // relay on!
        s += "relay on!";

    } else if ( req.indexOf( "io" ) != -1 ) {
      if ( digitalRead( ESP8266_GPIO5 ) == 0 )
        s += "input io is:0!";
      else
        s += "input io is:1!";
    } else if ( req.indexOf( "MAC" ) != -1 ) {
      uint8_t mac[WL_MAC_ADDR_LENGTH];
      WiFi.softAPmacAddress( mac );
      // Valid indices are 0 through WL_MAC_ADDR_LENGTH - 1.
      String macID = String( mac[WL_MAC_ADDR_LENGTH - 6], HEX) + String( mac[WL_MAC_ADDR_LENGTH - 5], HEX) +
                     String( mac[WL_MAC_ADDR_LENGTH - 4], HEX) + String( mac[WL_MAC_ADDR_LENGTH - 3], HEX) +
                     String( mac[WL_MAC_ADDR_LENGTH - 2], HEX) + String( mac[WL_MAC_ADDR_LENGTH - 1], HEX);
      s += "MAC address: " + macID;

    } else
      s += "Invalid Request.<br> Try: open/close/relay/io/MAC";

  } else 
    s = "HTTP/1.1 501 Not Implemented\r\nContent-Type: text/html\r\n\r\n<!DOCTYPE HTML>\r\n<html>\r\n";
  s += "</html>\n";

  // Send the response to the client.
  client.print( s );
  delay( 1 );
  Serial.println( "Client response sent." );
}

void loop() {
  // Check if a client has connected.
  WiFiClient client = server.available();
  if ( client ) 
    GetClient( client );
}

void connectWiFi() {
  byte ledStatus = LOW;
  Serial.println( "Connecting to: " + String( ssid ) );
  // Set WiFi mode to station (as opposed to AP or AP_STA).
  WiFi.mode( WIFI_STA );

  // WiFi.begin([ssid], [passkey]) initiates a WiFi connection
  // to the stated [ssid], using the [passkey] as a WPA, WPA2, or WEP passphrase.
  WiFi.begin( ssid, pswd );

  while ( WiFi.status() != WL_CONNECTED ) {
    // Blink the LED.
    digitalWrite( LED_PIN, ledStatus ); // Write LED high/low.
    ledStatus = ( ledStatus == HIGH ) ? LOW : HIGH;
    delay( 100 );
  }

  Serial.println( "WiFi connected" );  
  Serial.println( "IP address: " );
  Serial.println( WiFi.localIP() );
}

void initHardware() {
  Serial.begin( 9600 );
  pinMode( ESP8266_GPIO4, OUTPUT );       // Relay control pin.
  pinMode( ESP8266_GPIO5, INPUT_PULLUP ); // Input pin.
  pinMode( LED_PIN, OUTPUT );             // ESP8266 module blue LED.
  digitalWrite( ESP8266_GPIO4, 0 );       // Set relay control pin low.
}
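
One detail of the MAC reply is worth a note: the Arduino String( value, HEX ) conversion does not zero-pad, so a byte such as 0x0A prints as a single character and the bytes run together. Below is a minimal sketch of a padded formatter. It is plain C++ intended for a desktop compiler rather than a drop-in for the sketch above, and formatMac is a hypothetical helper name of my own.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a 6-byte MAC address as "AA:BB:CC:DD:EE:FF".
// Every byte is printed as two hex digits, so 0x0A becomes "0A", not "A".
std::string formatMac(const uint8_t mac[6]) {
  char buf[18]; // 6 * 2 hex digits + 5 colons + terminating NUL
  const int n = std::snprintf(buf, sizeof buf,
                              "%02X:%02X:%02X:%02X:%02X:%02X",
                              mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
  assert(n == 17); // exactly 17 characters are expected
  return std::string(buf);
}
```

The byte values used to try it out are arbitrary; any six bytes will do.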

ATMEL Studio 7 Does Blink in Assembly Language


See the previous post (here) for detailed information on installing AS7 and simulating an Arduino program. As an exercise to gain familiarity with AS7, let's make an assembly language project using the Blink code below:

• Select “File>New>Project”.

• Assembler.

• AVR Assembler Project.

• Give it a unique name.

• Select the proper device.

Remember to select the Simulator in order to run the program inside the AS7 IDE. See the previous post on how to do this. Here is a screen shot of the assembly code running in AS7:


Here is the code for a very basic assembly language blink program:

.include ""

.org 0x0000
    rjmp start

start:
    ldi r16, 0           ; reset system status
    out SREG, r16
    ldi r16, low(RAMEND) ; init stack pointer
    out SPL, r16
    ldi r16, high(RAMEND)
    out SPH, r16

    sbi DDRB, DDB5       ;pinMode(13, OUTPUT);
_loop:
    sbi PORTB, PORTB5    ;turn LED on
    rcall _delay
    cbi PORTB, PORTB5    ;turn LED off
    rcall _delay
    rjmp _loop

_delay:
    ldi r24, 0x00        ;one second delay iteration
    ldi r23, 0xd4 
    ldi r22, 0x30 
_d1:                     ;delay ~1 second
    subi r24, 1   
    sbci r23, 0   
    sbci r22, 0
    brcc _d1
    ret
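
The "~1 second" figure in the delay loop can be checked with a little arithmetic. The sketch below is plain C++ for a desktop compiler, not AVR code. It assumes a 16 MHz clock (as on an Uno), and the function names are hypothetical. The cycle counts are the usual AVR ones: subi and sbci take one cycle each, brcc takes two when taken and one when it falls through.

```cpp
#include <cassert>
#include <cstdint>

// Assumed CPU clock: 16 MHz, as on an Arduino Uno.
constexpr double kCpuHz = 16000000.0;

// Cycles spent in the _d1 loop for a 24-bit start value.
// high/mid/low correspond to r22/r23/r24 in the listing.
uint32_t delayLoopCycles(uint8_t high, uint8_t mid, uint8_t low) {
  const uint32_t count = (static_cast<uint32_t>(high) << 16) |
                         (static_cast<uint32_t>(mid) << 8) | low;
  assert(count <= 0xFFFFFFu);
  // Each pass costs subi (1) + sbci (1) + sbci (1) + brcc taken (2) = 5 cycles.
  // The loop runs count + 1 times; on the last pass brcc falls through (1 cycle).
  return count * 5u + 4u;
}

// Loop time in seconds at the assumed clock.
double delayLoopSeconds(uint8_t high, uint8_t mid, uint8_t low) {
  return delayLoopCycles(high, mid, low) / kCpuHz;
}
```

With the listing's values (0x30, 0xd4, 0x00) the counter starts at 3,200,000, which works out to just over 16 million cycles. The rcall, ret and three ldi instructions add a handful more.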

Now, start learning inline assembly language:

Also available as a book, with greatly expanded coverage!

[click on the image]


Using ATMEL Studio 7 for Arduino Development


While installing ATMEL Studio 7 (AS7) is not required in order to learn inline assembly language programming, it has worthwhile advantages. The ability to compile code for the Arduino, run it inside the included Simulator and immediately debug it will greatly speed your learning process. This seamless and iterative process would take several long minutes to complete if using the Arduino IDE. In fact, the Arduino IDE alone cannot perform the disassembly and debug functions.

AS7 now features seamless, one-click importation of projects created in the Arduino development environment. Your sketch, including any libraries it references, can be imported into AS7 as a C++ project. Once imported, you can leverage many additional capabilities of AS7 to fine-tune and debug your design. For most Arduino boards, shield-adapters that expose debug connectors are available, or one could use an ATMEL-ICE debug wire interface with a standard Arduino (slight modification of the Arduino is required).

More information on DebugWire with the Arduino can be found here:
Debugging Arduino using debugWire
Debugging with the new ATMEL-ICE
Modify an Arduino for DebugWire

Best of all, AS7 is free of charge. It is also worth noting that most of the functionality of AS7 is available in the older Atmel Studio 6 IDE (AS6).

Install ATMEL Studio 7

I’m not going to go through the whole installation process (get the book), just navigate to the ATMEL Studio 7 website download page, and click the DOWNLOAD NOW link:

Direct link to the download page:

Select the web installer unless you have a specific requirement to install off-line.

ATMEL Studio 7 Blinks

Congratulations if you have completed the installation. Let’s run the Studio for the first time.

On initial start, AS7 will ask you to select a user interface profile. I suggest the “Advanced” version, however either option is satisfactory. If desired, the profile can always be changed later.


Be patient, it will seem like a long period of time for the program to load, but eventually you will be greeted with the IDE and the startup page populated with an ATMEL announcement internet feed. It is possible to disable this if it is unwanted (checkbox on the bottom left).


From the file menu select NEW->Project.


The New Project dialog will open. Make the following selections:

• Ensure “Installed” and “C/C++” are highlighted on the far left.
• Highlight “Create project from Arduino sketch”.
• Give the project an appropriate name like, “AS7_Blink” (note, spaces are not allowed in the project name).
• The default location should fill automatically.
• The solution name defaults to the project name, and this is acceptable for now.


The next dialog will ask for the existing Arduino sketch location. Click on the button containing the ellipsis to the right of the Sketch File. Navigate to the Arduino Blink example sketch, which should be located inside your Arduino installation folder (similar location to mine on Windows):

C:\Program Files (x86)\Arduino\examples\01.Basics\Blink\Blink.ino

Continuing along:

• Ensure your Arduino IDE path is properly filled in.
• Make the appropriate selections for your board (Arduino/Genuino Uno) and device (atmega328p).
• Click “Ok”.

AS7 will take some time creating the folder(s) for the project and pulling in the Blink sketch with all of its dependencies (all core and variant include files, Arduino core source files and libraries). It will eventually finish by creating and opening an editable “sketch.cpp” file inside the IDE.


The sketch.cpp file is primarily the code from the example Arduino blink program including some additional automatically generated code by the ATMEL studio. Your solution should look like this:


Ready, Set, Simulate!

We can now run the Blink program without an actual Arduino board using the simulator included with AS7. But first we need to inform AS7 to use the Simulator. Under the Project menu, select “Blink Properties…”.


On the properties screen that appears, on the far left, select the “Tool” item, and under the heading of “Selected debugger/programmer”, select “Simulator” from the drop down list.


Now, in order to run the program in the simulator, simply press “Alt+F5” or select “Debug->Start Debugging and Break” using the menu. AS7 will build the Blink project, execute the C-runtime startup code and then halt at the first line of the Arduino program. If you examine your screen, you’ll notice the debugger (on the far left) is pointing to and highlighting the line of code it has paused on:



At this point, you might think to yourself, this code wasn’t in the Blink program. What gives?

All of this is actually code that comes from the core Arduino wiring files, which silently gets inserted into every Arduino sketch as needed. If you look further down the screen you will eventually see the call to the Blink setup() function.

Hitting the “F10” key (Step Over) twice should bring the simulator to the call to setup(). At this point, pressing the “F11” key (Step Into) will cause the simulator to jump into the Blink program and pause its execution on the first line inside the setup() function:

pinMode(led, OUTPUT);

Now things should start to look familiar. The last feature I want to demonstrate is the ability to drill down into the underlying assembly language that was created by the compiler. We do this by selecting an option from the Debug menu, “Debug>Windows>Disassembly”.


You should now see a mixture of C and assembly code which makes up the executable Blink program. The simulator should be paused on the following line:

0000007B LDI R22,0x01 Load immediate


I cannot overstate the importance of this feature. The ability to examine the assembly code, compare it to the source, and step through it line by line is an enormous asset to the inline assembly programmer. You can, among other features:

• Set breakpoints
• Watch variables
• Alter register values
• Time sections of code.

You will want to remember how to do this.

ATMEL has uploaded several informative videos demonstrating the use of the Studio, how to debug and how to use the simulator. A good example is located here:


Here’s a partial look at the disassembly listing (mixed C and assembly) produced by AS7:

void setup() {                
  // initialize the digital pin as an output.
  pinMode(led, OUTPUT);     
0000007B  LDI R22,0x01		Load immediate 
0000007C  LDS R24,0x0100		Load direct from data space 
0000007E  JMP 0x000001A5		Jump 
  pinMode(led, OUTPUT);     

// the loop routine runs over and over again forever:
void loop() {
00000080  PUSH R28		Push register on stack 
00000081  PUSH R29		Push register on stack 
  digitalWrite(led, HIGH);   // turn the LED on (HIGH is the voltage level)
00000082  LDI R28,0x00		Load immediate 
00000083  LDI R29,0x01		Load immediate 
00000084  LDI R22,0x01		Load immediate 
00000085  LDD R24,Y+0		Load indirect with displacement 
00000086  CALL 0x000001E1		Call subroutine 
  delay(5000);               // wait for a second
00000088  LDI R22,0x88		Load immediate 
00000089  LDI R23,0x13		Load immediate 
0000008A  LDI R24,0x00		Load immediate 
0000008B  LDI R25,0x00		Load immediate 
0000008C  CALL 0x00000119		Call subroutine 
  digitalWrite(led, LOW);    // turn the LED off by making the voltage LOW
0000008E  LDI R22,0x00		Load immediate 
0000008F  LDD R24,Y+0		Load indirect with displacement 
00000090  CALL 0x000001E1		Call subroutine 
  delay(1000);               // wait for a second
00000092  LDI R22,0xE8		Load immediate 
00000093  LDI R23,0x03		Load immediate 
00000094  LDI R24,0x00		Load immediate 
00000095  LDI R25,0x00		Load immediate 
00000096  POP R29		Pop register from stack 
00000097  POP R28		Pop register from stack 
  delay(1000);               // wait for a second
00000098  JMP 0x00000119		Jump 

My Cup Overflows


When performing math (even basic addition and subtraction) with signed numbers an overflow problem sometimes arises. The Arduino microcontroller indicates the existence of an overflow error by setting the overflow flag in the SREG. Here’s a demonstration of the overflow problem with a simple addition operation:

volatile int8_t n1=0x70; //112
volatile int8_t n2=0x35; //53
volatile int8_t answer;

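void setup() {
  Serial.begin(9600);
  asm (
    "mov %0, %1 \n" //answer = n1
    "add %0, %2 \n" //answer += n2 (sets the V flag on signed overflow)
    : "=&r" (answer) : "r" (n1), "r" (n2)
  );
  Serial.print("answer = "); Serial.println(answer);
}

void loop() { }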

The result of the above addition is “answer = -91”, or 0xA5 hexadecimal. That’s wrong! The true sum, 165, is larger than a signed 8-bit register can hold.

The largest “signed 8-bit number” is +127, or 0x7f hexadecimal. However, this operation did set the Status Register Overflow Flag (V flag) to warn us that the result is erroneous. It’s completely up to us, the programmers, to deal with this issue.

What’s Your Sign?

In “8-bit signed number” operations, the overflow flag is set when either of the following two conditions occur:

• There is a carry from bit 6 to bit 7, but no carry out of bit 7 (C flag not set).
• There is a carry out of bit 7 (C flag set), but no carry from bit 6 to bit 7.

I bring these two cases to your attention because adding two negative numbers can overflow just as adding two positive numbers can. For example, when adding -2 (0xFE) and -128 (0x80), the result becomes 0x7E (+126), which again is incorrect.

When adding two numbers with different signs, the absolute value of the result is never larger than the larger of the two operands' absolute values. In this case, an overflow is impossible.

Therefore, an overflow is only possible when adding two numbers with the same sign. Furthermore, when adding two “same-signed numbers”, the sign of the result must be the same. The conclusion here is, for signed number addition, if the overflow flag is set, the result is invalid, and in unsigned addition, if the carry flag is set, the result is invalid. In signed number operations, overflow is possible, and overflow corrupts the result and negates the sign bit.
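
The rules above can be checked away from the hardware with a small model. This is a sketch in plain C++ for a desktop compiler, not the AVR's actual implementation. The add8 helper and its result struct are hypothetical names. It assumes C is the carry out of bit 7, and V is set when both operands share a sign that the result does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the result and flags produced by an 8-bit ADD.
struct Add8Result {
  int8_t value;    // two's-complement result, as stored in the register
  bool   carry;    // C flag: carry out of bit 7 (unsigned overflow)
  bool   overflow; // V flag: signed overflow
};

Add8Result add8(int8_t a, int8_t b) {
  const uint8_t ua = static_cast<uint8_t>(a);
  const uint8_t ub = static_cast<uint8_t>(b);
  const unsigned wide = static_cast<unsigned>(ua) + ub; // 9-bit sum
  assert(wide <= 0x1FEu);                               // two bytes cannot exceed this
  const uint8_t sum = static_cast<uint8_t>(wide);       // keep the low 8 bits

  Add8Result r;
  r.value = static_cast<int8_t>(sum);
  r.carry = (wide & 0x100u) != 0;
  // Overflow: operands have the same sign bit and the result's sign differs.
  r.overflow = ((~(ua ^ ub)) & (ua ^ sum) & 0x80u) != 0;
  return r;
}
```

Feeding it the two examples from the text, 0x70 + 0x35 and -2 + -128, sets the overflow flag in both cases. Only the second also sets the carry flag.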

See my tutorial on Arduino Inline Assembly Math here.


Arduino Inline Assembly Port & Pin Compendium


The following is a compendium of inline assembly functions dealing with ports and pins. Use these at your own risk. These functions have been trimmed of most bounds checking, so they can easily be abused. The Arduino Inline Assembly Tutorial explains most of the details starting here.


analogWrite

This inline code writes an analog value (in the form of a PWM wave) to a particular pin. After executing, the pin will generate a steady square wave of the specified duty cycle until the code is executed again (or until digitalRead() or digitalWrite() is called on the same pin). The frequency of the PWM signal on most pins is approximately 490 Hz. On the Uno and similar boards, pins 5 and 6 have a frequency of approximately 980 Hz. On Arduino boards with the ATmega168/328, this function works on pins 3, 5, 6, 9, 10, and 11. The analogWrite function has nothing to do with the analog pins or the analogRead function.
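
To put numbers on "duty cycle", here is a small idealized model in plain C++ (for a desktop compiler, not the Arduino). It assumes the 8-bit value maps linearly onto 0 to 100 percent, with 0 always low and 255 always high. The 5 V supply default and the function names are my own assumptions, and real timer modes can differ by a count.

```cpp
#include <cassert>
#include <cstdint>

// Fraction of each PWM period the pin spends high for an 8-bit value
// (idealized: 0 is always low, 255 is always high).
double dutyCycle(uint8_t val) {
  return val / 255.0;
}

// Average output voltage; the 5.0 V default supply is an assumption.
double averageVolts(uint8_t val, double supplyVolts = 5.0) {
  assert(supplyVolts > 0.0);
  return dutyCycle(val) * supplyVolts;
}
```

A value of 128 therefore gives a duty cycle of roughly one half, or about 2.5 V on average from a 5 V pin.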

A pinMode() call is included inside this function, so there is no need to set the pin as an output before executing this code.

This no-frills version of analogWrite saves ~542 bytes over the built-in function:

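//analogWrite requires a PWM pin 
//set the 6 defines below for the chosen PWM pin
//(the values shown are for Arduino pin 11, PB3/OC2A on the ATmega328P)
#define ANALOG_PORT         PORTB
#define ANALOG_PIN          PORTB3
#define ANALOG_DDR          DDRB
#define TIMER_REG           TCCR2A
#define TIMER_COM           COM2A1 //assumed name: compare output mode bit
#define TIMER_OCR           OCR2A  //assumed name: output compare register

volatile uint8_t val = 128; //0-255

void setup() {
  asm (
    "sbi  %0, %1   \n" //DDR set to output (pinMode)

    "cpi  %6, 0    \n" //if full low (0)
    "breq _SetLow  \n"
    "cpi  %6, 0xff \n" //if full high (0xff)
    "brne _SetPWM  \n"

    "sbi  %2, %1   \n" //set high
    "rjmp _SkipPWM \n"

  "_SetLow:        \n"
    "cbi  %2, %1   \n" //set low
    "rjmp _SkipPWM \n"

  "_SetPWM:        \n"
    "ld   r24, X   \n"
    "ori  r24, %3  \n"
    "st   X, r24   \n" //connect pwm pin timer# & channel
    "st   Z, %6    \n" //set pwm duty cycle (val)

  "_SkipPWM:       \n"
    //operand list assumed from the %0-%6, X and Z usage above
    : : "I" (_SFR_IO_ADDR(ANALOG_DDR)), "I" (ANALOG_PIN),
        "I" (_SFR_IO_ADDR(ANALOG_PORT)), "M" (_BV(TIMER_COM)),
        "x" (_SFR_MEM_ADDR(TIMER_REG)), "z" (_SFR_MEM_ADDR(TIMER_OCR)),
        "d" (val)
    : "r24"
  );
}

void loop() { }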


analogRead

The Arduino board contains a 6-channel, 10-bit analog to digital converter, which is the brains beneath the analogRead function. It maps input voltages between 0 and 5 volts into integer values between 0 and 1023, yielding a resolution of 5/1024 volts, or about 0.0049 volts (4.9 mV), per unit. The input range and resolution can be changed through the ANALOG_V_REF define. This code reads the value from the specified analog channel (0-7), which corresponds to the analog pins (note, do NOT use A0-A7 for the channel number in this code). Further information about the underlying ADC can be found here.

While this version of analogRead (aRead) saves a few bytes (~50), it also gives the option of changing the speed via the ADC prescaler. However, don’t arbitrarily change the prescale without understanding the consequences. ATMEL advises the slowest prescale should be used (PS128). A higher speed (smaller prescale) reduces the accuracy of the AD conversion. The Arduino sets the prescale to 128 during initialization, just as the code below does.
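
The two calculations behind these paragraphs are easy to sketch in plain C++ (for a desktop compiler, not the Arduino). The 16 MHz system clock and 5 V reference are assumptions matching a stock Uno, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values: a 16 MHz system clock and a 5 V reference (Uno defaults).
constexpr double kCpuHz = 16000000.0;
constexpr double kVref  = 5.0;

// Convert a 10-bit ADC reading (0-1023) to volts: reading * Vref / 1024.
double countsToVolts(uint16_t reading) {
  assert(reading <= 1023);
  return reading * kVref / 1024.0;
}

// ADC clock in Hz for a given prescale divisor (2, 4, 8, ..., 128).
double adcClockHz(unsigned prescale) {
  // The divisor must be a power of two between 2 and 128.
  assert(prescale >= 2 && prescale <= 128 && (prescale & (prescale - 1)) == 0);
  return kCpuHz / prescale;
}
```

One count is worth about 4.9 mV, and the default prescale of 128 gives the 125 kHz ADC clock listed in the defines that follow.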

//Define various ADC prescales
#define PS2   (1<<ADPS0)                             //8000kHz ADC clock freq
#define PS4   (1<<ADPS1)                             //4000kHz
#define PS8   ((1<<ADPS0) | (1<<ADPS1))              //2000kHz
#define PS16  (1<<ADPS2)                             //1000kHz
#define PS32  ((1<<ADPS2) | (1<<ADPS0))              //500kHz
#define PS64  ((1<<ADPS2) | (1<<ADPS1))              //250kHz
#define PS128 ((1<<ADPS2) | (1<<ADPS1) | (1<<ADPS0)) //125kHz
#define ADC_PRESCALE     PS128   //PS16, PS32, PS64 or PS128 (default)
#define ANALOG_V_REF     DEFAULT //assumed definition: DEFAULT, INTERNAL or EXTERNAL

uint16_t aRead(uint8_t channel) {
  uint16_t result;
  asm volatile ( //volatile: each call must perform a fresh conversion
    "andi %1, 0x07    \n" //force pin==0 thru 7
    "ori  %1, (%6<<6) \n" //(pin | ADC Vref)
    "sts  %2, %1      \n" //set ADMUX

    "lds  r18, %3             \n" //get ADCSRA
    "andi r18, 0xf8           \n" //clear prescale bits
    "ori  r18, ((1<<%5) | %7) \n" //(new prescale | ADSC)
    "sts  %3, r18             \n" //set ADCSRA

    "_loop:       \n" //loop until ADSC cleared
    "lds  r18, %3 \n"
    "sbrc r18, %5 \n"
    "rjmp _loop   \n"

    "lds  %A0, %4   \n" //result = ADCL 
    "lds  %B0, %4+1 \n" //ADCH

    //operands %3-%7 assumed from their use above
    : "=r" (result) : "d" (channel), "M" (_SFR_MEM_ADDR(ADMUX)),
      "M" (_SFR_MEM_ADDR(ADCSRA)), "M" (_SFR_MEM_ADDR(ADCL)),
      "I" (ADSC), "I" (ANALOG_V_REF), "I" (ADC_PRESCALE)
    : "r18"
  );
  return result;
}


pinMode (OUTPUT)

The Arduino pinMode function configures pin behavior. The code presented from here on has been previously explained in the Arduino Inline Tutorial Series.

asm (
  "sbi %0, %1 \n" //1=OUTPUT
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (DDB5)
);

pinMode (INPUT_PULLUP)

asm (
  "cbi %0, %2 \n" //DDR bit cleared: input
  "sbi %1, %2 \n" //PORT bit set: pull-up enabled
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (_SFR_IO_ADDR(PORTB)), "I" (DDB5)
);

pinMode (INPUT)

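asm (
  "cbi %0, %2 \n" //DDR bit cleared: input
  "cbi %1, %2 \n" //PORT bit cleared: pull-up disabled
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (_SFR_IO_ADDR(PORTB)), "I" (DDB5)
);

pinMode with Multiple Pins

#define PIN_DIRECTION 0b00101000 //PIN 3 & 5 OUTPUT
//#define PIN_DIRECTION ((1<<DDB3) | (1<<DDB5))
asm (
  "out %0, %1 \n"
  //operands assumed: the DDRB address and the direction mask in a register
  : : "I" (_SFR_IO_ADDR(DDRB)), "r" ((uint8_t)(PIN_DIRECTION))
);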

digitalWrite HIGH

If a pin has been configured as an OUTPUT, its voltage will be set to the corresponding value: 5V (or 3.3V on 3.3V boards) for HIGH, 0V (ground) for LOW. However, if the pin is configured as an INPUT, digitalWrite enables (HIGH) or disables (LOW) the internal pullup on the input pin.

asm (
  "sbi %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5)
);

digitalWrite LOW

asm (
  "cbi %0, %1 \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5) 
);

digitalWrite Conditional

volatile uint8_t output = HIGH; //LOW or HIGH
asm (
  "cpi %2, 0     \n" //output == LOW?
  "breq 1f       \n" //yes, clear the pin
  "sbi %0, %1    \n" //no, set the pin
  "rjmp 2f       \n"
  "1: cbi %0, %1 \n"
  "2:            \n"
  : : "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5), "d" (output) //"d": cpi needs r16-r31
);


Try to find this one in the Arduino wiring code:

//toggle pin
asm (
  "in r24, %0  \n" //read the port
  "eor r24, %1 \n" //flip the pin's bit
  "out %0, r24 \n" //write the port back
  : : "I" (_SFR_IO_ADDR(PORTB)), "r" ((uint8_t)_BV(PORTB5)) : "r24"
);


digitalRead simply reads the value from a specified digital pin, either HIGH or LOW.

volatile uint8_t status;
asm volatile (
  "in __tmp_reg__, __SREG__  \n" //preserve sreg
  "cli                       \n" //disable interrupts
  "ldi %0, 1                 \n" //high 
  "sbis %1, %2               \n" //skip next if pin high
  "clr %0                    \n" //low
  "out __SREG__, __tmp_reg__ \n" //restore sreg
  : "=d" (status) : "I" (_SFR_IO_ADDR(PINB)), "I" (PINB5) //"d": ldi needs r16-r31
);

digitalRead Alternative

This is a generic alternative, which can be called programmatically. Note it must be called using a pointer to the PIN (&PINB), otherwise the compiler emits incorrect code:

//call like so:
//uint8_t status = dRead(&PINB, PINB5);

__attribute__ ((noinline)) uint8_t dRead(volatile uint8_t *port, uint8_t pin) {
  uint8_t result, mask=1;

  asm volatile (
    "movw  r30, %1 \n" //port reg addr in Z
  "1:              \n"
    "cpi  %2, 0    \n" //loop until pin==0
    "breq 2f       \n" //leave loop
    "lsl  %3       \n" //shift (mask) left 1 position
    "dec  %2       \n" //decrement loop counter
    "rjmp 1b       \n" //repeat
  "2:              \n"
    "in   __tmp_reg__, __SREG__ \n" //preserve sreg
    "cli           \n" //disable interrupts
    "ld   r18, Z   \n" //fetch port data
    "and  r18, %3  \n" //isolate pin with mask
    "ldi  %0, 1    \n" //set return high (ldi leaves the flags untouched)
    "brne 3f       \n" 
    "clr  %0       \n" //set return low
  "3:              \n"
    "out  __SREG__, __tmp_reg__ \n" //restore sreg
    : "=&d" (result) : "r" (port), "a" (pin), "r" (mask) : "r18", "r30", "r31"
  );

  return result;
}

Example of turning off PWM for arduino digital pin #11

//digital PWM pin registers:

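asm (
  "ld  r16, Z    \n" //read TCCR2A
  "ldi r17, 0xff \n"
  "eor r17, %1   \n" //invert the COM2A1 bit mask
  "and r16, r17  \n" //clear COM2A1, disconnecting the pin from the timer
  "st  Z, r16    \n" //write TCCR2A back
  //COM2A1 is a bit number, so it is converted to a mask with _BV()
  : : "z" (_SFR_MEM_ADDR(TCCR2A)), "d" ((uint8_t)_BV(COM2A1)) : "r16", "r17"
);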



Arduino Inline Assembly Tutorial (Examples)


As the final tutorial in this series, we present four example inline assembly functions for the arduino. Specifically, these cover the conversion of a byte to a hexadecimal string, SPI Mode 0 hardware transfer, SPI Mode 0 Bit-banging, and the C library atoi function. Do not take these functions as archetypical examples of high-quality coding practice or brilliantly efficient inline code. They are just simple examples.

Most of the previous examples in this series were simple “snippets of code”, and as such gave a myopic view of inline assembly. The goal here is to show complete and working demonstrations of how to include inline assembly into the typical arduino program. Each example includes explanatory comments covering the key portions of code.

In addition to these examples, have a look at the Arduino Inline Assembly Blink Program.

Stringing Hexadecimals

The following code converts a byte value into a hexadecimal string. Notice at the start of the code, that the constraint #0 value (val) is temporarily saved in the r25 register. The function then converts the first nibble. When the conversion process is complete, the function loops back and converts the second nibble. Note how the code uses the SREG T-bit to flag the first vs. second nibble.

void ByteToHexStr(uint8_t val, char *str) {
  asm volatile (
    "set           \n" //flag first nibble
    "mov r25, %0   \n" //save val
    "swap %0       \n" //swap for correct nibble order
  "1:              \n"
    "andi %0, 0xf  \n" //mask a nibble
    "cpi  %0, 0xa  \n" //>=10?
    "brcc 2f       \n" //yes
    "subi %0, 0xd0 \n" //convert numeral (0-9)
    "rjmp 3f       \n" //skip next
  "2:              \n"
    "subi %0, 0xc9 \n" //convert letter (A-F)
  "3:              \n"
    "st Z+, %0     \n" //put into string
    "brtc 4f       \n" //second nibble done?
    "clt           \n" //clear nibble flag
    "mov %0, r25   \n" //get lower nibble
    "rjmp 1b       \n" //repeat conversion
  "4:              \n" //exit (str is not NUL-terminated here)
    : "+d" (val), "+z" (str) : : "r25", "memory"
  );
}
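
As a cross-check for the assembly above, the same two-nibble conversion can be sketched in plain C. This is only a reference model for testing on a PC, not the code this tutorial compiles. The name byte_to_hex_str is hypothetical, and unlike the inline version this sketch appends a terminating NUL:

```c
#include <stdint.h>

//hypothetical reference model of the nibble conversion (not the inline asm)
void byte_to_hex_str(uint8_t val, char *str) {
  //upper nibble first, then lower nibble
  for (int8_t shift = 4; shift >= 0; shift -= 4) {
    uint8_t nibble = (uint8_t)((val >> shift) & 0x0f);
    //0-9 map to '0'-'9' (add 0x30); 10-15 map to 'A'-'F' (add 0x37)
    *str++ = (char)(nibble < 10 ? nibble + '0' : nibble + 'A' - 10);
  }
  *str = '\0'; //assumption: the buffer holds at least 3 bytes
}
```

Calling byte_to_hex_str(0xA5, buf) with a 3-byte buffer leaves the string "A5" in buf. The additions of 0x30 and 0x37 are the same adjustments the assembly performs by subtracting 0xd0 and 0xc9.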

I SPI With My Little Eye…

Serial Peripheral Interface (SPI) is a synchronous serial data protocol used by microcontrollers for communicating with one or more peripheral devices, or for communication between two microcontrollers. The SPI standard is loose and each device implements it a little differently, which means you must pay close attention to the device’s datasheet when implementing the protocol. Generally speaking, there are four modes of transmission, defined by the clock phase and polarity.

Here are two versions of the SPI transfer function. The first of these programs incorporates the arduino hardware SPI. The second is a bit-bang version using different pins. More information on SPI can be found here and here.

SPI Mode 0 Hardware Transfer

static __attribute__ ((noinline)) uint8_t SpiXfer(uint8_t data) {
  asm volatile (
    "out  %1, %0          \n" //put data out SPDR register
    "nop                  \n" //pause
  "1:                     \n"
    "in   __tmp_reg__, %2 \n" //check xmit complete
    "sbrs __tmp_reg__, %3 \n"
    "rjmp 1b              \n"
    "in   %0, %1          \n" //get incoming data
    : "+r" (data) : "I" (_SFR_IO_ADDR(SPDR)),
    "I" (_SFR_IO_ADDR(SPSR)), "I" (SPIF)
  );

  return data;
}

SPI Bit-Bang

#define MOSI_BIT   PORTD5
#define MISO_BIT   PIND6

static __attribute__ ((noinline)) uint8_t SpiBitBang(uint8_t data) {
  register uint8_t tmp, i=8;
  //save and restore sreg because t-bit is utilized
  asm (
    "in __tmp_reg__, __SREG__ \n"
  "1:               \n"
    "sbrs %0, 0x07  \n" //is output data bit high?
    "rjmp 2f        \n" //no
    "sbi  %3, %4    \n" //output a high bit
    "rjmp 3f        \n"
  "2:               \n"
    "cbi  %3, %4    \n" //output a low bit
  "3:               \n"
    "lsl  %0        \n" //shift to next bit
    "in   %1, %5    \n" //get input
    "tst  %1        \n" //anything here?
    "breq 4f        \n" //nope
    "bst  %1, %6    \n" //set t-bit if input bit is high
    "clr  %1        \n" //zeroize register
    "bld  %1, 0     \n" //set bit 0
    "or   %0, %1    \n" //or low bit with data for return value
  "4:               \n"
    "sbi  %7, %8    \n" //toggle clock bit high
    "nop            \n" //pause
    "cbi  %7, %8    \n" //toggle clock bit low
    "subi %2, 1     \n" //more bits?
    "brne 1b        \n" //do next bit
    "out __SREG__, __tmp_reg__ \n"
    : "+r" (data), "=&r" (tmp): "a" (i),

  return data;
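
The shift-and-sample loop used by the bit-bang routine can also be modeled in portable C, which makes it easy to test on a PC before moving to the hardware. The following is a minimal sketch and not the tutorial's code: the spi_pins_t structure, its three hook functions and the loopback stub are all hypothetical names introduced for illustration. On real hardware the hooks would touch the port registers.

```c
#include <stdint.h>

//hypothetical pin-access hooks (assumption: supplied by the caller)
typedef struct {
  void    (*write_mosi)(uint8_t level); //drive the output data line
  uint8_t (*read_miso)(void);           //sample the input data line
  void    (*pulse_sck)(void);           //clock high, then low
} spi_pins_t;

//model of one 8-bit, MSB-first transfer
uint8_t spi_bitbang_xfer(const spi_pins_t *pins, uint8_t data) {
  for (uint8_t i = 0; i < 8; i++) {
    pins->write_mosi((uint8_t)((data & 0x80) != 0)); //present bit 7
    data = (uint8_t)(data << 1);                     //shift to next bit
    if (pins->read_miso()) {
      data |= 0x01;                                  //received bit enters bit 0
    }
    pins->pulse_sck();
  }
  return data; //all 8 outgoing bits are gone, 8 incoming bits remain
}

//loopback stub for testing: MISO simply mirrors MOSI
static uint8_t loop_wire;
static void    loop_write(uint8_t level) { loop_wire = level; }
static uint8_t loop_read(void)           { return loop_wire; }
static void    loop_clock(void)          { }
const spi_pins_t loopback_pins = { loop_write, loop_read, loop_clock };
```

With the loopback stub, every byte sent comes straight back, so spi_bitbang_xfer(&loopback_pins, 0xA5) returns 0xA5. As in the assembly, a single register serves as both the outgoing and the incoming shift register.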

A Toy

Atoi is a function in the C programming language that converts a string into an integer numerical representation (atoi stands for ASCII to integer). It is included in the C standard library header file stdlib.h. It is prototyped as follows:

int atoi(const char *str);

The str argument is a string, represented by an array of characters, containing the characters of a signed integer number. The string must be null-terminated.

Here is the basic idea of the atoi function implemented in C language:

int16_t atoi(char s[]) {
  uint8_t i, sign;
  int16_t n;
  //skip white space (but stop at the terminating NUL)
  for (i=0; s[i]!='\0' && s[i]<=' '; i++);
  sign = 0;
  if (s[i] == '-') {
    sign = 1;
    i++; //step over the sign
  }
  for (n=0; s[i]>='0' && s[i]<='9'; i++)
    n = 10*n + s[i] - '0';
  if (sign)
    return (-1*n);
  return n;
}

Atoi Inline

Here is our implementation, which is about 64 bytes in length. By comparison, the arduino AVR libc atoi() function is 76 bytes long. This version is functionally equivalent in most respects, but there are a few differences in detail (this function steps over all leading ASCII characters 0x2F and below, not just whitespace):

int16_t _atoi(const char *s) {
  //sign & c are written inside the inline asm code
  uint8_t sign, c;
  //force result into return registers
  register int16_t result asm("r24");
  asm (
    "ldi  %A0, 0x00         \n" //result = 0
    "ldi  %B0, 0x00         \n"
    "ldi  %3, 0x00          \n" //sign = FALSE

  "1:                       \n"
    "ld   %2, Z+            \n" //fetch char
    "cpi  %2, '-'           \n" //negative sign?
    "brne 2f                \n"
    "ldi  %3, 0x01          \n" //sign = TRUE

  "2:                       \n"
    "cpi  %2, '/' + 1       \n" //step over whitespace/garbage
    "brcc 3f                \n"
    "rjmp 1b                \n"

  "3:                       \n"
    "rjmp 5f                \n"

  "4:                       \n"
    "ldi  r23, 10           \n" //result *= 10
    "mul  %B0, r23          \n"
    "mov  %B0, r0           \n"
    "mul  %A0, r23          \n"
    "mov  %A0, r0           \n"
    "add  %B0, r1           \n"
    "clr  __zero_reg__      \n" //r1 trashed by mul
    "add  %A0, %2           \n" //result += new digit
    "adc  %B0, __zero_reg__ \n"
    "ld   %2, Z+            \n" //fetch next digit char
  "5:                       \n"
    "subi %2, '0'           \n" //convert char to 0-9
    "cpi  %2, 10            \n" //not a digit?
    "brlo 4b                \n"

    "cpi  %3, 0             \n" //negative?
    "breq 6f                \n"
    "com  %B0               \n" //negate result
    "neg  %A0               \n"
    "sbci %B0, -1           \n"
  "6:                       \n"
    : "=&d" (result), "+z" (s), "=&d" (c), "=&d" (sign) : : "r23", "memory"
  );

  return result;
}
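
The "result *= 10" step is the least obvious part of the routine, because the AVR MUL instruction only multiplies two 8-bit values. The following plain C sketch models how the two partial products are combined. It is an illustration for checking the arithmetic on a PC, not code from the tutorial, and the function name mul10_bytewise is hypothetical:

```c
#include <stdint.h>

//hypothetical model of the 16-bit "times ten" built from two 8-bit multiplies
uint16_t mul10_bytewise(uint16_t n) {
  uint8_t lo = (uint8_t)(n & 0xff);
  uint8_t hi = (uint8_t)(n >> 8);
  //mul %B0, r23: only the low byte (r0) of hi*10 is kept
  uint8_t hi_prod = (uint8_t)(hi * 10u);
  //mul %A0, r23: the full 16-bit product (r1:r0) of lo*10 is used
  uint16_t lo_prod = (uint16_t)(lo * 10u);
  //low byte comes from lo_prod; its carry byte (r1) is added to hi_prod
  return (uint16_t)(lo_prod + ((uint16_t)hi_prod << 8));
}
```

For example, mul10_bytewise(1234) returns 12340. Any overflow beyond 16 bits is silently discarded, just as in the assembly.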


While there are countless more topics to cover, and many more rabbit-holes to dive down, I believe I have covered enough of the basics in this series. I sure enjoyed researching and writing these tutorials. And, hopefully you gained a few insights into the funky world of arduino (AVR) inline assembly programming. Now, get inline with your programming!

[updated: 4.11.16]


Arduino Inline Assembly Tutorial (Interrupts)


Pardon The Interruption

The previous tutorial covered the basics of writing inline functions. A close relative of the function is the Interrupt Service Routine (ISR), which is the topic here. Portions of this tutorial may pertain to functions as well.

As a warning, this tutorial assumes an understanding of the basic concepts of interrupts in general, and specifically interrupt handlers on the arduino (AVR μC). Hopefully, you have already written a few arduino interrupts in C, using the internal arduino functionality. If not, you may want to study some of the links given in the reference section of this tutorial before continuing.

The Deck is Stacked

Basic knowledge of the stack is essential to understanding functions and interrupt handlers. The basic purpose of the stack is to support function calls and interrupts. Whenever a program makes a function call or whenever an interrupt occurs, the stack is used to store critical information which will be restored upon completion of the function or interrupt. Additional information on the stack can be found here and here.

First and primary, during a function call or interrupt, the hardware places the return address on the stack. The saving and restoration of the return address is accomplished transparently by the CALL and RET instructions. It is not necessary to perform any special instruction(s) to make this occur.

Second, if any “call-saved” registers will be “clobbered” inside the function, these registers are “pushed” onto the stack. In the case of an interrupt service routine, all of the registers used inside the ISR (and always the temporary and zero registers, r0 and r1) get pushed onto the stack. Additionally, during an ISR the SREG is saved and restored.

Finally, if the compiler deems it necessary, space is reserved for any local variables on the stack. Many times the compiler will place local variables into specific registers, and therefore doesn’t use the stack for temporary storage.

Here is an example of how the compiler uses the stack to store local variables inside of a function. This is sometimes referred to as “setting up a stack frame.” We will reserve 16 bytes for a character array (note: unrelated code has been removed for the purpose of clarity). The compiler performs all of this stack manipulation for us behind the scenes, so-to-speak:

void example(void) {
  char buffer[16]; //space will be reserved on the stack
  //do something here. . .
}

This results in the following machine code:

  PUSH r28          ;save registers on stack 
  PUSH r29 
  IN   r28, SPL     ;get stack pointer    
  IN   r29, SPH   
  SBIW r28, 16      ;reserve 16 bytes space on stack
                    ;the stack grows downward, hence the subtraction
  OUT  SPH, r29     ;update new stack pointer
  OUT  SPL, r28 
;do something here. . .
  ADIW r28, 16      ;remove the 16 bytes from the stack
  OUT  SPH, r29     ;restore stack pointer
  OUT  SPL, r28 
  POP  r29          ;restore registers from stack
  POP  r28 

Upon return from the interrupt or function, all the preserved values are restored, or “popped”, from the stack. During the prologue and epilogue code, the order of the push and pop instructions is critical: registers must be popped in the reverse of the order in which they were pushed.

Interrupt Before and After

Below, I wrote a very basic interrupt routine that simply increments a byte so we can examine the prologue and epilogue code generated by the compiler:

//here is an example ISR coded in C:
volatile uint8_t a;
ISR(INT0_vect) {
  a++;
}

//this is the generated assembly code:
0000027F 1f.92                PUSH r1       ;save r1 register
00000280 0f.92                PUSH r0       ;save r0 register
00000281 0f.b6                IN r0, SREG   ;get status register
00000282 0f.92                PUSH r0       ;save sreg 
00000283 11.24                CLR r1        
00000284 8f.93                PUSH r24      ;save r24 register
;increment byte (a) here
00000285 80.91.c3.01          LDS r24, (a) 
00000287 8f.5f                SUBI r24, 0xFF     
00000288 80.93.c3.01          STS (a), r24 
0000028A 8f.91                POP r24       ;restore r24 register
0000028B 0f.90                POP r0        ;restore status register
0000028C 0f.be                OUT SREG, r0
0000028D 0f.90                POP r0        ;restore r0 register
0000028E 1f.90                POP r1        ;restore r1 register
0000028F 18.95                RETI          ;return from interrupt

As you can see, the meat of the ISR is only 10 bytes long. However, together the prologue and epilogue add another 24 bytes, for a total of 34. It might be possible to save a few bytes and program cycles by tightly writing your own ISR prologue and epilogue. GCC has a provision which allows writing your own prologues and epilogues, which will be covered later.

We Interrupt This Program to Blink

It is now time to write an interrupt handler, or ISR in inline assembler. I can’t think of a better example than to adapt the basic Blink sketch to use the Timer #1 Overflow interrupt. Please note, because this code alters the Timer #1 registers, it will render any use of the arduino Timer #1 as nonfunctional (i.e. analogWrite pins 9 & 10, the Servo Library, etc.).

Handle It

The first order of business is to write the interrupt handler for the Timer #1 Overflow. This is the routine that is called when the Timer #1 counter (TCNT1) rolls over from 0xffff to zero. Our ISR is very basic, and as always, it should be kept as short as possible. Inside the handler we perform two functions:

  • Reset the counter (TCNT1) allowing the next overflow to reoccur at 1 second intervals.
  • Toggle the LED.
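
The reload value used in the code below (0x0bdc) is not arbitrary. A minimal sketch of the arithmetic follows, assuming a 16 MHz system clock (typical for an Arduino Uno, but an assumption here) and the 256 prescaler selected later in the setup code. The helper name timer1_preload is hypothetical:

```c
#include <stdint.h>

//hypothetical helper: start count so that a 16-bit timer overflows after
//interval_ms; assumes the timer counts up from the preload to 0xffff
uint16_t timer1_preload(uint32_t f_cpu_hz, uint16_t prescaler,
                        uint16_t interval_ms) {
  if (prescaler == 0) {
    return 0; //invalid prescaler; fall back to the full period
  }
  //timer ticks needed = (f_cpu / prescaler) * interval
  uint64_t ticks = ((uint64_t)f_cpu_hz * interval_ms) /
                   ((uint64_t)prescaler * 1000u);
  if (ticks == 0 || ticks > 65536u) {
    return 0; //interval not reachable; 0 gives the longest period
  }
  return (uint16_t)(65536u - ticks);
}
```

With these assumptions, 16,000,000 / 256 gives 62,500 ticks per second, and 65,536 - 62,500 = 3,036, which is 0x0bdc. The timer therefore overflows exactly one second after each reload.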

An ISR can be coded using inline assembler just as in a “C Stub Function”, relying upon the compiler to insert the necessary prologue and epilogue code. I suggest you use this stub technique at first before graduating to writing the entire “naked” ISR. Here is a stub version of our ISR:

#define TCNT_BASE   0x0bdc
#define TCNT_BASE_H (((TCNT_BASE)>>8)&0xff)
#define TCNT_BASE_L ((TCNT_BASE)&0xff)

ISR(TIMER1_OVF_vect) {
  asm (
    //reload TCNT1 counter for 1sec interrupt
    //(a 16-bit timer register takes its high byte first)
    "ldi r24, %4           \n"
    "std Z+1, r24          \n" //TCNT1H
    "ldi r24, %3           \n"
    "st  Z, r24            \n" //TCNT1L
    //toggle LED
    "in   __tmp_reg__, %0  \n" //read port
    "ldi  r24, %1          \n" //LED bit mask
    "eor  __tmp_reg__, r24 \n" //toggle LED bit
    "out  %0, __tmp_reg__  \n" //write port
    : : "I" (_SFR_IO_ADDR(PORTB)), "I" (_BV(PORTB5)),
    "z" (_SFR_MEM_ADDR(TCNT1)), "M" (TCNT_BASE_L), "M" (TCNT_BASE_H) : "r24"
  );
}

Having said all that, the boilerplate code the compiler inserts is not always the most efficient, and many times inadequate. For these reasons, and for the academic exercise, we will also select the “ISR_NAKED” attribute when defining the ISR. This gives us full control over all of the code inside the ISR. Full control is a good thing.

Eleven instructions encompass the prologue and epilogue, which is more than the code required for the main purpose of the interrupt. Notice inside the handler, we utilize 3 registers, r24, r30 and r31. This means we need to preserve the content of these registers since the interrupt could be triggered at any time, even precisely when these registers may be in use. Additionally we need to preserve the status register (SREG). The SREG holds critical information on the state of the program when the interrupt fired. Neglecting to preserve any of this information would probably cause the program to crash.

Don’t forget to include the terminating RETI instruction also. By comparison, this ISR_NAKED version is 10 bytes shorter than the “Stub” version:

#include "k328p.h"

#define TCNT_BASE   0x0bdc
#define TCNT_BASE_H (((TCNT_BASE)>>8)&0xff)
#define TCNT_BASE_L ((TCNT_BASE)&0xff)

ISR(TIMER1_OVF_vect, ISR_NAKED) {
  asm (
    "push r31           \n" //save r30, r31 contents
    "push r30           \n"
    "push r24           \n"
    //preserve SREG
    "in   r24, __SREG__ \n"
    "push r24           \n"

    //reload TCNT1 counter for 1sec interrupt
    //(a 16-bit timer register takes its high byte first)
    "clr r31            \n"
    "ldi r30, %2        \n"
    "ldi r24, %4        \n"
    "std Z+1, r24       \n" //TCNT1H
    "ldi r24, %3        \n"
    "st  Z, r24         \n" //TCNT1L
    //toggle LED
    "in   r30, %0       \n" //read port
    "ldi  r31, %1       \n" //LED bit mask
    "eor  r30, r31      \n" //toggle LED bit
    "out  %0, r30       \n" //write port

    //restore old SREG
    "pop  r24           \n"
    "out  __SREG__, r24 \n"
    //restore r24, r30, r31
    "pop  r24           \n"
    "pop  r30           \n"
    "pop  r31           \n"
    "reti               \n"
    : : "I" (kPORTB), "I" (_BV(PORTB5)),
    "M" (kTCNT1), "M" (TCNT_BASE_L), "M" (TCNT_BASE_H)
  );
}

The initialization code required for the Timer #1 interrupt (setting the prescaler, loading the counter and enabling the overflow interrupt) is completely contained inside the Setup function. Obviously, it is not necessary to write this in inline assembly; it’s just good practice:

#include "k328p.h"

#define TCNT_BASE   0x0bdc
#define TCNT_BASE_H (((TCNT_BASE)>>8)&0xff)
#define TCNT_BASE_L ((TCNT_BASE)&0xff)

void setup() {
  uint16_t TNCTBase = TCNT_BASE;

  asm (
    "cli                  \n" //disable global interrupts
    "sbi %0, %1           \n" //pinMode(13, OUTPUT);

    //set 256 prescale (CS12)
    "st  Z+, __zero_reg__ \n" //zero TCCR1A
    "ldi r24, %3          \n"
    "st  Z+, r24          \n" //TCCR1B = CS12
    "st  Z, __zero_reg__  \n" //zero TCCR1C
    //load counter for 1sec interrupt (high byte first)
    "ldi r30, %4          \n"
    "std Z+1, %B5         \n" //TCNT1H
    "st  Z, %A5           \n" //TCNT1L
    //enable overflow interrupt
    "ldi r30, %6          \n"
    "ldi r24, %7          \n"
    "st  Z, r24           \n" //TIMSK1

    "sei                  \n" //enable global interrupts
    : : "I" (_SFR_IO_ADDR(DDRB)), "I" (PORTB5),
    "z" (_SFR_MEM_ADDR(TCCR1A)), "I" (_BV(CS12)),
    "M" (kTCNT1), "r" (TNCTBase),
    "M" (kTIMSK1), "I" (_BV(TOIE1)) : "r24", "memory"
  );
}

void loop() { }

Finally, we are introducing a new header file “k328p.h” (contents listed below) which contains all of the IO register defines in such a way that we can use them inside our inline assembly routines. The definitions in this file use the same standard ATMEL mnemonics for the IO registers with the letter ‘k’ prepended. Each one is the low byte of the IO register address, which allows greater flexibility in inline assembler code when referring to the IO registers (when using pointer registers with the LD/ST instructions). A close examination of the above code will reveal the method of use.
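
The relationship between the ‘k’ values and the ‘k2’ values in the header is a fixed offset: the standard IO registers sit 0x20 higher when addressed as memory, because the 32 general purpose registers occupy the first 0x20 data addresses. The following sketch states that rule in C. The helper names are hypothetical and are not part of k328p.h:

```c
#include <stdint.h>

//the 32 general purpose registers occupy data addresses 0x00-0x1f
#define IO_TO_MEM_OFFSET 0x20

//hypothetical helper: IN/OUT address (0-0x3f) to LD/ST address
static inline uint8_t io_to_mem_addr(uint8_t io_addr) {
  return (uint8_t)(io_addr + IO_TO_MEM_OFFSET);
}

//IN/OUT only reach IO addresses 0-0x3f
static inline int is_in_out_addressable(uint8_t io_addr) {
  return io_addr <= 0x3f;
}

//SBI/CBI/SBIS/SBIC only reach IO addresses 0-0x1f
static inline int is_bit_addressable(uint8_t io_addr) {
  return io_addr <= 0x1f;
}
```

For example, io_to_mem_addr(0x05) gives 0x25, matching kPORTB and k2PORTB. The extended registers (0x60 and above) have no IO address at all, so the header lists them once, as memory addresses.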

Arduino IO Register Defines

//k328p.h - definitions for ATmega328P
#ifndef _k328P_H_
#define _k328P_H_ 

//standard registers 
//0-0x1f: bit addressable
//0-0x3f: IN/OUT compatible 
//0-0x3f: add 0x20 when using LD/ST
#define kPINB   0x03
#define kDDRB   0x04
#define kPORTB  0x05
#define kPINC   0x06
#define kDDRC   0x07
#define kPORTC  0x08
#define kPIND   0x09
#define kDDRD   0x0A
#define kPORTD  0x0B

#define kTIFR0  0x15
#define kTIFR1  0x16
#define kTIFR2  0x17

#define kPCIFR  0x1B
#define kEIFR   0x1C
#define kEIMSK  0x1D
#define kGPIOR0 0x1E
#define kEECR   0x1F
//end bit addressable

#define kEEDR   0x20
#define kEEAR   0x21
#define kEEARL  0x21
#define kEEARH  0x22
#define kGTCCR  0x23
#define kTCCR0A 0x24
#define kTCCR0B 0x25
#define kTCNT0  0x26
#define kOCR0A  0x27
#define kOCR0B  0x28

#define kGPIOR1 0x2A
#define kGPIOR2 0x2B
#define kSPCR   0x2C
#define kSPSR   0x2D
#define kSPDR   0x2E

#define kACSR   0x30

#define kMCUSR  0x34
#define kMCUCR  0x35

#define kSPMCSR 0x37

#define kSPL    0x3D
#define kSPH    0x3E
#define kSREG   0x3F
//end IN/OUT compatible

//extended registers begin
#define kWDTCSR 0x60
#define kCLKPR  0x61

#define kPRR    0x64

#define kOSCCAL 0x66

#define kPCICR  0x68
#define kEICRA  0x69

#define kPCMSK0 0x6B
#define kPCMSK1 0x6C
#define kPCMSK2 0x6D
#define kTIMSK0 0x6E
#define kTIMSK1 0x6F
#define kTIMSK2 0x70

#define kADC    0x78
#define kADCW   0x78
#define kADCL   0x78
#define kADCH   0x79
#define kADCSRA 0x7A
#define kADCSRB 0x7B
#define kADMUX  0x7C

#define kDIDR0  0x7E
#define kDIDR1  0x7F

#define kTCCR1A 0x80
#define kTCCR1B 0x81
#define kTCCR1C 0x82

#define kTCNT1  0x84
#define kTCNT1L 0x84
#define kTCNT1H 0x85
#define kICR1   0x86
#define kICR1L  0x86
#define kICR1H  0x87
#define kOCR1A  0x88
#define kOCR1AL 0x88
#define kOCR1AH 0x89
#define kOCR1B  0x8A
#define kOCR1BL 0x8A
#define kOCR1BH 0x8B

#define kTCCR2A 0xB0
#define kTCCR2B 0xB1
#define kTCNT2  0xB2
#define kOCR2A  0xB3
#define kOCR2B  0xB4
#define kASSR   0xB6

#define kTWBR   0xB8
#define kTWSR   0xB9
#define kTWAR   0xBA
#define kTWDR   0xBB
#define kTWCR   0xBC
#define kTWAMR  0xBD

#define kUCSR0A 0xC0
#define kUCSR0B 0xC1
#define kUCSR0C 0xC2

#define kUBRR0  0xC4
#define kUBRR0L 0xC4
#define kUBRR0H 0xC5
#define kUDR0   0xC6
//end extended registers

//0-0x3f for LD/ST instructions
#define k2PINB   0x23
#define k2DDRB   0x24
#define k2PORTB  0x25
#define k2PINC   0x26
#define k2DDRC   0x27
#define k2PORTC  0x28
#define k2PIND   0x29
#define k2DDRD   0x2A
#define k2PORTD  0x2B
#define k2TIFR0  0x35
#define k2TIFR1  0x36
#define k2TIFR2  0x37
#define k2PCIFR  0x3B
#define k2EIFR   0x3C
#define k2EIMSK  0x3D
#define k2GPIOR0 0x3E
#define k2EECR   0x3F
#define k2EEDR   0x40
#define k2EEAR   0x41
#define k2EEARL  0x41
#define k2EEARH  0x42
#define k2GTCCR  0x43
#define k2TCCR0A 0x44
#define k2TCCR0B 0x45
#define k2TCNT0  0x46
#define k2OCR0A  0x47
#define k2OCR0B  0x48
#define k2GPIOR1 0x4A
#define k2GPIOR2 0x4B
#define k2SPCR   0x4C
#define k2SPSR   0x4D
#define k2SPDR   0x4E
#define k2ACSR   0x50
#define k2MCUSR  0x54
#define k2MCUCR  0x55
#define k2SPMCSR 0x57
#define k2SPL    0x5D
#define k2SPH    0x5E
#define k2SREG   0x5F

#endif //_k328P_H_


References

Arduino Interrupts
Newbie’s Guide to AVR Interrupts
PJRC Guide to Interrupts
AVR Libc Information on Interrupts
University of Maryland, BC, C Programming and Embedded Systems Course, Interrupt Information
AVR 8-bit Instruction Set
AVR-GCC Inline Assembler Cookbook
Extended Asm – Assembler Instructions with C Expression Operands
Mixing C and Assembly Language
ATMEL ATmega328P Datasheet


Arduino Inline Assembly Tutorial (Functions)


At first consideration, the topic of functions seems simple and trite. Just discuss how to “CALL” and “RETURN” to and from a function, right? However, there are many subtopics involved as well. For example, passing and returning parameters, prologue and epilogue code, the stack frame and mixing assembly and C are topics deserving of separate tutorials. Hopefully, we can do all of these justice, but first, the basics…

Convert Snippet Into a Function

How about a simple demonstration of turning an inline code snippet into a function? In a previous tutorial on indirect addressing, several inline pieces of code were developed to perform various string operations. One such operation determined the character length of a string. The code is below.

String Length, Sounds Like strlen

const char src[4] = "abc";
volatile uint8_t len;
asm (
  "_loop:               \n"
  "ld   __tmp_reg__, Z+ \n"
  "tst  __tmp_reg__     \n"
  "brne _loop           \n"
  //Z points one character past the terminating NUL
  "subi %A1, 1          \n" //subtract post-increment
  "sbci %B1, 0          \n"
  "sub  %A1, %A2        \n" //length = end - start
  "sbc  %B1, %B2        \n"
  "mov  %0, %A1         \n" //save len (uint8_t)
  : "=r" (len) : "z" (src), "x" (src)
);

While this code could easily be included “inline”, it certainly would be more useful if it was defined as a general function. This would make it much easier to use throughout a program, and also reduce overall program size by incorporating only one instance of the code. So how is this accomplished?

Stub Your Code

The official Cookbook refers to this technique as a “C Stub Function,” which is nothing more than a function definition containing only inline assembler code. Typically, in a “C Stub Function”, the function parameters and local variables define the data used in, and the value returned (if any) by the function. This is an easy method to pass data to/from the inline function, without the need to understand the underlying details of how it’s done. Therefore, we eliminate the necessity of writing additional supporting code.

The above “string length” snippet easily becomes a full blown function, _strlen() using this method. Notice the transformed function below receives a string, (s) as a parameter, and returns the length, which is defined as a local variable. We refer to these same variables in the input and output constraints:

inline uint8_t _strlen(const char *s) {
  uint8_t len;

  asm (
    "1:                  \n" //local label, safe if inlined more than once
    "ld  __tmp_reg__, Z+ \n"
    "tst __tmp_reg__     \n"
    "brne 1b             \n"
    //len = Z - 1 - src = (-1 - src) + Z = ~src + Z
    "com %A2             \n"
    "com %B2             \n"
    "add %A2, %A1        \n"
    "adc %B2, %B1        \n"
    "mov %0, %A2         \n" //save len (uint8_t)
    : "=r" (len) : "z" (s), "x" (s)
  );

  return len;
}

Here is a look at the code generated by the above C-Stub Function (notice the compiler/assembler doesn’t need to generate a lot of “stub” code):

  MOVW r30, r24
  MOVW r26, r24
  LD r0,Z+
  TST r0
  BRNE loop
  COM r26
  COM r27
  ADD r26, r30
  ADC r27, r31
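
The identity used in the function's comment (Z - 1 - src equals ~src + Z) relies on two's-complement arithmetic, where ~x is the same as -1 - x. A short C sketch, meant only as a way to convince yourself on a PC, is shown below. The name strlen_sketch is hypothetical, and the addresses are deliberately truncated to 16 bits to mimic the AVR pointer registers:

```c
#include <stdint.h>

//hypothetical model of the pointer arithmetic (valid for lengths below 256)
uint8_t strlen_sketch(const char *s) {
  const char *z = s;
  while (*z++ != '\0') { } //like "ld Z+": z stops one past the NUL
  uint16_t start = (uint16_t)(uintptr_t)s; //16-bit view of the addresses
  uint16_t end   = (uint16_t)(uintptr_t)z;
  //end - 1 - start == ~start + end in 16-bit arithmetic
  return (uint8_t)(uint16_t)((uint16_t)~start + end);
}
```

For the string "abc", the pointer ends up four bytes past the start, and the expression yields 3. Replacing the subtract-one-then-subtract sequence with a complement-then-add saves two instructions in the assembly version.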

Placing a Call

An extension to the “C Stub Function” technique is calling another C function from inside inline assembly code. The following bit of code demonstrates the CALL instruction. This instruction “calls” a subroutine located within the program memory (if we remember to properly define the function to avoid linkage errors). The C Stub Function even handles the return (RET) for us.

An additional detail required here is the need to encapsulate the “called” function inside the extern “C” { } declaration (see the example below). The extern “C” linkage specification is a C++ feature that prevents the function name from becoming “mangled”; a mangled name would prevent the linker from locating the called function.

extern "C" {
  void foo() {
    // do something here...
  }
}

void test() {
  asm (
    "call foo \n"
  );
}

Playing Catch

Next, we present a basic example of passing and returning parameters to and from C Stub Functions. The purpose of the following code is to convert an upper case ASCII character into its lower case equivalent. We’ve created two functions here, _isupper and _tolower, which validate the input character and then perform the conversion.

Take a look at the code below.

Notice, the first thing _tolower does is call the function, _isupper. Since _tolower hasn’t done anything yet, the C Stub Function simply hands the input character (c), the parameter to _tolower, directly on to the _isupper function. Neat!

Next, _isupper checks the character to confirm it’s actually an upper case character. If so, it returns the character, otherwise it returns a zero. Upon returning to _tolower, the next instruction which is executed is “tst r24”, a test of the contents of register r24. If register #24 (r24) is not zero, the character is converted and the function returns.

Again, notice the use of the C++ keyword “extern C {}” here:

extern "C" {
  unsigned char _isupper(unsigned char c) {
    //bind variable to a specific register r18
    register unsigned char ch asm("r18");
    asm (
      "mov  %1, %0 \n" //save input
      "subi %1, 'A'\n" //subtract 0x41
      "brmi 2f     \n" //branch if minus
      "subi %1, 26 \n" //26 letters
      "brpl 2f     \n" //branch if plus
      "ret         \n" //c==upper, return
      "2: clr  %0  \n" //false
      : "+r" (c), "=&r" (ch)
    );
    return c;
  }
}

char _tolower(unsigned char c) {
  asm (
    "call _isupper \n" //validate char
    "tst r24       \n" //0 = not alpha char
    "breq 1f       \n" //not alpha char
    "ori %0, 0x20  \n" //make lower
    "1:            \n"
    : "+d" (c) : : "r18" //r18 is used by _isupper
  );
  return c;
}

Insider Information

Why did function _tolower choose to test register #24 (r24)? The above two functions relied on “insider” information when using register r24. These routines knew that an 8-bit, byte-sized value is passed to and from a function via the r24 register. The C Compiler always passes function arguments and returns values in specific register locations. Knowing these locations is essential to writing efficient inline assembly code, especially when interfacing with the C language.

This is a good time to review the data type sizes: a char is 8 bits, an int is 16 bits, a long is 32 bits, a long long is 64 bits, floats are 32 bits, and pointers are 16 bits (function pointers are word addresses). Arguments are allocated left to right, starting in register r25 and descending through register r8. All arguments are aligned to start in even-numbered registers (odd-sized arguments, like char, have one free register above them). For example, a single 8-bit value is passed via the r24 register (r25 is left unused), a single 16-bit value is passed via the r25:r24 register pair, and a 32-bit value is passed via the r25:r24:r23:r22 register combination.

Return values are expected to be passed in a similar fashion. An 8-bit value is passed via r24, a 16-bit value in r25:r24, and a 32-bit value in r25:r24:r23:r22. An 8-bit return value may be zero/sign-extended to 16-bits by the called function.
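
The allocation rule described above can be written out as a small calculation. The following C sketch is a simplified model of that rule only (it ignores variadic functions and other special cases), and the function name avr_arg_low_reg is hypothetical:

```c
#include <stdint.h>

//hypothetical model: returns the lowest register number holding argument
//`index`, or -1 if that argument no longer fits in r25..r8
int avr_arg_low_reg(const uint8_t *sizes, int count, int index) {
  int next = 26; //one past r25, the first register handed out
  for (int i = 0; i < count; i++) {
    int size = (sizes[i] + 1) & ~1; //round odd sizes up to an even count
    next -= size;                   //allocation descends toward r8
    if (next < 8) {
      return -1; //out of registers: this and later arguments use the stack
    }
    if (i == index) {
      return next;
    }
  }
  return -1; //index out of range
}
```

For a single char argument the result is 24 (r24, with r25 left unused). For a function taking an int followed by a long, the int lands in r25:r24 (result 24) and the long in r23:r22:r21:r20 (result 20).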

What’s the Use of a Register?

Function “call-used” registers are r18-r27, and r30-r31. Any, or all of these registers may be allocated by the compiler for local data. However, we may use them freely in assembler subroutines. Calling C subroutines can clobber any of them, and the caller is responsible for saving and restoring before and after use.

Function “call-saved” registers are r2-r17, and r28-r29. They may also be allocated by the compiler for local data, but C subroutines leave them unchanged. Assembler subroutines are responsible for saving and restoring any of these registers, if changed. The Y register pair (r29:r28) is used as a frame pointer (pointing to local data placed on the stack) if necessary.

Fixed registers, r0, and r1 are never allocated by the compiler for local data. The temporary register, r0 can be clobbered by any C code (except interrupt handlers which save it), and may be used freely. The zero register is r1, and assumed to be always zero in any C code. It may be used for other purposes within a piece of assembler code, but must then be cleared after use (clr r1). Interrupt handlers save and clear r1 on entry, and restore r1 on exit (in case it was non-zero).


References

AVR 8-bit Instruction Set
AVR-GCC Inline Assembler Cookbook
Extended Asm – Assembler Instructions with C Expression Operands
Mixing C and Assembly Language


Arduino Inline Assembly Tutorial (Tables)


Often, the fastest way to compute something on an arduino is to not compute it at all.


For example, trigonometric functions are costly operations and can abruptly slow your application to a crawl. And many times, the result is computed with far more precision than needed for the situation. Most often you just want the periodic wave-like characteristics of sine or cosine, which can easily be approximated. With a trigonometric function, it’s easy to substitute a lookup-table populated with pre-computed values at discrete steps. If your program can handle the loss of precision, yet requires as much speed as possible, this alternative is a good option.

The Ivy League Microcontroller

Since the arduino’s ATMEL AVR μC is based upon the modified Harvard architecture, the data and program instructions are stored in different memory. The program instructions are stored in flash, while data is stored in SRAM. These separate pathways are primarily implemented to enhance performance, but they also prohibit executing program instructions from data memory. Although it may seem paradoxical, data can still be stored inside program memory (see this information on the use of the PROGMEM attribute).

Placing a table in SRAM is simple, and shouldn’t present problems for an inline programmer (especially at our stage!). Consequently, in this tutorial, we will store a table inside program memory.

Did He Say Frogmen?

Placing the table into program memory is easy. It is accomplished via a C language floating-point array, incorporating the special keyword, “PROGMEM”. PROGMEM instructs the compiler to place this data into flash memory:

static const float PROGMEM SineTable[91] = {
  0.0, 0.017452, 0.034899, 0.052336, 0.069756,
. . .
  0.997564, 0.998630, 0.999391, 0.999848, 1.0
};

Previously, when accessing SRAM (data memory) we used the LDS instruction. However, accessing program memory requires the use of the LPM instruction. LPM is the mnemonic for Load from Program Memory, and it loads a data byte from flash program memory into a register.


The Flash program memory is organized as 16-bit words, while the registers and SRAM are organized as 8-bit bytes. The Z-register is used to access the program memory. This register pair is used as a 16-bit pointer into program memory. The 15 most significant bits select the word address in program memory, and the least significant bit selects the low or high byte of that word. Because of this, the word address is multiplied by two before it is put in the Z-register. However, the good news is that in the code presented below all of these details are transparent.
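
The word-to-byte address conversion described above amounts to a shift and an OR. As a minimal illustration (the function name lpm_z_address is hypothetical, and nothing like it is needed in the tutorial's code, since the compiler supplies byte addresses for PROGMEM data):

```c
#include <stdint.h>

//hypothetical helper: build the Z pointer value used by LPM
//word_addr: 16-bit word address in flash
//high_byte: 0 selects the low byte of the word, 1 selects the high byte
uint16_t lpm_z_address(uint16_t word_addr, uint8_t high_byte) {
  return (uint16_t)((word_addr << 1) | (high_byte & 1u));
}
```

For example, the two bytes of the flash word at word address 0x0100 are reached with Z values of 0x0200 and 0x0201.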

Table Legs

The function below first limits the input value to the range 0 to 90. If the input is out of range, it returns the floating-point Not-A-Number (NAN) value. It then multiplies the input by 4 to produce a byte offset into our table. We multiply by four because our table is populated with floating-point numbers, each of which is 4 bytes long. The offset is simply added to the (PROGMEM) address of the start of the table. The function finishes by retrieving the 4-byte float value and returning it.

Note that floating-point support inside the inline assembler is scarce. In this function we treat the float variable simply as a 32-bit value. We get away with this because we’re not performing any arithmetic on the value.
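
Before reading the assembly, it may help to see the same steps in portable C. This is only a sketch under stated assumptions: DemoTable is a hypothetical four-entry stand-in for the sine table, fetch_entry is an invented name, and float is assumed to be 4 bytes wide.

```c
#include <stdint.h>
#include <string.h>
#include <math.h>   /* NAN */

/* Hypothetical 4-entry table standing in for SineTable. */
static const float DemoTable[4] = { 0.0f, 0.5f, 0.866025f, 1.0f };

/* Same steps as the assembly: range check, index * 4, add to base, copy 4 bytes. */
float fetch_entry(uint16_t index) {
  float result;
  if (index >= 4) {
    return NAN;                                   /* out of range */
  }
  uint16_t offset = (uint16_t)(index << 2);       /* two left shifts = index * 4 */
  const uint8_t *p = (const uint8_t *)DemoTable + offset;
  memcpy(&result, p, sizeof result);              /* raw 4-byte copy, no float math */
  return result;
}
```

The memcpy mirrors the point made above: the four bytes are moved as opaque data and never interpreted as a number along the way.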

float _Sine(uint16_t angle) {
  float tmp;

  asm (
    //validate angle >= 0 && angle <= 90
    "cpi  %A1, 90+1 \n" //cpi needs r16-r31 ("d" constraint)
    "cpc  %B1, __zero_reg__ \n"
    "brcc _NaN      \n" //out of range

    //calculate table index
    "lsl  %A1       \n" //float is 4 bytes wide
    "rol  %B1       \n" //index = angle * 4
    "lsl  %A1       \n"
    "rol  %B1       \n"

    //add index to start of SineTable
    "add  r30, %A1  \n"
    "adc  r31, %B1  \n"

    //get sine value (4-bytes)
    "lpm  %A0, Z+   \n"
    "lpm  %B0, Z+   \n"
    "lpm  %C0, Z+   \n"
    "lpm  %D0, Z    \n"
    "rjmp _SineEnd  \n" //skip NAN path (a bare ret would bypass the C epilogue)
    //return NAN
    "_NaN:              \n"
    "ldi  %A0, lo8(%3)  \n" //NAN = 0x7fc00000
    "ldi  %B0, hi8(%3)  \n"
    "ldi  %C0, hlo8(%3) \n"
    "ldi  %D0, hhi8(%3) \n"
    "_SineEnd:          \n"
    : "=a" (tmp) : "d" (angle), "z" (SineTable), "F" (NAN)
  );
  return tmp;
}

The Full Table

#include <avr/pgmspace.h>

//max error ~0.017452 [91*4=364 bytes]
static const float PROGMEM SineTable[91] = {
  0.0, 0.017452, 0.034899, 0.052336, 0.069756, 0.087156, 
  0.104528, 0.121869, 0.139173, 0.156434, 0.173648, 0.190809, 
  0.207912, 0.224951, 0.241922, 0.258819, 0.275637, 0.292372, 
  0.309017, 0.325568, 0.34202, 0.358368, 0.374607, 0.390731, 
  0.406737, 0.422618, 0.438371, 0.45399, 0.469472, 0.48481, 
  0.5, 0.515038, 0.529919, 0.544639, 0.559193, 0.573576, 
  0.587785, 0.601815, 0.615661, 0.62932, 0.642788, 0.656059, 
  0.669131, 0.681998, 0.694658, 0.707107, 0.71934, 0.731354, 
  0.743145, 0.75471, 0.766044, 0.777146, 0.788011, 0.798636, 
  0.809017, 0.819152, 0.829038, 0.838671, 0.848048, 0.857167, 
  0.866025, 0.87462, 0.882948, 0.891007, 0.898794, 0.906308, 
  0.913545, 0.920505, 0.927184, 0.93358, 0.939693, 0.945519, 
  0.951057, 0.956305, 0.961262, 0.965926, 0.970296, 0.97437, 
  0.978148, 0.981627, 0.984808, 0.987688, 0.990268, 0.992546, 
  0.994522, 0.996195, 0.997564, 0.99863, 0.999391, 0.999848, 1.0
};
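
The ~0.017452 figure in the table comment appears to correspond to the largest gap between neighbouring entries, which occurs between 0 and 1 degree. The host-side sketch below checks that on a copy of the table. The names Table and max_step are hypothetical, and PROGMEM is dropped so the code runs on a PC.

```c
#include <stdint.h>

/* Same 91 values as the tutorial's table, without PROGMEM so it runs on a PC. */
static const float Table[91] = {
  0.0, 0.017452, 0.034899, 0.052336, 0.069756, 0.087156,
  0.104528, 0.121869, 0.139173, 0.156434, 0.173648, 0.190809,
  0.207912, 0.224951, 0.241922, 0.258819, 0.275637, 0.292372,
  0.309017, 0.325568, 0.34202, 0.358368, 0.374607, 0.390731,
  0.406737, 0.422618, 0.438371, 0.45399, 0.469472, 0.48481,
  0.5, 0.515038, 0.529919, 0.544639, 0.559193, 0.573576,
  0.587785, 0.601815, 0.615661, 0.62932, 0.642788, 0.656059,
  0.669131, 0.681998, 0.694658, 0.707107, 0.71934, 0.731354,
  0.743145, 0.75471, 0.766044, 0.777146, 0.788011, 0.798636,
  0.809017, 0.819152, 0.829038, 0.838671, 0.848048, 0.857167,
  0.866025, 0.87462, 0.882948, 0.891007, 0.898794, 0.906308,
  0.913545, 0.920505, 0.927184, 0.93358, 0.939693, 0.945519,
  0.951057, 0.956305, 0.961262, 0.965926, 0.970296, 0.97437,
  0.978148, 0.981627, 0.984808, 0.987688, 0.990268, 0.992546,
  0.994522, 0.996195, 0.997564, 0.99863, 0.999391, 0.999848, 1.0
};

/* Largest difference between neighbouring entries. */
float max_step(void) {
  float worst = 0.0f;
  for (uint8_t i = 0; i < 90; i++) {
    float step = Table[i + 1] - Table[i];
    if (step > worst) {
      worst = step;
    }
  }
  return worst;
}
```

Because the sine curve is steepest near zero, the gaps shrink steadily toward 90 degrees, so a whole-degree lookup is most precise near the top of the table.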

What Are You Doing For Me?

After a cursory comparison test between the table _Sine() function and the Arduino floating-point sin() function, we can draw some basic conclusions. Even though the table itself consumes 364 (91 x 4 = 364) bytes of flash (on top of the function code), the Arduino library sin() function (and its required peripheral floating-point support) uses approximately 900 bytes more flash memory.

However, saving space wasn’t necessarily the goal of this exercise; speed was the primary concern. Comparing 1,000 calls to both functions yielded an average duration of 121.7µs per sin() call vs. 2.92µs per _Sine() call, roughly 40 times faster. One final but obvious concern is the precision of the result. This will need to be evaluated to determine if it is sufficient for your application.

Bigger, Better, More

Various modifications can expand and improve the accuracy of the table code, but are beyond the scope of this tutorial. However, here are some basic ideas.

The obvious method is to expand the table to decrease the step interval. Another technique is to incorporate interpolation similar to the following pseudo code:

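float _iSine(float angle) {
  if (angle < 0.0 || angle > 90.0) return NAN; //same range as _Sine()
  uint16_t x1 = (uint16_t)angle;               //floor() for non-negative values
  if (x1 == 90) return 1.0;                    //last entry has no upper neighbor
  //the table lives in flash, so read it with pgm_read_float()
  float y1 = pgm_read_float(&SineTable[x1]);
  float y2 = pgm_read_float(&SineTable[x1 + 1]);
  return y1 + ( y2 - y1 ) * ( angle - x1 );
}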
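
To make the interpolation arithmetic concrete, here is a host-side sketch that uses a deliberately coarse 10-degree table, so the effect of interpolating is easy to see. The table Coarse and the function coarse_sine are hypothetical names, the values are copied from the full table above, and inputs above 90 are simply clamped.

```c
#include <stdint.h>

/* Hypothetical coarse table: sine at 0, 10, 20 ... 90 degrees. */
static const float Coarse[10] = {
  0.0f, 0.173648f, 0.34202f, 0.5f, 0.642788f,
  0.766044f, 0.866025f, 0.939693f, 0.984808f, 1.0f
};

/* Linear interpolation between the two entries that bracket `degrees`. */
float coarse_sine(uint8_t degrees) {
  if (degrees >= 90) {
    return 1.0f;                       /* clamp: last entry has no upper neighbour */
  }
  uint8_t i = degrees / 10;            /* lower table index */
  uint8_t rem = degrees % 10;          /* position between the two entries */
  float y1 = Coarse[i];
  float y2 = Coarse[i + 1];
  return y1 + (y2 - y1) * ((float)rem / 10.0f);
}
```

For 35 degrees this gives about 0.5714, against 0.573576 in the full table, whereas picking the nearest lower coarse entry alone would give 0.5.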

For full 0-360 angle coverage, do something like:

float Sine(uint16_t i) {
  while (i > 359)
    i -= 360;

  if (i <= 90)
    return _iSine(i);
  else if (i <= 180)
    return _iSine(180 - i);        //sin(x) = sin(180 - x)
  else if (i <= 270)
    return (-1*_iSine(i - 180));   //sin(x) = -sin(x - 180)
  else
    return (-1*_iSine(360 - i));   //sin(x) = -sin(360 - x)
}
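
The quadrant folding relies on the identities sin(x) = sin(180 - x), sin(x) = -sin(x - 180) and sin(x) = -sin(360 - x). A host-side sketch that isolates just this folding step, with a hypothetical Folded struct and fold_angle function, might look like this:

```c
#include <stdint.h>

/* Result of folding: a table index in 0-90 plus the sign to apply. */
typedef struct {
  uint8_t index;
  int8_t  sign;
} Folded;

/* Map any whole-degree angle onto the first quadrant of the sine table. */
Folded fold_angle(uint16_t deg) {
  Folded f;
  deg %= 360;
  if (deg <= 90)       { f.index = (uint8_t)deg;         f.sign =  1; }
  else if (deg <= 180) { f.index = (uint8_t)(180 - deg); f.sign =  1; }
  else if (deg <= 270) { f.index = (uint8_t)(deg - 180); f.sign = -1; }
  else                 { f.index = (uint8_t)(360 - deg); f.sign = -1; }
  return f;
}
```

Checking the boundaries by hand is worthwhile: 90 must map to index 90, 91 to index 89, and 270 to index 90 with a negative sign.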

Another easy expansion with the sine table is to calculate cosine and tangent values:

float _Cosine(uint16_t a) {
  return _Sine( 90 - a );             //cos(a) = sin(90 - a), valid for 0-90
}

float _Tangent(uint16_t a) {
  return ( _Sine(a) / _Cosine(a) );   //infinite at a = 90
}


AVR 8-bit Instruction Set
AVR-GCC Inline Assembler Cookbook
Extended Asm – Assembler Instructions with C Expression Operands
Further information on addressing modes can be found in Section 2 of the AVR Instruction Set Manual
AVR108: Setup and Use of the LPM Instruction
Sine Lookup Table Generator

Also available as a book, with greatly expanded coverage!


Code (error) updated: 1/25/2017


Arduino Inline Assembly Tutorial (Strings)


Addressing Modes

When loading and storing data, there are several addressing methods available for use. The Arduino’s AVR microcontroller supports 13 addressing modes for accessing the program memory (flash) and data memory (SRAM, register file, I/O memory, and extended I/O memory). Six modes use “direct addressing”, and as such are very basic. The direct modes are generally inherent in the assembly instruction. The good news is that we covered all six in past tutorials, so there is no need to address them here (pun intended). Four additional modes incorporate indirect addressing, and will be the focus of this tutorial.

*Register Direct, Single Register
*Register Direct, Two Registers
*IO Direct
*Data Direct
Data Indirect
Data Indirect w/Displacement
Data Indirect w/Pre-Decrement
Data Indirect w/Post-Increment
Program Memory Constant
Program Memory w/Post-Inc
*Direct Program
Indirect Program
*Relative Program
* denotes previously covered.

String Theory

Indirect addressing can be said to involve “pointers”. In the C language, the word “pointer” scares people. Hopefully we can calm these irrational fears, by coding an assortment of string routines using simple indirect addressing modes. By the end of this tutorial, we should have a good basis for a library of string functions.


The six registers r26 through r31 can be paired together and referenced using the letters X, Y and Z (the X register is r27:r26, the Y register is r29:r28, and the Z register is r31:r30). When combined, these registers are 16-bit “address pointers” for indirect addressing of the data space. In use, the X, Y and Z register pairs are loaded with an address of interest.

The three indirect address registers X, Y, and Z are defined as follows:
X: r27 (high byte), r26 (low byte)
Y: r29 (high byte), r28 (low byte)
Z: r31 (high byte), r30 (low byte)
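
A small host-side sketch of how two 8-bit registers form one 16-bit address may help here. The function names are hypothetical; the code only models the byte packing.

```c
#include <stdint.h>

/* Combine a high and a low register byte into one 16-bit address, e.g. X = r27:r26. */
uint16_t pair_to_address(uint8_t high, uint8_t low) {
  return (uint16_t)(((uint16_t)high << 8) | low);
}

/* High register byte of a 16-bit address (r27 for X, r29 for Y, r31 for Z). */
uint8_t address_high(uint16_t address) {
  return (uint8_t)(address >> 8);
}

/* Low register byte of a 16-bit address (r26 for X, r28 for Y, r30 for Z). */
uint8_t address_low(uint16_t address) {
  return (uint8_t)(address & 0xFFu);
}
```

So an SRAM address of 0x0102 held in X means r27 = 0x01 and r26 = 0x02.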


Speaking Indirectly

Previously we used the LDS instruction to load the value stored inside SRAM memory. For example, this code loads the number 42 into register r24:

  volatile uint8_t x=42;

  asm (
    "lds r24, (x) \n"
    : : : "r24"
  );

But with the X, Y and Z pointer registers, we load the SRAM address into the register pairs (not the value stored there). Hence, we use the term “indirect addressing”. For example, the following code loads the “address” of the string (src) into the X register pair via the constraint “x” (src). When we want the first character of the string, in this case ‘a’, we load it “indirectly” from the X register pair (address) like so:

const char src[4] = "abc";

asm (
  "ld __tmp_reg__, X \n"
  : : "x" (src)
);

Fetch Me Z Pointer

Here is an example directly out of the AVR-GCC Inline Assembler Cookbook involving a true C pointer. In this code snippet, ptr is a pointer to the variable number. The ‘e’ constraint requests that ptr (which holds the address of the variable number) be loaded into one of the X, Y or Z register pairs, at the compiler’s choice.

Then, the value at the “address” inside the pointer register pair (or 0x11) is loaded into the temporary register (__tmp_reg__). It is incremented, and finally stored back through the pointer ptr into the variable number. At the completion of this inline code, number = 0x12, and of course, the value of ptr hasn’t changed.

volatile uint8_t number=0x11, *ptr = &number;

asm volatile(
  "ld __tmp_reg__, %a0 \n"
  "inc __tmp_reg__     \n"
  "st %a0, __tmp_reg__ \n"
  : : "e" (ptr) : "memory"
);

If you don’t have a good grasp of C pointers, this could be slightly confusing. It might be helpful to examine the assembler code produced to see exactly what is happening here (note the compiler selected the Z register pair for the pointer, ptr):

0000029E e0.91.00.01   LDS R30, 0x0100 //load ptr (value 0x0102) into Z
000002A0 f0.91.01.01   LDS R31, 0x0101
000002A2 00.80         LDD R0, Z+0     //load number into r0 (0x11)
000002A3 03.94         INC R0          //increment r0 to 0x12
000002A4 00.82         STD Z+0, R0     //store back into number (0x0102)

Address locations:
ptr:	0x0102	uint8_t* @0x0100
number:	0x11	uint8_t  @0x0102
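
For readers more comfortable in C, the three assembly instructions correspond roughly to the following host-side sketch. The function names are hypothetical, and the helper incremented exists only to make the behaviour easy to check.

```c
#include <stdint.h>

/* Same three steps as the inline assembly: load through the pointer, increment, store back. */
void increment_through(volatile uint8_t *ptr) {
  uint8_t tmp = *ptr;   /* ld  tmp, Z  */
  tmp++;                /* inc tmp     */
  *ptr = tmp;           /* st  Z, tmp  */
}

/* Hypothetical helper for demonstration: apply the routine to a local copy. */
uint8_t incremented(uint8_t value) {
  volatile uint8_t number = value;
  increment_through(&number);
  return number;
}
```

Note that only the pointed-to value changes; the pointer itself still holds the same address afterwards.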

How Long is a String?

Now, on to strings. The following code calculates the length of the string src, not including the terminating NUL, or ‘\0’ character. It places the number of characters inside src into len:

const char src[4] = "abc";
volatile uint8_t len;

asm (
  "_loop:               \n"
  "ld   __tmp_reg__, Z+ \n"
  "tst  __tmp_reg__     \n"
  "brne _loop           \n"
  //Z points one character past the terminating NUL
  "subi %A1, 1          \n" //subtract post-increment
  "sbci %B1, 0          \n"
  "sub  %A1, %A2        \n" //length = end - start
  "sbc  %B1, %B2        \n"
  "mov  %0, %A1         \n" //save len (uint8_t)
  : "=r" (len) : "z" (src), "x" (src) : "memory"
);

First, notice we define input constraints for the string (src) twice, using both the X and Z pairs. These constraints place the address of the string inside the r31:r30 and r27:r26 register pairs. The reason for this will become clear in a moment.

Studying the code further, notice we load the first character of the string (pointed to by the “Z” register), placing it into the temp register (__tmp_reg__). Further, take note that the instruction has a plus sign ‘+‘ appended to the ‘Z’. This means the Z register is incremented by 1 after the load operation. It’s as if we combine two instructions into one! This is termed “Indirect Addressing with Post-Increment”.

Next, the temp register is tested (tst __tmp_reg__), and if it is NOT zero, execution will loop back and fetch another character. This repeats until finding the NUL character at the end of the string, which terminates the loop. At this point, however, because of the post-increment operation, the Z register points one location past the terminating NUL.

We complete the routine by subtracting 1 for the extra post-increment, and then subtracting the start address (still held in X) from the ending address (in Z). The result of this math is the length of the string.
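
The same logic can be sketched in portable C, which may make the pointer arithmetic easier to follow. The function name string_length is hypothetical, and the local pointer z plays the role of the Z register pair.

```c
#include <stdint.h>

/* Count characters before the terminating NUL, mirroring the assembly loop. */
uint16_t string_length(const char *start) {
  const char *z = start;         /* plays the role of the Z pointer */
  while (*z++ != '\0') {         /* load, then post-increment, like "ld r, Z+" */
  }
  /* z is now one past the NUL, so back up by one before subtracting. */
  return (uint16_t)(z - 1 - start);
}
```

For "abc" the loop reads four bytes (three characters plus the NUL), leaving z four positions past the start, and 4 - 1 gives the length 3.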

Here is a slightly more efficient version, but I will leave it to you to determine the details of the shortened arithmetic (the embedded comment explains the math in cryptic fashion):

const char src[4] = "abc";
volatile uint8_t len;

asm (
  "_loop:              \n"
  "ld  __tmp_reg__, Z+ \n"
  "tst __tmp_reg__     \n"
  "brne _loop          \n"
  //len = Z - 1 - src = (-1 - src) + Z = ~src + Z
  "com %A2             \n"
  "com %B2             \n"
  "add %A2, %A1        \n"
  "adc %B2, %B1        \n"
  "mov %0, %A2         \n" //save len (uint8_t)
  : "=r" (len) : "z" (src), "x" (src)
);
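
If the shortened arithmetic is not obvious, the following host-side sketch compares the two calculations. It relies on the two’s complement identity ~a = -a - 1, so that ~start + end equals end - 1 - start in 16-bit arithmetic. The function names and the example addresses are hypothetical.

```c
#include <stdint.h>

/* Length via explicit subtraction: end - 1 - start (arithmetic modulo 2^16). */
uint16_t length_subtract(uint16_t start, uint16_t end) {
  return (uint16_t)(end - 1u - start);
}

/* Length via the complement trick: ~start equals -start - 1 in two's complement. */
uint16_t length_complement(uint16_t start, uint16_t end) {
  return (uint16_t)((uint16_t)~start + end);
}
```

With a string starting at 0x0100 and Z finishing at 0x0104, both forms give 3. The complement form replaces the subi/sbci pair and the separate subtraction with two com instructions and one 16-bit add.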

Zerox a String

Let’s do another one. This code copies the src string (including the terminating NUL character) to the array pointed to by dst. However, the strings may not overlap, and the dst array must be large enough to receive the copy. If the destination is not large enough, anything could happen…

const char src[4] = "abc";
char dst[4] = "   ";

asm (
  "_copy:               \n"
  "ld   __tmp_reg__, Z+ \n"  //load tmp reg w/src char
  "st   X+, __tmp_reg__ \n" //store tmp reg to dst 
  "tst  __tmp_reg__     \n" //check if 0 (end)
  "brne _copy           \n"
  : : "x" (dst), "z" (src) : "memory"
);

Wow, only four lines of inline assembly code can copy a string! As you can see, this is very straightforward and quite simple. It utilizes the X and Z register pairs, incorporating post-increment addressing with both.
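
A portable C sketch of the same loop is shown below for comparison. The name string_copy is hypothetical, and returning dst is a convenience borrowed from the standard strcpy() rather than something the assembly does.

```c
/* Copy src (including its terminating NUL) into dst; dst must be large enough. */
char *string_copy(char *dst, const char *src) {
  char *x = dst;           /* plays the role of the X pointer */
  char c;
  do {
    c = *src++;            /* ld  tmp, Z+ */
    *x++ = c;              /* st  X+, tmp */
  } while (c != '\0');     /* tst tmp / brne */
  return dst;
}
```

The do/while shape matters: the NUL is stored before it is tested, so the copy always ends up terminated.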

Whose String is Bigger?

This code compares the two strings s1 and s2. It returns an integer (in result) less than, equal to, or greater than zero if s1 is found to be less than, to match, or be greater than s2, respectively. Again, it utilizes the X and Z register pairs, incorporating post-increment addressing with both. Hopefully you are starting to recognize the power of “indirect addressing” combined with “post-increment”.

char s1[4] = "abc";
char s2[4] = "xyz";
volatile int16_t result;

asm (
  "_compare:                     \n"
  "ld   %A0, X+                  \n"
  "ld   __tmp_reg__, Z+          \n"
  "sub  %A0, __tmp_reg__         \n"
  "cpse __tmp_reg__, __zero_reg__\n"
  "breq _compare                 \n"
  "sbc  %B0, %B0                  \n"
  : "=&r" (result) : "x" (s1) , "z" (s2)
);
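
The compact cpse/breq pairing can be hard to read at first. A portable C sketch of the equivalent control flow follows; the name string_compare is hypothetical, and bytes are compared as unsigned values, which is what the sub/sbc pair effectively does.

```c
#include <stdint.h>

/* Return <0, 0 or >0 depending on how s1 compares with s2. */
int16_t string_compare(const char *s1, const char *s2) {
  uint8_t a, b;
  do {
    a = (uint8_t)*s1++;         /* ld  lo, X+   */
    b = (uint8_t)*s2++;         /* ld  tmp, Z+  */
  } while (a == b && b != 0);   /* keep going while bytes match and no NUL seen */
  return (int16_t)(a - b);      /* byte difference, sign-extended to 16 bits */
}
```

In the assembly, cpse skips the branch once the s2 byte is NUL, and the final sbc spreads the borrow from the subtraction across the high byte to produce the sign.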

String Cat

This code appends the src string to the dst string overwriting the NUL character at the end of dst, and then adds a terminating NUL character. The strings may not overlap, and the dst string must have enough space for the result. This example is slightly more involved, but with a little study the details should become clear.

const char src[4] = "def";
char dst[7] = "abc";

asm (
  "_dst:                \n" //find end of destination
  "ld   __tmp_reg__, X+ \n"
  "tst  __tmp_reg__     \n"
  "brne _dst            \n"
  "sbiw %A0, 1          \n" //undo post-increment
  "_src:                \n" //X==end of dst string
  "ld   __tmp_reg__, Z+ \n"  //copy src to dst
  "st   X+, __tmp_reg__ \n"
  "tst  __tmp_reg__     \n" //test for 0 (end)
  "brne _src            \n"
  : : "x" (dst), "z" (src) : "memory"
);
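
For comparison, here is a portable C sketch of the same two-phase routine. The name string_append is hypothetical, and returning dst follows the convention of the standard strcat().

```c
/* Append src to the end of dst; dst must have room for the combined string. */
char *string_append(char *dst, const char *src) {
  char *x = dst;
  while (*x++ != '\0') {   /* phase 1: find the NUL at the end of dst */
  }
  x--;                     /* undo the extra post-increment (sbiw) */
  char c;
  do {                     /* phase 2: copy src, including its NUL */
    c = *src++;
    *x++ = c;
  } while (c != '\0');
  return dst;
}
```

The single decrement between the two loops is the step most easily forgotten: without it, the copy would start after the old NUL and the result would still read as "abc".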

Charred String?

Finally, this code finds the first occurrence of the character val in the string s. Here “character” means “byte” (no wide or multi-byte characters allowed). The location of the matched character is placed in the pointer c, or c is set to NULL if the character is not found.

const char s[4] = "abc", *c;
volatile int16_t val = 0x63;

asm (
  "_loop:        \n"
  "ld   %A0, Z+  \n" //fetch char from string
  "cp   %A0, %A2 \n" //compare char with val
  "breq _found   \n"
  "tst  %A0      \n" //end of string (0)?
  "brne _loop    \n" //not at end
  "clr  %B0      \n" //not found, NULL pointer
  "rjmp _end     \n"
  "_found:       \n"
  "sbiw %A1, 1   \n" //undo post-increment
  "movw %A0, %A1 \n" //save pointer
  "_end:         \n"
  : "=&x" (c) : "z" (s), "r" (val)
);
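
A portable C sketch of the same search is given below. The name find_char is hypothetical. As in the assembly, the comparison happens before the end-of-string test, so searching for the NUL byte itself finds the terminator.

```c
#include <stddef.h>

/* Return a pointer to the first byte equal to val, or NULL if there is none. */
const char *find_char(const char *s, char val) {
  char c;
  do {
    c = *s++;              /* fetch, then post-increment */
    if (c == val) {
      return s - 1;        /* undo the post-increment */
    }
  } while (c != '\0');     /* stop after examining the terminator */
  return NULL;
}
```

With s = "abc" and val = 0x63 (‘c’), the returned pointer refers to the third character of the string.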


C Programming and Strings
Further information on addressing modes can be found in Section 2 of the AVR Instruction Set Manual
AVR 8-bit Instruction Set
AVR-GCC Inline Assembler Cookbook
Extended Asm – Assembler Instructions with C Expression Operands
AVRLibc String Functions
