LPC81x ARM Cortex-M0 Basics

ARM Cortex-M0+ Architecture Basics

The LPC812 is built around an ARM Cortex-M0+ processor. Instructions are stored in flash and data in SRAM, but unlike a true Harvard design the Cortex-M0+ reaches both through a single bus and one shared 32-bit address space (a von Neumann arrangement). The basic architecture includes the core components and peripherals.

The core consists of:

  • Processor
  • Memories
  • GPIO
  • Pin interrupts
  • SCTimer/PWM

Peripherals:

  • USARTs
  • SPIs
  • I2C
  • ADC
  • IOCON
  • Multi-rate Timer
  • Watchdog Timer

All output is routed through the Switch Matrix to the individual pins. The switch matrix allows the flexibility of swapping the digital peripheral functions amongst the pins. Obviously, the basic functions like GPIO, power, ground and some others cannot be swapped.

LPC81x Block Diagram

block diagram

Memory Mapping

Fortunately for us, we don’t need to focus too much on the internal design. The main factor to remember is the peripherals are memory mapped. This means our interaction with them (configuration, control, input, and output) is accomplished through an address. Accessing a peripheral is just like writing or reading a value in memory.

It is good practice to access peripherals using a read-modify-write strategy. This strategy is seen throughout ARM and LPC examples.

Read-Modify-Write Example:

GPIO_DIR |= (1<<9);  //proper method preserves unaffected bits of register
//assembler translation:
         0xd6: 0x4813         LDR.N     R0, [PC, #0x4c]         ; [0x124] DIR0
         0xd8: 0x6800         LDR       R0, [R0]
         0xda: 0x2180         MOVS      R1, #128                ; 0x80
         0xdc: 0x0089         LSLS      R1, R1, #2
         0xde: 0x4301         ORRS      R1, R1, R0
         0xe0: 0x4810         LDR.N     R0, [PC, #0x40]         ; [0x124] DIR0
         0xe2: 0x6001         STR       R1, [R0]
…
        0x124: 0xa0002000     DC32      DIR0
//
//
//
GPIO_DIR = (1<<9);  //improper method clobbers all other bits of the register
//assembler translation:
         0xc2: 0x2080         MOVS      R0, #128                ; 0x80
         0xc4: 0x0080         LSLS      R0, R0, #2
         0xc6: 0x4911         LDR.N     R1, [PC, #0x44]         ; [0x10c] DIR0
         0xc8: 0x6008         STR       R0, [R1]
…
        0x10c: 0xa0002000     DC32      DIR0
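
The difference between the two statements can be checked on a PC. This is only a minimal sketch, not LPC code: `fake_dir` is a hypothetical stand-in for the real DIR0 register, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the GPIO DIR0 register. On the real part this
   would be a volatile access to the register address instead. */
static volatile uint32_t fake_dir;

/* Read-modify-write: only the requested bit is set. */
static void dir_set_bit(unsigned bit) {
  fake_dir |= (UINT32_C(1) << bit);
}

/* Read-modify-write: only the requested bit is cleared. */
static void dir_clear_bit(unsigned bit) {
  fake_dir &= ~(UINT32_C(1) << bit);
}

/* Plain write: every other bit is overwritten with zero. */
static void dir_overwrite(unsigned bit) {
  fake_dir = (UINT32_C(1) << bit);
}
```

Starting from 0x0000000F, `dir_set_bit(9)` leaves 0x0000020F, while `dir_overwrite(9)` leaves 0x00000200 and silently drops the four low bits.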

Flash, SRAM and ROM memory

The LPC81xM contains up to 16kB of flash program memory, up to 4kB of static RAM data memory, and 8kB of on-chip ROM. The ROM contains the boot loader, In-System Programming (ISP) and In-Application Programming (IAP) support for flash programming, profiles for configuring power consumption and PLL settings, USART driver API routines, and I2C-bus driver routines.

The Very Basic Memory Map

0x00000000 - 0x00004000: Flash program memory
0x10000000 - 0x10001000: SRAM memory
0x1FFF0000 - 0x1FFF2000: Boot ROM (8kB)
0x40000000 - 0x40070000: All APB peripherals
0x50004000 - 0x50008000: SCTimer/PWM
0xA0000000 - 0xA0008000: GPIO
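
When reading a disassembly it can help to turn this table into a lookup. The sketch below is host-side C, not something from the NXP libraries. The type and function names are hypothetical, and it assumes the end address of each range is exclusive (which is what 0x00000000 - 0x00004000 for 16kB of flash suggests).

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
  uint32_t start;   /* first address of the region */
  uint32_t end;     /* one past the last address (assumed exclusive) */
  const char *name;
} mem_region_t;

/* Ranges copied from the memory map above. */
static const mem_region_t lpc81x_map[] = {
  { 0x00000000u, 0x00004000u, "Flash" },
  { 0x10000000u, 0x10001000u, "SRAM" },
  { 0x1FFF0000u, 0x1FFF2000u, "Boot ROM" },
  { 0x40000000u, 0x40070000u, "APB peripherals" },
  { 0x50004000u, 0x50008000u, "SCTimer/PWM" },
  { 0xA0000000u, 0xA0008000u, "GPIO" },
};

/* Return the region containing addr, or NULL if it is not in the table. */
static const mem_region_t *region_of(uint32_t addr) {
  for (size_t i = 0; i < sizeof lpc81x_map / sizeof lpc81x_map[0]; i++) {
    if (addr >= lpc81x_map[i].start && addr < lpc81x_map[i].end) {
      return &lpc81x_map[i];
    }
  }
  return NULL;
}
```

For example, the DIR0 address 0xA0002000 from the assembler listing lands in the GPIO entry.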

My next post about ARM Cortex-M0.

Posted in Uncategorized | Tagged , , , , | Leave a comment

Arduino Mode0 SPI Bit Bang and Bare Metal Hardware SPI

Here are two additional versions of the SPI program from my previous post. The first of these programs uses a “bare-metal” version of hardware SPI. The second is a bit-bang version using different pins.

How does the speed compare between the two versions? Not even close. The hardware SPI is running at 8MHz and, on average, transfers one byte in 2.438us. The bit-bang version takes about 12.56us to transfer a byte, roughly five times longer. I timed the period the CS pin is pulled low. Note that under HW SPI the delay between CS going low and the first clock pulse is 0.8125us, while in the bit-bang version the delay is approximately 2.125us (the y axis scales of the two screen captures are not the same).
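
To put those numbers in perspective, here is a small host-side sketch of the arithmetic. The function names are made up for illustration, and it takes the per-byte figures above at face value.

```c
#include <stdint.h>

/* Time to clock `bits` bits at `clock_hz`, in microseconds. */
static double spi_ideal_us(uint32_t bits, double clock_hz) {
  return (double)bits * 1e6 / clock_hz;
}

/* How many times longer the slow transfer takes than the fast one. */
static double speedup(double slow_us, double fast_us) {
  return slow_us / fast_us;
}
```

Eight bits at 8MHz need 1us on the wire, so about 1.4us of the measured 2.438us is overhead around the transfer, and `speedup(12.56, 2.438)` comes out at a little over 5.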

Hardware SPI:
hardware spi

Bit Bang SPI:
bit bang spi

Bare-Metal Version:

//
//FM25CL64B SPI F-RAM
//64-Kbit
//
//using bare-metal hardware spi
//
/*
Arduino--Logic Conv--FRAM
D13------TXH/TXL-----6.SCK
D12------------------2.MISO
D11------TXH/TXL-----5.MOSI
D10------------------1.CS
3V3------LV
5V-------HV
GND------HV GND
GND------------------4.VSS
3V3------------------8.VCC
3V3------------------7.HOLD (tie to Vcc if not used)
3V3------[10KR]------1.CS
3.WP (active low, tie to Vcc if not used)
*/

#ifndef LSBFIRST
#define LSBFIRST 0
#endif
#ifndef MSBFIRST
#define MSBFIRST 1
#endif

#define CLOCK_DIV4 0x00
#define CLOCK_DIV16 0x01
#define CLOCK_DIV64 0x02
#define CLOCK_DIV128 0x03
#define CLOCK_DIV2 0x04
#define CLOCK_DIV8 0x05
#define CLOCK_DIV32 0x06
#define MODE0 0x00
#define MODE1 0x04
#define MODE2 0x08
#define MODE3 0x0C
#define MODE_MASK 0x0C // CPOL = bit 3, CPHA = bit 2 on SPCR
#define CLOCK_MASK 0x03 // SPR1 = bit 1, SPR0 = bit 0 on SPCR
#define CLOCKX2_MASK 0x01 // SPI2X = bit 0 on SPSR

//spi hardware transfer
inline static uint8_t SpiTransfer(uint8_t data) {
SPDR = data;
asm volatile("nop");
while (!(SPSR & _BV(SPIF)))
; // wait
return SPDR;
}

//FRAM opcodes
#define WREN 0b00000110 //set write enable latch
#define WRDI 0b00000100 //write disable
#define RDSR 0b00000101 //read status register
#define WRSR 0b00000001 //write status register
#define READ 0b00000011 //read memory data
#define WRITE 0b00000010 //write memory data

uint8_t SpiRAMRead8(uint16_t address) {
uint8_t read_byte;

PORTB &= ~(1<<PORTB2); //set CS low
SpiTransfer(READ);
//13-bit address MSB, LSB
SpiTransfer((char)((address>>8)&0xff));
SpiTransfer((char)address);
read_byte = SpiTransfer(0xff);
PORTB |= (1<<PORTB2); //set CS high
return read_byte;
}

void SpiRAMWrite8(uint16_t address, uint8_t data) {
PORTB &= ~(1<<PORTB2); //set CS low
SpiTransfer(WREN);
PORTB |= (1<<PORTB2); //set CS high
PORTB &= ~(1<<PORTB2); //set CS low
SpiTransfer(WRITE);
//13-bit address MSB, LSB
SpiTransfer((char)((address>>8)&0xff));
SpiTransfer((char)address);
SpiTransfer(data);
PORTB |= (1<<PORTB2); //set CS high
}

void setup(void) {
uint16_t addr;
uint8_t i, sreg;

Serial.begin(9600);
sreg = SREG;
noInterrupts();

//pin setup
pinMode(10, OUTPUT); //CS
pinMode(11, OUTPUT); //MOSI
pinMode(12, INPUT); //MISO
pinMode(13, OUTPUT); //SCK
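PORTB |= (1<<PORTB2); //set CS high

//spi setup (register writes reconstructed from the defines above; verify before use)
SPCR |= _BV(MSTR); //master mode
SPCR |= _BV(SPE); //enable spi
SPCR &= ~_BV(DORD); //MSB first
SPCR = (SPCR & ~MODE_MASK) | MODE0;
SPCR = (SPCR & ~CLOCK_MASK) | (CLOCK_DIV2 & CLOCK_MASK);
SPSR = (SPSR & ~CLOCKX2_MASK) | ((CLOCK_DIV2 >> 2) & CLOCKX2_MASK);
SREG = sreg; //restore interrupt state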

//test it
for (addr=0; addr<4; addr++) {
SpiRAMWrite8(addr, (uint8_t)addr);
Serial.print("Addr: ");
Serial.print(addr);
i = SpiRAMRead8(addr);
Serial.print(" | Read: ");
Serial.println((uint16_t)i);
}
}

void loop() { }

Bit Bang Version:

//
//FM25CL64B SPI F-RAM
//64-Kbit
//
//bit-bang
//
/*
Arduino--Logic Conv--FRAM
D7-------TXH/TXL-----6.SCK
D6-------------------2.MISO
D5-------TXH/TXL-----5.MOSI
D4-------------------1.CS
3V3------LV
5V-------HV
GND------HV GND
GND------------------4.VSS
3V3------------------8.VCC
3V3------------------7.HOLD (tie to Vcc if not used)
3V3------[10KR]------1.CS
3.WP (active low, tie to Vcc if not used)
*/

//bitbang
uint8_t SpiTransfer(uint8_t _data) {
for (uint8_t bit=0; bit<8; bit++) {
if (_data & 0x80) //set/clear mosi bit
PORTD |= (1<<PORTD5);
else
PORTD &= ~(1<<PORTD5);
_data <<= 1; //shift for next bit

if (PIND & (1<<PIND6)) //capture miso bit
_data |= 1;

PORTD |= (1<<PORTD7); //pulse clock
asm volatile ("nop \n\t"); //pause
PORTD &= ~(1<<PORTD7);
asm volatile ("nop \n\t"); //pause
}
return _data;
}

//FRAM opcodes
#define WREN 0b00000110 //set write enable latch
#define WRDI 0b00000100 //write disable
#define RDSR 0b00000101 //read status register
#define WRSR 0b00000001 //write status register
#define READ 0b00000011 //read memory data
#define WRITE 0b00000010 //write memory data

uint8_t SpiRAMRead8(uint16_t address) {
uint8_t read_byte;

PORTD &= ~(1<<PORTD4); //set CS low
SpiTransfer(READ);
//13-bit address MSB, LSB
SpiTransfer((char)((address>>8)&0xff));
SpiTransfer((char)address);
read_byte = SpiTransfer(0xff);
PORTD |= (1<<PORTD4); //set CS high
return read_byte;
}

void SpiRAMWrite8(uint16_t address, uint8_t data) {
PORTD &= ~(1<<PORTD4); //set CS low
SpiTransfer(WREN);
PORTD |= (1<<PORTD4); //set CS high
PORTD &= ~(1<<PORTD4); //set CS low
SpiTransfer(WRITE);
//13-bit address MSB, LSB
SpiTransfer((char)((address>>8)&0xff));
SpiTransfer((char)address);
SpiTransfer(data);
PORTD |= (1<<PORTD4); //set CS high
}

void setup(void) {
uint16_t addr;
uint8_t i, sreg;

Serial.begin(9600);
//configure pins
pinMode(4, OUTPUT); //CS
pinMode(5, OUTPUT); //MOSI
pinMode(6, INPUT); //MISO
pinMode(7, OUTPUT); //SCK
PORTD |= (1<<PORTD4); //set CS high
PORTD &= ~_BV(PORTD7); //set clock low

//test it
for (addr=0; addr<32; addr++) {
SpiRAMWrite8(addr, (uint8_t)addr);
Serial.print("Addr: ");
Serial.print(addr);
i = SpiRAMRead8(addr);
Serial.print(" | Read: ");
Serial.println((uint16_t)i);
}
}

void loop() { }
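
The shifting logic of the bit-bang transfer can be tried without any hardware. The following is a hypothetical host-side model of one mode 0, MSB-first byte exchange. The function name and the "slave" byte are assumptions for illustration, and no port registers are touched.

```c
#include <stdint.h>

/* Model of a mode 0, MSB-first exchange. `slave_out` is the byte the
   slave shifts back; `*slave_in` receives what the slave saw on MOSI. */
static uint8_t spi_mode0_exchange(uint8_t master_out, uint8_t slave_out,
                                  uint8_t *slave_in) {
  uint8_t master_in = 0;
  uint8_t seen = 0;

  for (uint8_t bit = 0; bit < 8; bit++) {
    /* Clock low: both sides present their current MSB. */
    uint8_t mosi = (master_out & 0x80) ? 1 : 0;
    uint8_t miso = (slave_out & 0x80) ? 1 : 0;

    /* Rising edge: each side samples the other's line. */
    seen = (uint8_t)((seen << 1) | mosi);
    master_in = (uint8_t)((master_in << 1) | miso);

    /* Falling edge: both sides move on to the next bit. */
    master_out = (uint8_t)(master_out << 1);
    slave_out = (uint8_t)(slave_out << 1);
  }
  *slave_in = seen;
  return master_in;
}
```

After eight clocks the two bytes have simply swapped places, which is why the read routine sends a dummy 0xff just to clock the data byte out of the FRAM.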

Arduino and Cypress SPI FM25CL64B FRAM

The FM25CL64B is a 64K-bit ferroelectric RAM (F-RAM or FRAM) memory chip. Unlike typical flash and EEPROM memory, FRAM is capable of performing write operations at bus speed. According to the Cypress datasheet, this FRAM chip is capable of being clocked at up to 40MHz. I purchased a few SOIC-8 (150mils) chips for testing. I soldered the chip to a dipmicro SMT SOIC-to-DIP adapter PCB and kludged together a simple test program for my arduino:

FM25CL64Ba

FM25CL64Bb

Since the chip is not 5V tolerant, I used a SparkFun 12009 level shifter to perform the 5V to 3V3 logic conversions. Here is how I connect the arduino, FM25CL64B and level shifter:

Arduino--Logic Conv--FRAM
D13------TXH/TXL-----6.SCK
D12------------------2.MISO
D11------TXH/TXL-----5.MOSI
D10------------------1.CS
3V3------LV
5V-------HV
GND------HV GND
GND------------------4.VSS
3V3------------------8.VCC
3V3------------------7.HOLD (tie to Vcc if not used)
3V3------[10KR]------1.CS
                     3.WP (active low, tie to Vcc if not used)

Arduino program:

//
//FM25CL64B SPI F-RAM
//64-Kbit simple test
//
#include <SPI.h>

//FRAM opcodes
#define WREN  0b00000110 //set write enable latch
#define WRDI  0b00000100 //write disable
#define RDSR  0b00000101 //read status register
#define WRSR  0b00000001 //write status register
#define READ  0b00000011 //read memory data
#define WRITE 0b00000010 //write memory data
 
uint8_t SpiRAMRead8(uint16_t address) {
  uint8_t read_byte;
 
  PORTB &= ~(1<<PORTB2);              //set CS low
  SPI.transfer(READ);
  //13-bit address MSB, LSB
  SPI.transfer((char)((address>>8)&0xff));
  SPI.transfer((char)address);
  read_byte = SPI.transfer(0xFF);
  PORTB |= (1<<PORTB2);               //set CS high
  return read_byte;
}
 
void SpiRAMWrite8(uint16_t address, uint8_t data_byte) {
  PORTB &= ~(1<<PORTB2);              //set CS low
  SPI.transfer(WREN);
  PORTB |= (1<<PORTB2);               //set CS high
  PORTB &= ~(1<<PORTB2);              //set CS low
  SPI.transfer(WRITE);
  //13-bit address MSB, LSB
  SPI.transfer((char)((address>>8)&0xff));
  SPI.transfer((char)address);
  SPI.transfer(data_byte);
  PORTB |= (1<<PORTB2);               //set CS high
}
 
void setup(void) {
  uint16_t addr;
  uint8_t i;

  Serial.begin(9600);
  pinMode(10, OUTPUT);                //CS
  pinMode(11, OUTPUT);                //MOSI 
  pinMode(12, INPUT);                 //MISO
  pinMode(13, OUTPUT);                //SCK
  PORTB |= (1<<PORTB2);               //set CS high
  SPI.begin();
  SPI.setDataMode(SPI_MODE0);
  SPI.setBitOrder(MSBFIRST);
  SPI.setClockDivider (SPI_CLOCK_DIV2);
  for (addr=0; addr<32; addr++) {
    SpiRAMWrite8(addr, (uint8_t)addr);
    Serial.print("Addr: ");
    Serial.print(addr);
    i = SpiRAMRead8(addr);
    Serial.print(" | Read: ");
    Serial.println((uint16_t)i);
  }
}
 
void loop() { }
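
Both routines send the same three-byte header before the data byte: opcode, address high byte, address low byte. Since 64 Kbit is 8192 bytes, only 13 address bits are meaningful. Here is a small host-side sketch of that framing. The helper name, the `FRAM_` constants and the masking step are my assumptions for illustration, not part of the program above.

```c
#include <stdint.h>

#define FRAM_OP_READ   0x03u   /* same value as READ above */
#define FRAM_OP_WRITE  0x02u   /* same value as WRITE above */
#define FRAM_ADDR_MASK 0x1FFFu /* 8192 bytes -> 13 address bits */

/* Fill `hdr` with opcode, address high byte, address low byte. */
static void fram_header(uint8_t opcode, uint16_t address, uint8_t hdr[3]) {
  address = (uint16_t)(address & FRAM_ADDR_MASK); /* keep 13 bits */
  hdr[0] = opcode;
  hdr[1] = (uint8_t)(address >> 8);   /* MSB first */
  hdr[2] = (uint8_t)(address & 0xFF); /* then LSB */
}
```

For a read of address 0x1234 this produces the bytes 0x03, 0x12, 0x34.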

Convert an ASCII String to Fixed Point: atofp()

Here is a small utility routine which converts an ASCII string floating point number into an s16.15 format fixed point number. Most fixed point libraries neglect this conversion. However, in practice, this routine is very useful. If the conversion process is not efficient, the gains of using fixed point over floating point math can be eliminated. Having said that, this is not pretty code and neither is it efficient. And it breaks a few coding rules too.

It is also important to note, the routine does very little (almost no) validity testing of the input values (size of integer/fixed point numbers, valid characters, sufficient string space, etc.). So there is plenty of opportunity here for spectacular failure.

The complementary conversion, fptoa() is also included.

//atol function ignores sign
int32_t _atol(const char* s) {
  int32_t v=0;
  
  while (*s == ' ' || (uint16_t)(*s - 9) < 5u) {
    ++s;
  }
  if (*s == '-' || *s == '+') {
    ++s;
  }
  while ((uint16_t)(*s - '0') < 10u) {
    v = v*10 + *s - '0';
    ++s;
  }
  return v;
}

#define MAX_STRING_SIZE 8

//basic string copy
static inline void _strcpy(char *d, const char *s) {
  uint8_t n=0;
  
  while (*s != '\0') {
    if (n++ >= MAX_STRING_SIZE) {
      //destination max size
      return;
    }
    *d++ = *s++;
  }
}

//basic string concatenation
void _concat(char *d, char *s) {
  uint8_t n=0;
  
  while(*d) {
    d++;
  }
  while(*s && n<MAX_STRING_SIZE) {
    *d++ = *s++;
    n++;
  }
  *d = '\0';
}

//int32_t atofp(char *)
int32_t FP_StrToFix(char *s) {
  int32_t f, fpw, fpf, bit, r[15] = {
    0x2faf080, 0x17d7840, 0xbebc20, 0x5f5e10, 0x02faf08, 0x017d784, 0x0bebc2, 0x05f5e1,
    0x002faf1, 0x0017d78, 0x00bebc, 0x005f5e, 0x0002faf, 0x00017d8, 0x000bec //0x0005f6
  };
  uint8_t sign, i;
  char *p=s, temp[9] = "00000000";

  sign = 0;
  //separate whole & fraction portions
  while (*p != '.') {
    //check for negative sign
    if (*p == '-') {
      sign = 1;
    }
    if (*p == '\0') {
      //no decimal found, return integer as fixed point
      return sign ? -(_atol(s)<<FP_FBITS) : (_atol(s)<<FP_FBITS);
    }
    p++;
  }

  //whole part
  *p = '\0';
  fpw = (_atol(s)<<FP_FBITS);

  //pad fraction part with trailing zeros
  _strcpy(temp, (p + 1));
  //get fraction
  f = _atol(temp);
  //re-insert decimal point
  *p = '.';

  fpf = 0;
  bit = 0x4000;
  //convert base10 fraction to fixed point base2
  for (i=0; i<15; i++) {
    if (f >= r[i]) { //>= so exact fractions such as 0.5 set their bit
      f -= r[i];
      fpf += bit;
    }
    bit >>= 1;
  }

  //join fixed point whole and fractional parts
  return sign ? -(fpw + fpf) : (fpw + fpf);
}

//void fptoa(int32_t, char *)
void FP_FixToStr(int32_t f, char *s) {
  int32_t fp, bit=0x4000, r[16] = { 50000, 25000, 12500, 6250, 3125, 1563, 781, 391, 195, 98, 49, 24, 12, 6, 3 };
  int32_t d[5] = { 10000, 1000, 100, 10 };
  char *p=s, *sf, temp[12];
  uint8_t i;
  
  //get whole part
  fp = ktoi(f);
  if (fp == 0) {
    *p = '0';
    *(p + 1) = '\0'; //terminate so the end-of-string scan below stops
  } else {
    p = ltoa(fp, s, 10);
  }

  //get fractional part
  fp = FP_FRAC_PART(f);
  if (fp == 0) {
    return;
  }
  //iterate to end of string
  while (*p != '\0') p++;
  *p++ = '.'; //add decimal to end of s
  *p = '\0';  //terminate string
  
  f = 0;
  //convert fraction base 2 to base 10
  for (i=0; i<15; i++) {
    if (fp & bit) {
      f += r[i];
    }
    bit >>= 1;
  }
  //temporary string storage space
  sf = temp;
  sf = ltoa(f, sf, 10);
  
  // if needed, add leading zeros to fractional portion
  for (i=0; i<4; i++) {
    if (f < d[i]) {
      *p++ = '0';
      *p = '\0';
    } else {
      break;
    }
  }
  
  //combine whole & fractional parts
  _concat(s, sf);
}
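
The heart of the string conversion is the table of decimal weights: entry i is 0.5, 0.25, 0.125, ... scaled by 10^8 (0x2faf080 is 50000000). The fraction is converted by greedy subtraction, one binary digit at a time. Here is a minimal host-side sketch of just that step, computing the weights instead of tabulating them. The function names are hypothetical, and it uses >= in the comparison so that exactly representable fractions convert without error.

```c
#include <stdint.h>

/* Decimal weight of fraction bit i (i = 0 is 0.5), scaled by 1e8 and
   rounded to nearest: 50000000, 25000000, 12500000, ... */
static uint32_t frac_weight_1e8(unsigned i) {
  return (UINT32_C(100000000) + (UINT32_C(1) << i)) >> (i + 1);
}

/* Convert an 8-digit decimal fraction (0..99999999) to 15 fraction bits. */
static uint16_t frac_to_q15(uint32_t frac_1e8) {
  uint16_t bits = 0;
  uint16_t bit = 0x4000;

  for (unsigned i = 0; i < 15; i++) {
    uint32_t w = frac_weight_1e8(i);
    if (frac_1e8 >= w) { /* this binary digit fits: take it */
      frac_1e8 -= w;
      bits |= bit;
    }
    bit >>= 1;
  }
  return bits;
}
```

With this, ".5" (50000000) maps to 0x4000 and ".75" (75000000) maps to 0x6000.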

Arduino s16.15 Fixed Point Math Routines

It is important to note that fixed point math comes in many flavors. For example, a 16-bit integer can implement 31 different fixed point formats (signed and unsigned Q1 through Q16). Popular 16-bit formats include Q8.8, Q16 and s15. Each format has a distinct range of representable numbers, and the achievable precision differs significantly between them. So, when one presents fixed-point functions or a library, there is a good chance the implementation is unique and specific to the programmer’s task. It’s highly doubtful you’ll find a single solution that fits all purposes. Having said all that, here are some fixed point functions for the Arduino that I cobbled together.

Most of this was copied from the source code of avrfix, fixedptc and the AVR GCC library sources. Very little of this is new, I simply massaged a few bytes here and there and combined it all together. I changed a few function names, but left enough hints inside the assembler routines if you want to discover where they came from.

This should provide very rudimentary fixed point math, allowing a base to build upon. It uses a signed long (int32_t) in the form of an s16.15 fixed point value. There is no support for overflow/saturation. I’ve added basic square root and trig functions.
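
For cross-checking results on a PC, the same s16.15 arithmetic can be written in portable C with a 64-bit intermediate. This is only a reference sketch, not the AVR implementation: the `q_` names are hypothetical, results are truncated toward zero, and like the routines here it does no overflow or saturation handling.

```c
#include <stdint.h>

#define Q_FBITS 15                      /* fraction bits, as in s16.15 */
#define Q_ONE   (INT32_C(1) << Q_FBITS) /* 1.0 == 0x8000 */

/* Integer to fixed point. */
static int32_t q_from_int(int16_t i) {
  return (int32_t)i * Q_ONE;
}

/* Multiply: widen to 64 bits, then drop the extra 15 fraction bits. */
static int32_t q_mul(int32_t a, int32_t b) {
  return (int32_t)(((int64_t)a * b) / Q_ONE);
}

/* Divide: pre-scale the dividend so the quotient keeps 15 fraction bits.
   The caller must make sure b is not zero. */
static int32_t q_div(int32_t a, int32_t b) {
  return (int32_t)(((int64_t)a * Q_ONE) / b);
}
```

For example, `q_div(q_from_int(3), q_from_int(2))` gives 0xC000, which is 1.5 in s16.15.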

I hope I left all of the necessary attributions in the file. I’ve conducted very minimal testing, and you need to determine if the accuracy fits your needs. Use at your own risk!

fix.h:

#ifndef FIX_H
#define FIX_H

#include <stdint.h>

//Constants and helper macros
#define FP_IBITS       16 //integer bits
#define FP_FBITS       15 //fraction bits
#define FP_BITS        32 //total bits (s16.15)
#define FP_MIN         -2147450880L
#define FP_MAX         2147450880L
#define FP_FMASK       (((int32_t)1<<FP_FBITS) - 1)
#define FP_ONE         ((int32_t)0x8000)
#define FP_CONST(R)   ((int32_t)((R)*FP_ONE + ((R) >= 0 ? 0.5 : -0.5)))
#define FP_PI          FP_CONST(3.14159265358979323846)
#define FP_TWO_PI      FP_CONST(2*3.14159265358979323846)
#define FP_HALF_PI     FP_CONST(3.14159265358979323846/2)
#define FP_ABS(A)      ((A) < 0 ? -(A) : (A))
#define FP_FRAC_PART(A) ((int32_t)(A)&FP_FMASK)
#define FP_DegToRad(D) (FP_Division(D, (int32_t)1877468))
#define FP_RadToDeg(R) (FP_Multiply(R, (int32_t)1877468))

//basic math
extern int32_t FP_Multiply(int32_t, int32_t);
extern int32_t FP_Division(int32_t, int32_t);

//special functions 
int32_t FP_Round(int32_t, uint8_t);

//conversion Functions
extern float FP_FixedToFloat(int32_t);
extern int32_t FP_FloatToFixed(float);
#define itok(i)        ((int32_t)((int32_t)(i)<<15))
#define ktoi(k)        ((int16_t)((int32_t)(k)>>15))
#define ftok(f)        ((int32_t)(float)((f)*(32768)))

//square root
extern int32_t _FP_SquareRoot(int32_t, int32_t);
#define FP_Sqrt(a)     _FP_SquareRoot((a), 15)

//trig
extern int32_t FP_Sin(int32_t);
#define FP_Cos(A)      (FP_Sin(FP_HALF_PI - (A)))
#define FP_Tan(A)      (FP_Division(FP_Sin(A), FP_Cos(A)))

#endif //FIX_H 
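
The magic number 1877468 in FP_DegToRad is simply 180/pi (about 57.29578) expressed in s16.15. Dividing by it converts degrees to radians, and multiplying by the same constant converts back. A quick host-side check, mirroring the rounding FP_CONST applies to non-negative values (the function names are hypothetical):

```c
#include <stdint.h>

#define Q_ONE 32768.0 /* 1.0 in s16.15 */

/* Round a non-negative real constant to the nearest s16.15 value. */
static int32_t q_const(double r) {
  return (int32_t)(r * Q_ONE + 0.5);
}

/* 180/pi in s16.15. */
static int32_t q_deg_per_rad(void) {
  return q_const(180.0 / 3.14159265358979323846);
}
```

`q_deg_per_rad()` returns 1877468, and `q_const(3.14159265358979323846)` returns 102944, the value FP_PI expands to.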

fix.c:

/*
 * The ideas and algorithms have been cherry-picked from a large number
 * of previous implementations available on the Internet, and from the
 * AVR GCC lib sources.
 * Copyright (c) 2002  Michael Stumpf  <mistumpf@de.pepperl-fuchs.com>
 * Copyright (c) 2006  Dmitry Xmelkov
 * Copyright (C) 2012-2015 Free Software Foundation, Inc.
 * Contributed by Sean D'Epagnier  (sean@depagnier.com)
 * Georg-Johann Lay (avr@gjlay.de)
 * All rights reserved.
 * Maximilan Rosenblattl, Andreas Wolf 2007-02-07
 * Copyright (c) 2010-2012 Ivan Voras <ivoras@freebsd.org>
 * Copyright (c) 2012 Tim Hartrick <tim@edgecast.com>
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:

 * Redistributions of source code must retain the above copyright
 notice, this list of conditions and the following disclaimer.
 * Redistributions in binary form must reproduce the above copyright
 notice, this list of conditions and the following disclaimer in
 the documentation and/or other materials provided with the
 distribution.
 * Neither the name of the copyright holders nor the names of
 contributors may be used to endorse or promote products derived
 from this software without specific prior written permission.

 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.
*/
#include <avr/io.h>
#include "fix.h"

//__mulsa3
int32_t __attribute__((naked)) FP_Multiply(int32_t a, int32_t b) {
  asm volatile (
  "movw  r16, r18 \n\t"
  "movw  r18, r20 \n\t"
  "movw  r20, r22 \n\t"
  "movw  r22, r24 \n\t"
  "clt            \n\t"
  //__mulusa3_round
  "clr  r24       \n\t"
  "clr  r25       \n\t"
  "mul  r16, r20  \n\t"
  "movw r26, r0   \n\t"
  "mul  r17, r20  \n\t"
  "add  r27, r0   \n\t"
  "adc  r24, r1   \n\t"
  "mul  r16, r21  \n\t"
  "add  r27, r0   \n\t"
  "adc  r24, r1   \n\t"
  "rol  r25       \n\t"
  "brtc 0f        \n\t"
  "sbrc r27, 0x07 \n\t"
  "adiw r24, 0x01 \n\t"
  "0: push r27    \n\t"
  "mul  R16, R22  \n\t"
  "add  R24, r0   \n\t"
  "adc  R25, r1   \n\t"
  "sbc  R26, R26  \n\t"
  "mul  R17, R21  \n\t"
  "add  R24, r0   \n\t"
  "adc  R25, r1   \n\t"
  "sbci R26, 0x00 \n\t"
  "mul  R18, R20  \n\t"
  "add  R24, r0   \n\t"
  "adc  R25, r1   \n\t"
  "sbci R26, 0x00 \n\t"
  "neg  R26       \n\t"
  "mul  R16, R23  \n\t"
  "add  R25, r0   \n\t"
  "adc  R26, r1   \n\t"
  "sbc  R27, R27  \n\t"
  "mul  R17, R22  \n\t"
  "add  R25, r0   \n\t"
  "adc  R26, r1   \n\t"
  "sbci R27, 0x00 \n\t"
  "mul  R18, R21  \n\t"
  "add  R25, r0   \n\t"
  "adc  R26, r1   \n\t"
  "sbci R27, 0x00 \n\t"
  "mul  R19, R20  \n\t"
  "add  R25, r0   \n\t"
  "adc  R26, r1   \n\t"
  "sbci R27, 0x00 \n\t"
  "neg  R27       \n\t"
  "mul  R17, R23  \n\t"
  "add  R26, r0   \n\t"
  "adc  R27, r1   \n\t"
  "mul  R18, R22  \n\t"
  "add  R26, r0   \n\t"
  "adc  R27, r1   \n\t"
  "mul  R19, R21  \n\t"
  "add  R26, r0   \n\t"
  "adc  R27, r1   \n\t"
  "mul  R18, R23  \n\t"
  "add  R27, r0   \n\t"
  "mul  R19, R22  \n\t"
  "add  R27, r0   \n\t"
  "pop  r0        \n\t"
  "clr  r1        \n\t"
  "tst  r23       \n\t"
  "brpl 1f        \n\t"
  "sub  r26, r16  \n\t"
  "sbc  r27, r17  \n\t"
  "1: sbrs r19, 0x07 \n\t"
  "rjmp 2f        \n\t"
  "sub  r26, r20  \n\t"
  "sbc  r27, r21  \n\t"
  "2: lsl  r0     \n\t"
  "rol  r24       \n\t"
  "rol  r25       \n\t"
  "rol  r26       \n\t"
  "rol  r27       \n\t"
  "lsl  r0        \n\t"
  "adc  r24, r1   \n\t"
  "adc  r25, r1   \n\t"
  "adc  r26, r1   \n\t"
  "adc  r27, r1   \n\t"
  "movw r22, r24  \n\t"
  "movw r24, r26  \n\t"
  "ret            \n\t"
  );
}

//__divsa3
int32_t __attribute__((naked)) FP_Division(int32_t a, int32_t b) {
  asm volatile (
  "movw  r26, r24 \n\t"
  "movw  r24, r22 \n\t"
  "mov  r0, r27   \n\t"
  "eor  r0, r21   \n\t"
  "sbrs r21, 0x07 \n\t"
  "rjmp 1f        \n\t"
  //NEG4 r18
  "com  R21       \n\t"
  "com  R20       \n\t"
  "com  R19       \n\t"
  "neg  R18       \n\t"
  "sbci R19, 0xFF \n\t"
  "sbci R20, 0xFF \n\t"
  "sbci R21, 0xFF \n\t"
  "1: sbrs r27, 0x07 \n\t"
  "rjmp    2f     \n\t"
  //NEG4 r24
  "com  R27       \n\t"
  "com  R26       \n\t"
  "com  R25       \n\t"
  "neg  R24       \n\t"
  "sbci R25, 0xFF \n\t"
  "sbci R26, 0xFF \n\t"
  "sbci R27, 0xFF \n\t"
  //__udivusa3
  "2: ldi r30, 0x20 \n\t"
  "mov  r1, r30   \n\t"
  "clr  r30       \n\t"
  "clr  r31       \n\t"
  "movw r22, r30  \n\t"
  "lsl  r24       \n\t"
  "rol  r25       \n\t"
  "udivusa3_loop: rol r26 \n\t"
  "rol  r27       \n\t"
  "rol  r30       \n\t"
  "rol  r31       \n\t"
  "brcs udivusa3_ep \n\t"
  "cp   r26, r18  \n\t"
  "cpc  r27, r19  \n\t"
  "cpc  r30, r20  \n\t"
  "cpc  r31, r21  \n\t"
  "brcc udivusa3_ep \n\t"
  "rol  r22       \n\t"
  "rjmp udivusa3_cont \n\t"
  "udivusa3_ep: sub  r26, r18 \n\t"
  "sbc  r27, r19  \n\t"
  "sbc  r30, r20  \n\t"
  "sbc  r31, r21  \n\t"
  "lsl  r22       \n\t"
  "udivusa3_cont: rol r23 \n\t"
  "rol  r24       \n\t"
  "rol  r25       \n\t"
  "dec  r1        \n\t"
  "brne udivusa3_loop \n\t"
  "com  r22       \n\t"
  "com  r23       \n\t"
  "com  r24       \n\t"
  "com  r25       \n\t"
  //"ret            \n\t"
  "lsr  r25       \n\t"
  "ror  r24       \n\t"
  "ror  r23       \n\t"
  "ror  r22       \n\t"
  "sbrs r0, 0x07  \n\t"
  "ret            \n\t"
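  //negate r22..r25 (inline equivalent of __negsi2)
  "com  r25       \n\t"
  "com  r24       \n\t"
  "com  r23       \n\t"
  "neg  r22       \n\t"
  "sbci r23, 0xFF \n\t"
  "sbci r24, 0xFF \n\t"
  "sbci r25, 0xFF \n\t"
  "ret            \n\t"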
  );
}

//Difference from ISO/IEC DTR 18037: using an uint8_t as second parameter according to microcontroller register size and maximum possible value
int32_t FP_Round(int32_t f, uint8_t n) {
  n = FP_FBITS - n;
  if (f >= 0)
    return (f&(0xFFFFFFFF<<n)) + ((f&(1<<(n - 1)))<<1);
  else
    return (f&(0xFFFFFFFF<<n)) - ((f&(1<<(n - 1)))<<1);
}

//__fractsfsa
int32_t __attribute__((naked)) FP_FloatToFixed(float f) {
  asm volatile (
  "subi r24, 0x80   \n\t"
  "sbci r25, 0xf8   \n\t"
  "jmp fixsfsi      \n\t"
  "fixsfsi:         \n\t"
  "rcall fixunssfi  \n\t"
  "set              \n\t"
  "cpse r27, r1     \n\t"
  "rjmp fp_zero     \n\t"
  "ret              \n\t"
  "fixunssfi:       \n\t"
  "rcall fp_splitA  \n\t"
  "brcs 7f          \n\t"
  "subi r25, 127    \n\t"
  "brlo 8f          \n\t"
  "mov  r27, r25    \n\t"
  "clr  r25         \n\t"
  "subi r27, 23     \n\t"
  "brlo 4f          \n\t"
  "breq 9f          \n\t"
  "1: lsl r22       \n\t"
  "rol  r23         \n\t"
  "rol  r24         \n\t"
  "rol  r25         \n\t"
  "brmi 2f          \n\t"
  "dec  r27         \n\t"
  "brne 1b          \n\t"
  "rjmp 9f          \n\t"
  "2: cpi r27, 0x01 \n\t"
  "breq 9f          \n\t"
  "7: rcall fp_zero \n\t"
  "ldi  r27, 1      \n\t"
  "ret              \n\t"
  "8: rjmp fp_zero  \n\t"
  "3: mov r22, r23  \n\t"
  "mov  r23, r24    \n\t"
  "clr  r24         \n\t"
  "subi r27, -8     \n\t"
  "breq 9f          \n\t"
  "4: cpi r27, -7   \n\t"
  "brlt 3b          \n\t"
  "5: lsr r24       \n\t"
  "ror  r23         \n\t"
  "ror  r22         \n\t"
  "inc  r27         \n\t"
  "brne 5b          \n\t"
  "9: brtc 6f       \n\t"
  "com  r25         \n\t"
  "com  r24         \n\t"
  "com  r23         \n\t"
  "neg  r22         \n\t"
  "sbci r23, -1     \n\t"
  "sbci r24, -1     \n\t"
  "sbci r25, -1     \n\t"
  "6: ret           \n\t"
  "fp_split3:       \n\t"
  "sbrc r21, 0x07   \n\t"
  "subi r25, 0x80   \n\t"
  "lsl  r20         \n\t"
  "rol  r21         \n\t"
  "breq 14f         \n\t"
  "cpi  r21, 0xff   \n\t"
  "breq 15f         \n\t"
  "11: ror  r20     \n\t"
  "fp_splitA:       \n\t"
  "lsl  r24         \n\t"
  "12: bst r25, 0x07 \n\t"
  "rol  r25         \n\t"
  "breq 16f         \n\t"
  "cpi  r25, 0xff   \n\t"
  "breq 17f         \n\t"
  "13: ror r24      \n\t"
  "ret              \n\t"
  "14: cp r1, r18   \n\t"
  "cpc  r1, r19     \n\t"
  "cpc  r1, r20     \n\t"
  "rol  r21         \n\t"
  "rjmp 11b         \n\t"
  "15: lsr  r20     \n\t"
  "rcall fp_splitA  \n\t"
  "rjmp 18f         \n\t"
  "16: cp r1, r22   \n\t"
  "cpc  r1, r23     \n\t"
  "cpc  r1, r24     \n\t"
  "rol  r25         \n\t"
  "rjmp 13b         \n\t"
  "17: lsr r24      \n\t"
  "cpc  r23, r1     \n\t"
  "cpc  r22, r1     \n\t"
  "18: sec          \n\t"
  "ret              \n\t"
  "fp_zero:         \n\t"
  "clt              \n\t"
  "clr  r27         \n\t"
  "clr  r22         \n\t"
  "clr  r23         \n\t"
  "movw r24, r22    \n\t"
  "bld  r25, 0x07   \n\t"
  "ret              \n\t"
  );
}

//__fractsasf
float __attribute__((naked)) FP_FixedToFloat(int32_t k) {
  asm volatile (
  //__floatsisf:
  //"clt               \n\t"
  //"rjmp 1f           \n\t"
  "bst  r25, 0x07    \n\t"
  "brtc 1f           \n\t"
  "com  r25          \n\t"
  "com  r24          \n\t"
  "com  r23          \n\t"
  "neg  r22          \n\t"
  "sbci r23, -1      \n\t"
  "sbci r24, -1      \n\t"
  "sbci r25, -1      \n\t"
  "1: tst  r25       \n\t"
  "breq 4f           \n\t"
  "mov  r31, r25     \n\t"
  "ldi  r25, 127 + 23 \n\t"
  "clr  r27          \n\t"
  "2: inc  r25       \n\t"
  "lsr  r31          \n\t"
  "ror  r24          \n\t"
  "ror  r23          \n\t"
  "ror  r22          \n\t"
  "ror  r27          \n\t"
  "cpse r31, r1      \n\t"
  "rjmp  2b          \n\t"
  "brpl 11f          \n\t"
  "lsl  r27          \n\t"
  "brne 3f           \n\t"
  "sbrs  r22, 0x00   \n\t"
  "rjmp  11f         \n\t"
  "3: subi  r22, -1  \n\t"
  "sbci  r23, -1     \n\t"
  "sbci  r24, -1     \n\t"
  "sbci  r25, -1     \n\t"
  "rjmp  11f         \n\t"
  "4: tst  r24       \n\t"
  "breq 5f           \n\t"
  "ldi  r25, 127 + 23 \n\t"
  "rjmp 8f           \n\t"
  "5: tst  r23       \n\t"
  "breq 6f           \n\t"
  "ldi  r25, 127 + 15 \n\t"
  "mov  r24, r23     \n\t"
  "mov  r23, r22     \n\t"
  "rjmp 7f           \n\t"
  "6: tst  r22       \n\t"
  "breq 9f           \n\t"
  "ldi  r25, 127 + 7 \n\t"
  "mov  r24, r22     \n\t"
  "ldi  r23, 0x00    \n\t"
  "7: ldi  r22, 0x00 \n\t"
  "brmi 11f          \n\t"
  "10: dec  r25      \n\t"
  "lsl  r22          \n\t"
  "rol  r23          \n\t"
  "rol  r24          \n\t"
  "8: brpl 10b       \n\t"
  "11: lsl  r24      \n\t"
  "lsr  r25          \n\t"
  "ror  r24          \n\t"
  "bld  r25, 0x07    \n\t"
  "9: tst r25        \n\t"
  "breq 10f          \n\t"
  "subi r24, 0x80    \n\t"
  "sbci r25, 0x07    \n\t"
  "10: ret           \n\t"
  );
}

int32_t FP_Sin(int32_t fp) {
  int16_t sign = 1;
  int32_t sqr, result;
  const int32_t SK[2] = {
    FP_CONST(7.61e-03),
    FP_CONST(1.6605e-01)
  };

  //normalize
  fp %= 2*FP_PI;
  if (fp < 0)
    fp = FP_TWO_PI + fp;
    //fp = FP_PI*2 + fp;
  if ((fp > FP_HALF_PI) && (fp <= FP_PI))
    fp = FP_PI - fp;
  else if ((fp > FP_PI) && (fp <= (FP_PI + FP_HALF_PI))) {
    fp = fp - FP_PI;
    sign = -1;
  } else if (fp > (FP_PI + FP_HALF_PI)) {
    fp = (FP_PI<<1) - fp;
    sign = -1;
  }
  
  //calculate sine
  sqr = FP_Multiply(fp, fp);
  result = FP_Multiply(SK[0], sqr);
  result = FP_Multiply((result - SK[1]), sqr);
  result = FP_Multiply((result + FP_ONE), fp);
/*
  //taylor series
  // sin(x) = x − (x^3)/3! + (x^5)/5! − (x^7)/7! + ...
  sqr = FP_Multiply(fp, fp);
  fp = FP_Multiply(fp, sqr);
  result -= FP_Division(fp, itok(6));
  fp = FP_Multiply(fp, sqr);
  result += FP_Division(fp, itok(120));
  fp = FP_Multiply(fp, sqr);
  result -= FP_Division(fp, itok(5040));
  fp = FP_Multiply(fp, sqr);
  result += FP_Division(fp, itok(362880));
  fp = FP_Multiply(fp, sqr);
  result -= FP_Division(fp, itok(39916800));
*/
  return (sign*result);
}

#define _SqrtStep(shift)                  \
if ((0x40000001>>shift) + sval <= val) {  \
  val -= (0x40000001>>shift) + sval;      \
  sval = (sval>>1) | (0x40000001>>shift); \
  } else {                                \
  sval = sval>>1;                         \
}

int32_t _FP_SquareRoot(int32_t val, int32_t Q) {
  int32_t sval = 0;

  //convert Q to even
  if (Q & 0x01) {
    Q -= 1;
    val >>= 1;
  }
  //integer square root math
  for (uint8_t i=0; i<=30; i+=2)
    _SqrtStep(i);
  if (sval < val) {
    ++sval;
  }  
  //this is the square root in Q format
  sval <<= (Q)/2;
  //convert the square root to Q15 format
  if (Q < 15) {
    return(sval<<(15 - Q));
  } else {
    return(sval>>(Q - 15));
  }  
}

void setup(void) {
  volatile int32_t fix1, fix2, fix3;
  volatile float float1=0.66, float2=2.33, float3;
  
  fix1 = ftok(0.66);
  fix2 = FP_FloatToFixed(float2);

  fix3 = fix2 + fix1;
  float3 = FP_FixedToFloat(fix3);

  fix3 = fix2 - fix1;
  float3 = FP_FixedToFloat(fix3);

  fix3 = FP_Multiply(fix2, fix1);
  float3 = FP_FixedToFloat(fix3);

  fix3 = FP_Division(fix2, fix1);
  float3 = FP_FixedToFloat(fix3);

  fix1 = ftok(30.25);
  fix2 = FP_Sqrt(fix1);
  float1 = FP_FixedToFloat(fix2);

  fix1 = FP_DegToRad(FP_FloatToFixed(45.0));
  float1 = FP_FixedToFloat(fix1);
  fix2 = FP_Sin(fix1);
  float2 = FP_FixedToFloat(fix2);

  float3 = float2 + float1;
  fix3 = FP_FloatToFixed(float3);

  float3 = float2 - float1;
  fix3 = FP_FloatToFixed(float3);

  float3 = float2 * float1;
  fix3 = FP_FloatToFixed(float3);

  float3 = float2 / float1;
  fix3 = FP_FloatToFixed(float3);
}

void loop(void) { }
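
To judge whether the sine approximation is accurate enough, the polynomial can be evaluated on a PC in double precision. This is a hedged sketch, not part of fix.c: `sin_poly` is a hypothetical name, it uses the same two coefficients as SK[], and it is only meaningful for the first quadrant (0 to pi/2), which is where FP_Sin folds its argument.

```c
/* Same coefficients as SK[] in FP_Sin, evaluated in double precision. */
static double sin_poly(double x) {
  const double k0 = 7.61e-03;
  const double k1 = 1.6605e-01;
  double sqr = x * x;

  /* x * (1 - k1*x^2 + k0*x^4), in the same order FP_Sin evaluates it */
  double result = k0 * sqr;
  result = (result - k1) * sqr;
  return (result + 1.0) * x;
}
```

At pi/6 and pi/2 the polynomial stays within about 0.0002 of the true values 0.5 and 1.0, so the fixed point rounding of the s16.15 version adds to an already small error.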

AVR GCC Fixed-Point vs. Floating Point Comparison

This is a follow up to previous posts here and here. Using native fixed point support in GCC, on a generic ATMega328P running at 16MHz in the AtmelStudio 6.2 simulator, I performed this overly simplified comparison of fixed vs. floating point math. The results posted below compare the fixed point accum type with a float.

The accum typedef allows for an unsigned 16.16 fixed format (+/-16.15 if signed). The fract typedef provides an unsigned 0.16 format (+/-0.15 if signed). Short, long and long long modifiers are also allowed.

It is very difficult to find supporting documentation on (AVR) fixed point. A basic overview of the types supported is here and here. The ISO standard for C (embedded extensions) fixed point is located here.

#include <avr/io.h>
#include <stdfix.h>

int main(void) {
  volatile accum fx1, fx2 = 2.33K, fx3 = 0.66K;
  volatile float fl1, fl2 = 2.33, fl3 = 0.66;
  
  fx1 = fx2 + fx3;
  fl1 = fl2 + fl3;

  fx1 = fx2 - fx3;
  fl1 = fl2 - fl3;

  fx1 = fx2 * fx3;
  fl1 = fl2 * fl3;

  fx3 = fx2 / fx3;
  fl3 = fl2 / fl3;

  //fl1 = (float)fx3;
}

In all cases except division, the fixed point versions are faster. All fixed point versions produce smaller code. The use of a fract type can further reduce code size and improve speed (Note: since fract division is a 16-bit version, speed is increased and size is reduced compared to floating point). Obviously, the accuracy/precision is lacking in all cases with fixed point. The results are summarized below:

Addition
Fixed-point results:
= 2.98996
27 cycles or 1.69us (4.5x faster)
code size: 238 bytes

Floating-point results:
= 2.99
123 cycles or 7.69us
Code size: 598 bytes

Subtraction
Fixed-point results:
= 1.670013
27 cycles or 1.69us (4.8x faster)
code size: 238 bytes

Floating-point results:
= 1.67
131 cycles or 8.19us
Code size: 598 bytes

Multiplication
Fixed-point results:
= 5.428833
132 cycles or 8.25us (12% faster)
code size: 394 bytes

Floating-point results:
= 5.428837
156 cycles or 9.25us
Code size: 578 bytes

Division
Fixed-point results:
= 3.530426
700 cycles or 43.75us (30% slower)
code size: 378 bytes

Floating-point results:
= 3.530303
492 cycles or 30.75us
Code size: 604 bytes
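
The small errors in the fixed-point sums come from quantizing the constants to steps of 2^-15. Here is a host-side sketch that reproduces the addition and subtraction results. It assumes the compiler truncates 2.33K and 0.66K toward zero when converting them to accum, which is my assumption, and the function names are hypothetical.

```c
#include <stdint.h>

#define ACCUM_FBITS 15 /* fraction bits of a signed accum on AVR */

/* Quantize a non-negative constant by truncation (assumption). */
static int32_t to_accum_bits(double v) {
  return (int32_t)(v * (double)(INT32_C(1) << ACCUM_FBITS));
}

/* Convert raw s16.15 bits back to a double for printing. */
static double from_accum_bits(int32_t raw) {
  return (double)raw / (double)(INT32_C(1) << ACCUM_FBITS);
}
```

With a = to_accum_bits(2.33) and b = to_accum_bits(0.66), `from_accum_bits(a + b)` is about 2.98996 and `from_accum_bits(a - b)` is about 1.670013, matching the fixed-point addition and subtraction results listed above.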

Note: To my knowledge, this will not compile with the Arduino IDE because the arduino IDE will not link to the AVR libm library which contains the fixed point routines. See my posting here for arduino compatible routines.

Debugging an LPC810 Breakout Board via SWD/J-Link

I’m playing with an LPC810 Breakout Board connected to the IAR EW IDE via a SEGGER J-Link debugger using SWD. The J-Link is connected via a small 20 to 10-pin adapter board. Since I am unable to power the µC via the 3.3V J-Link pin (not sure why), I’m also using a USB connection with the PC to provide power to the board.

Contrary to the User Manual, I needed to enable the SWDIO and SWCLK pins on the LPC810 to get this to work. After compiling inside the IAR EW IDE, I upload the iHex format file to the µC via the FlashMagic program. When the debugger is started, it outputs a warning about the target not having enough memory. I simply ignore the warning and it all seems to work okay. Debugging is with full symbol support, and I’m able to set breakpoints once inside the debugger.

In the picture, the µC is running a simplified version of a LED blink program.

LPC810 BoB J-Link SWD Debugging
