Floating Point Precision and Binary 32 or, Arduino Don’t Know Math

https://ucexperiment.files.wordpress.com/2016/02/levitation.jpg?w=640

Did you know?

0.1 + 0.2 = 0.30000001

Try this simple arduino program to prove it:

void setup() {
  float f;
  char s[12];

  f = 0.1 + 0.2;
  dtostrf(f, 1, 9, s); //convert float to string
  Serial.begin(9600);
  Serial.println(s); 
}

void loop() { }

First, don’t be alarmed, and second, don’t throw your arduino into the trash thinking it’s defective. It’s working just fine. For comparison, performing this same math on your PC would produce a similar result. This seemingly odd behavior stems from the internal workings of a binary computer.

The arduino (and your PC) is a binary device. All calculations are reduced to on and off, or 1 and 0. Because of this, numbers are formatted in a base 2 numbering system, and all math is performed using this binary numbering system. Additionally, most decimal fractions (0.1, for example) have no exact, finite representation in binary, just as 1/3 has none in decimal. To further complicate matters, we humans use a base-10 decimal numbering system. Obviously, we’ll need a process to incorporate fractions in binary and to swap between the base 2 and 10 systems. And this is where the problem lies.

The arduino utilizes a binary floating point representation for decimal numbers. The description of the data type can be found here. Officially it’s called IEEE 754 single precision floating point, and its specification can be found here. But don’t try to read that unless you’re a glutton for punishment. I’ll attempt to simplify.

A floating point number is composed of 2 primary parts, the significand which contains the digits, and the exponent which determines where to place the decimal point. It’s basically scientific notation.

Our significand is stored in 23 bits (plus an implied leading 1, for 24 bits of precision), which allows for 6-7 decimal digits. The exponent is 8 bits wide and stored with a bias of 127 (the stored value is the actual exponent plus 127, which allows for negative exponents), permitting numbers in the range of roughly 10^-38 to 10^38. The leftmost bit is reserved for the sign of the number, and brings the total size of this value to 32 bits. The actual internal representation is called Binary32 and looks like this:

fp
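
The layout described above (1 sign bit, 8 exponent bits, 23 fraction bits) can be sketched in plain C. This is only an illustration: the struct and function names are made up here, and the code works on a raw 32-bit pattern rather than on a float.

```c
#include <stdint.h>

/* The three Binary32 fields, split out of a raw 32-bit pattern.
   (Type and function names are hypothetical, for illustration only.) */
typedef struct {
    uint32_t sign;      /* 1 bit:   0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits:  stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits: significand without the implied leading 1 */
} b32_fields;

b32_fields split_b32(uint32_t bits) {
    b32_fields f;
    f.sign     = bits >> 31;            /* leftmost bit */
    f.exponent = (bits >> 23) & 0xFFu;  /* next 8 bits */
    f.fraction = bits & 0x7FFFFFu;      /* remaining 23 bits */
    return f;
}
```

For the pattern 3DCCCCCD this gives sign 0, a stored exponent of 123 (so an actual exponent of 123 - 127 = -4) and a fraction of 4CCCCD, which is roughly 1.6 x 2^-4, or about 0.1.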

To determine why our arduino doesn’t add 0.1 and 0.2 properly, we need to examine the internal representation of our floating point values. You can easily see the Binary32 representation of a floating point number by running the following simple program:

void setup(void) {
  union {
    uint32_t B32;
    float Float;
  } floatb32;

  Serial.begin(9600);
  floatb32.Float = 0.1;
  Serial.println(floatb32.B32, HEX); 
  floatb32.Float = 0.2;
  Serial.println(floatb32.B32, HEX); 
  floatb32.Float = 0.3;
  Serial.println(floatb32.B32, HEX); 
}

void loop(void) { }

Here are our floating point numbers encoded into Binary32 (hexadecimal):

0.1 = 3DCCCCCD
0.2 = 3E4CCCCD
0.3 = 3E99999A
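
The same three patterns can be reproduced off the arduino. The sketch below is a host-side variant that copies the bytes with memcpy instead of using a union; it assumes the compiler's float is a 32-bit IEEE 754 value, and the function name is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw Binary32 pattern of a float.
   Assumes float is a 32-bit IEEE 754 single (true on AVR and on typical PCs). */
uint32_t float_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the 4 bytes unchanged */
    return bits;
}
```

Printing float_to_bits(0.1f), float_to_bits(0.2f) and float_to_bits(0.3f) in hexadecimal should give the three values listed above.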

Here are two functions which perform the conversions between floating point and the 32-bit internal representation:

uint32_t ConvertFloatToB32(float f) {
  float normalized;
  int16_t shift;
  int32_t sign, exponent, significand;

  if (f == 0.0) 
    return 0; //handle this special case
  //check sign and begin normalization
  if (f < 0) { 
    sign = 1; 
    normalized = -f; 
  } else { 
    sign = 0; 
    normalized = f; 
  }
  //get normalized form of f and track the exponent
  shift = 0;
  while (normalized >= 2.0) { 
    normalized /= 2.0; 
    shift++; 
  }
  while (normalized < 1.0) { 
    normalized *= 2.0; 
    shift--; 
  }
  normalized = normalized - 1.0;
  //calculate binary form (non-float) of significand
  //(the scaling by 2^23 is exact, so no rounding term is needed)
  significand = (int32_t)(normalized*0x800000);
  //get biased exponent
  exponent = shift + 0x7f; //shift + bias
  //combine and return
  return ((uint32_t)sign<<31) | ((uint32_t)exponent<<23) | (uint32_t)significand;
}

float ConvertB32ToFloat(uint32_t b32) {
  float result;
  int32_t shift;
  uint16_t bias;

  if (b32 == 0) 
    return 0.0;
  //pull significand
  result = (b32&0x7fffff); //mask significand
  result /= (0x800000);    //convert back to float
  result += 1.0f;          //add one back 
  //deal with the exponent
  bias = 0x7f;
  shift = ((b32>>23)&0xff) - bias;
  while (shift > 0) { 
    result *= 2.0; 
    shift--; 
  }
  while (shift < 0) { 
    result /= 2.0; 
    shift++; 
  }
  //sign
  result *= (b32>>31)&1 ? -1.0 : 1.0;
  return result;
}

void setup(void) {
  char s[16];
  
  Serial.begin(9600);
  dtostrf(ConvertB32ToFloat(0x3E999999), 1, 9, s);
  Serial.println(s);
  dtostrf(ConvertB32ToFloat(0x3E99999A), 1, 9, s);
  Serial.println(s);
  dtostrf(ConvertB32ToFloat(0x3E99999B), 1, 9, s);
  Serial.println(s);
}

void loop(void) { }

This process of converting between a number and its internal Binary32 representation (and vice versa) includes several nuances which are beyond the scope of this post. If you are interested in the exact process, I suggest studying the above conversion functions, or reading this wiki.

However, if we use the above functions, we can easily see that 0.3 cannot be converted exactly into a Binary32 floating point number. Take a close look at the following sequential numbers:


3E999999 = 0.299999980
3E99999A = 0.300000010
3E99999B = 0.300000040

Notice there is no exact representation of 0.3; the closest Binary32 value is 3E99999A, which prints as 0.300000010. And this is why our arduino produced the odd result when asked to add 0.1 and 0.2.
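
The spacing between those sequential values can also be measured directly. The following is a small host-side sketch, again assuming a 32-bit IEEE 754 float; both function names are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a raw Binary32 pattern as a float (assumes 32-bit IEEE 754 float). */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Distance from the given pattern to the next pattern above it. */
float gap_above(uint32_t bits) {
    return bits_to_float(bits + 1u) - bits_to_float(bits);
}
```

Near 0.3 the gap is 2^-25, about 0.00000003, which matches the step between the three values listed above. Any result that falls between two neighbors has to be rounded to one of them.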

Posted in Uncategorized | Tagged , , , , | Leave a comment

EC-135T2+ Emergency Procedure Guide

ALNW5

Use at your own risk. No effort has been made to ensure these documents are current or error free. Annotated guides are located here: Word format and PDF format.


(Sort of) Running an Arduino Program Stored in Memory

flash

Ever wonder if you could call a program stored as machine code inside of an array? It’s possible; however, there are some hurdles to overcome.

First, the arduino (an ATMEL AVR based μC) is based upon the modified Harvard architecture. Why is that important? Because in the Harvard architecture, data and program instructions are stored in different memory. These separate pathways are primarily implemented to enhance performance, but they also prohibit executing program instructions from data memory. Bummer!

Fortunately, there are provisions for storing “data” inside program memory (see this information on the use of the PROGMEM attribute). Our first task is to write a simple test program in assembly and store the machine code inside program memory. Easy!

Here is our version of a simple “blink” program:

.section .text

  sbi 0x04, 5         ;set D13(LED) as output

1:
  sbi 0x05, 5         ;turn LED on

  ldi r20, 80         ;delay ~1 second
  ldi r21, 255
  ldi r22, 255
2:
  dec r22
  brne 2b
  dec r21
  brne 2b
  dec r20
  brne 2b

  cbi 0x05, 5         ;turn LED off

  ldi r20, 80         ;delay ~1 second
  ldi r21, 255
  ldi r22, 255
3:
  dec r22
  brne 3b
  dec r21
  brne 3b
  dec r20
  brne 3b

  rjmp 1b             ;repeat
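
The “~1 second” figure in the comments can be checked with a small host-side simulation. The C sketch below walks the three nested counters the same way the delay loop does and adds up cycles, using the AVR costs of 1 cycle for ldi and dec, and 2 cycles for a taken brne (1 when not taken). The function name is made up for illustration, and the 16MHz clock used to turn cycles into time is an assumption (typical for an Uno-class board).

```c
#include <stdint.h>

/* Count the CPU cycles spent in one delay section of the blink program.
   r20_start is the value loaded into r20 (80 in the listing). */
uint32_t blink_delay_cycles(uint8_t r20_start) {
    uint8_t r20 = r20_start, r21 = 255, r22 = 255;
    uint32_t cycles = 3;                            /* three ldi instructions */

    for (;;) {
        r22--; cycles += 1;                         /* dec r22 (wraps 0 -> 255) */
        if (r22 != 0) { cycles += 2; continue; }    /* brne taken */
        cycles += 1;                                /* brne not taken */

        r21--; cycles += 1;                         /* dec r21 */
        if (r21 != 0) { cycles += 2; continue; }
        cycles += 1;

        r20--; cycles += 1;                         /* dec r20 */
        if (r20 != 0) { cycles += 2; continue; }
        cycles += 1;
        return cycles;
    }
}
```

With r20 loaded with 80 this comes to roughly 15.8 million cycles, which is a little under one second at the assumed 16MHz.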

We compile (or assemble) this using the avr-as program like so:

avr-as blink.S -o blink.out
avr-objcopy -O binary blink.out blink.bin

Using a hex-editor, we open the blink.bin file and copy the machine code of our program:

hexed

This hexadecimal data is then inserted into our 44-byte MachineCode array inside the following arduino program (notice the use of the PROGMEM attribute which forces the data array to be stored inside flash memory):

const uint8_t MachineCode[44] PROGMEM = {
  0x25, 0x9A, 0x2D, 0x9A, 0x40, 0xE5, 0x5F, 0xEF,
  0x6F, 0xEF, 0x6A, 0x95, 0xF1, 0xF7, 0x5A, 0x95,
  0xE1, 0xF7, 0x4A, 0x95, 0xD1, 0xF7, 0x2D, 0x98,
  0x40, 0xE5, 0x5F, 0xEF, 0x6F, 0xEF, 0x6A, 0x95,
  0xF1, 0xF7, 0x5A, 0x95, 0xE1, 0xF7, 0x4A, 0x95,
  0xD1, 0xF7, 0xEB, 0xCF
};

const uint8_t *ptr = MachineCode;

void setup() {
  //get address of code and call it
  asm(
    "lds r30, ptr   \n\t"
    "lds r31, ptr+1 \n\t"
    "lsr r30        \n\t" //convert byte address (data) to word address (flash)
    "icall          \n\t"
  );
}

void loop() {
  //never reach here
}

That’s our entire program. It doesn’t look like it does much, does it?

Finally, we define a pointer and set it to point at the MachineCode array:

const uint8_t *ptr = MachineCode;

Inside the arduino setup() function, we run a very short inline assembler routine. Here we simply convert the byte address held in ptr into the word address that icall expects in the Z register (r31:r30). This is required because, in an arduino (an ATMEL AVR based μC), pointers address memory in bytes (8-bits), while the program counter addresses flash in words (16-bits). This conversion is a simple process of dividing the address by 2 (note, only the low byte is shifted, so we assume our MachineCode array is stored in low memory, immediately after the IVT, where the high byte of its address is zero):

  asm(
    "lds r30, ptr   \n\t"
    "lds r31, ptr+1 \n\t"
    "lsr r30        \n\t" //convert byte address (data) to word address (flash)
    "icall          \n\t"
  );
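
To see why the low-memory assumption matters, the address arithmetic can be modeled on a PC. This is only a sketch with made-up function names: the first function does the full 16-bit divide by 2, and the second mimics shifting just the low byte, as the lsr r30 instruction does.

```c
#include <stdint.h>

/* Full conversion: byte address to word address over all 16 bits. */
uint16_t word_address(uint16_t byte_address) {
    return byte_address >> 1;
}

/* What shifting only r30 computes: the high byte (r31) is left untouched. */
uint16_t word_address_low_byte_only(uint16_t byte_address) {
    uint8_t high = (uint8_t)(byte_address >> 8);
    uint8_t low  = (uint8_t)(byte_address & 0xFFu);
    low >>= 1;                                      /* lsr r30 */
    return (uint16_t)(((uint16_t)high << 8) | low);
}
```

For a hypothetical byte address such as 0x0068 both functions return 0x0034. Once the address reaches 0x0100 or above, the two results differ, so the shortcut only holds while the array sits in the first 256 bytes of flash.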

If we run this program on an arduino, you will notice it blinks the D13 LED at approximately 1-second intervals. Next we will add provisions to transfer our array from data (SRAM) to program (flash) memory.


Consequences of Global vs. Local Variable in Arduino Code

gvl

This is an overly simplistic comparison of the consequences of using a global vs. local variable on the Arduino. Beyond the obvious variable scope, take note of the program size trade off (code and data segments), startup code differences, stack impact, and potential performance consequences.

Global Variable Program:

long volatile x;

void setup() { }

void loop() {
  x = random(10);
}

Local Variable Program:

void setup() { }

void loop() {
  long volatile x;
  x = random(10);
}

Compile time report from the IDE:
Global sketch uses 960 bytes.
Global variables use 17 bytes of dynamic memory.

Local sketch uses 976 bytes.
Local variables use 13 bytes of dynamic memory.

It’s not the overall program size, but rather the comparison we are concerned with. The local version uses 16 more flash bytes, but 4 fewer SRAM bytes (the size of the long, which now lives on the stack rather than in statically allocated memory).

Disassembly of relevant portions of the programs:
Note the differences in the startup code along with the code inside loop().

Global:

. . .
do_clear_bss_start:
  7e:	a1 31       	cpi	r26, 0x11	; 17
  80:	b2 07       	cpc	r27, r18
  82:	e1 f7       	brne	.-8      	; 0x7c <.do_clear_bss_loop>
__do_copy_data:
  84:	11 e0       	ldi	r17, 0x01	; 1
  86:	a0 e0       	ldi	r26, 0x00	; 0
  88:	b1 e0       	ldi	r27, 0x01	; 1
  8a:	ec eb       	ldi	r30, 0xBC	; 188
  8c:	f3 e0       	ldi	r31, 0x03	; 3
  8e:	02 c0       	rjmp	.+4      	; 0x94 <__do_copy_data+0x10>
  90:	05 90       	lpm	r0, Z+
  92:	0d 92       	st	X+, r0
  94:	a4 30       	cpi	r26, 0x04	; 4
  96:	b1 07       	cpc	r27, r17
  98:	d9 f7       	brne	.-10     	; 0x90 <__do_copy_data+0xc>
  9a:	0e 94 88 00 	call	0x110	; 0x110 <main>
  9e:	0c 94 dc 01 	jmp	0x3b8	; 0x3b8 <_exit>
. . .
loop:
  a8:	6a e0       	ldi	r22, 0x0A	; 10
  aa:	70 e0       	ldi	r23, 0x00	; 0
  ac:	80 e0       	ldi	r24, 0x00	; 0
  ae:	90 e0       	ldi	r25, 0x00	; 0
  b0:	0e 94 63 00 	call	0xc6	; 0xc6 <_Z6randoml>
  b4:	60 93 04 01 	sts	0x0104, r22
  b8:	70 93 05 01 	sts	0x0105, r23
  bc:	80 93 06 01 	sts	0x0106, r24
  c0:	90 93 07 01 	sts	0x0107, r25
  c4:	08 95       	ret
. . .

Local:

. . .
do_clear_bss_start:
  7e:	ad 30       	cpi	r26, 0x0D	; 13
  80:	b2 07       	cpc	r27, r18
  82:	e1 f7       	brne	.-8      	; 0x7c <.do_clear_bss_loop>
__do_copy_data:
  84:	11 e0       	ldi	r17, 0x01	; 1
  86:	a0 e0       	ldi	r26, 0x00	; 0
  88:	b1 e0       	ldi	r27, 0x01	; 1
  8a:	ec ec       	ldi	r30, 0xCC	; 204
  8c:	f3 e0       	ldi	r31, 0x03	; 3
  8e:	02 c0       	rjmp	.+4      	; 0x94 <__do_copy_data+0x10>
  90:	05 90       	lpm	r0, Z+
  92:	0d 92       	st	X+, r0
  94:	a4 30       	cpi	r26, 0x04	; 4
  96:	b1 07       	cpc	r27, r17
  98:	d9 f7       	brne	.-10     	; 0x90 <__do_copy_data+0xc>
  9a:	0e 94 90 00 	call	0x120	; 0x120 <main>
  9e:	0c 94 e4 01 	jmp	0x3c8	; 0x3c8 <_exit>
. . .
loop:
  a8:	cf 93       	push	r28
  aa:	df 93       	push	r29
  ac:	00 d0       	rcall	.+0      	; 0xae <loop+0x6>
  ae:	00 d0       	rcall	.+0      	; 0xb0 <loop+0x8>
  b0:	cd b7       	in	r28, 0x3d	; 61
  b2:	de b7       	in	r29, 0x3e	; 62
  b4:	6a e0       	ldi	r22, 0x0A	; 10
  b6:	70 e0       	ldi	r23, 0x00	; 0
  b8:	80 e0       	ldi	r24, 0x00	; 0
  ba:	90 e0       	ldi	r25, 0x00	; 0
  bc:	0e 94 6b 00 	call	0xd6	; 0xd6 <_Z6randoml>
  c0:	69 83       	std	Y+1, r22	; 0x01
  c2:	7a 83       	std	Y+2, r23	; 0x02
  c4:	8b 83       	std	Y+3, r24	; 0x03
  c6:	9c 83       	std	Y+4, r25	; 0x04
  c8:	0f 90       	pop	r0
  ca:	0f 90       	pop	r0
  cc:	0f 90       	pop	r0
  ce:	0f 90       	pop	r0
  d0:	df 91       	pop	r29
  d2:	cf 91       	pop	r28
  d4:	08 95       	ret
. . .

Arduino Due Assembly Language Listing of Compiled Sketch (Windows)

disassembled

1) Compile your sketch with verbose output turned on during compilation.

2) Find the ELF file – One of the last commands in the output window will be an “objcopy” command (arm-none-eabi-objcopy on the Due) targeting the compiled (.elf) file of your sketch. Find and copy this entire command line in your output window. Example:

"C:\Users\James\AppData\Local\Arduino15\packages\arduino\tools\arm-none-eabi-gcc\4.8.3-2014q1/bin/arm-none-eabi-objcopy" -O binary "C:\Users\James\AppData\Local\Temp\buildd61ea60940506cefcd14e1177e0613a0.tmp/Due_Blink_Inline.ino.elf" "C:\Users\James\AppData\Local\Temp\buildd61ea60940506cefcd14e1177e0613a0.tmp/Due_Blink_Inline.ino.bin"

due elf

3) Open a command prompt and paste the copied text.

4) Replace “objcopy” with “objdump” in the program name, and delete everything in quotes after the .elf file.

5) Replace the objcopy options, “-O binary”, with the objdump options “-D -S”.

6) Add an output file. Example:

C:\Users\James>"C:\Users\James\AppData\Local\Arduino15\packages\arduino\tools\arm-none-eabi-gcc\4.8.3-2014q1/bin/arm-none-eabi-objdump" -D -S "Due_Blink_Inline.ino.elf" >dump.txt

Changes made to the copied command line are highlighted in red:
due cmd

The disassembly of the simple “blink” example program produces a text file 1,967,968 bytes large! Here is the loop function:

00080150 <loop>:
80150:  b508        push  {r3, lr}
80152:  2101        movs  r1, #1
80154:  200d        movs  r0, #13
80156:  f000 fbbd   bl  808d4 <digitalWrite>
8015a:  f44f 707a   mov.w  r0, #1000
8015e:  f000 fb6f   bl  80840 <delay>
80162:  200d        movs  r0, #13
80164:  2100        movs  r1, #0
80166:  f000 fbb5   bl  808d4 <digitalWrite>
8016a:  e8bd 4008   ldmia.w  sp!, {r3, lr}
8016e:  f44f 707a   mov.w  r0, #1000
80172:  f000 bb65   b.w  80840 <delay>
80176:  4770        bx  lr

Adapted from the technique posted here.


Arduino Due Inline Assembler Blink

blink

Very basic inline assembler example of the blinky program. A good place to start learning ARM assembly language is through this online book. You will find a concise summary of ARM GCC inline assembly here.

void setup() {
  asm volatile(
    "mov r0, %[led] \n\t" 
    "mov r1, #1     \n\t"
    "lsl r1, #27    \n\t"
    "str r1, [r0]   \n\t"
    : : [led] "r" (&REG_PIOB_OER)
    : "r0", "r1", "memory" //tell the compiler what this block overwrites
  );
}

void loop() {
    asm volatile(
      "push {r0-r3, lr}   \n\t" 
      "mov r0, %[led_set] \n\t" 
      "mov r1, #1         \n\t" 
      "lsl r1, #27        \n\t" 
      "str r1, [r0]       \n\t"
      "mov r0, #1000      \n\t"
      "bl delay           \n\t"
      "mov r0, %[led_clr] \n\t"
      "mov r1, #1         \n\t"
      "lsl r1, #27        \n\t"
      "str r1, [r0]       \n\t"
      "mov r0, #1000      \n\t"
      "bl delay           \n\t"
      "pop {r0-r3, lr}    \n\t"
      "bx lr              \n\t"
      : : [led_set] "r" (&REG_PIOB_SODR), [led_clr] "r" (&REG_PIOB_CODR)
    );
}

Interesting to note, the Arduino Due example blink program produces a file 10,092 bytes in size. My inline assembler program is 10,028 bytes, saving a mere 64 bytes (less than 1%). However, the C version listed below, compiled inside AtmelStudio6, produces a file of only 4,160 bytes.

Atmel Studio Blink code:

#include "sam.h"

volatile uint32_t ms;

void SysTick_Handler(void) { //CMSIS handler name (case sensitive)
  ms++;
}

uint32_t GetTickCount(void) {
  return ms;
}

static void ConfigIO(void) {
  //enable io
  PIOB->PIO_PER = PIO_PB27;
  PIOD->PIO_PER = PIO_PD0;
  //set to output
  PIOB->PIO_OER = PIO_PB27;
  PIOD->PIO_OER = PIO_PD0;
  //disable pull-up
  PIOB->PIO_PUDR = PIO_PB27;
  PIOD->PIO_PUDR = PIO_PD0;
}

void Delay(uint32_t ms) {
  uint32_t start;

  if (ms == 0)
    return;
  start = GetTickCount();
  do {
  } while (GetTickCount() - start < ms);
}

int main(void) {
  //initialize system 
  SystemInit();
  //1ms SysTick interrupt (assumes SystemInit() leaves SystemCoreClock set)
  SysTick_Config(SystemCoreClock/1000);
  ConfigIO();
  while (1) {
    REG_PIOB_SODR = (1u<<27);
    Delay(1000);
    REG_PIOB_CODR = (1u<<27);
    Delay(1000);
  }
}

STM32F411 RC Calibration Using a DS3231 TCXO 1HZ Signal

frequency calibration

The STM32 internal RC oscillator (HSI) frequency varies from one chip to another due to manufacturing process variations. ST claims each device is factory calibrated for 1% accuracy at 25°C. After reset, this factory calibration value is loaded in the HSICAL[7:0] bits of the RCC clock control register (RCC->CR). Yet, here is what my STM32F411 Nucleo board’s RC clocked out at (a 1Hz timebase generated a period of 0.988 seconds, which is an error of roughly 17 minutes over a 24 hour period):

Saleae Logic Screen Capture

Of course, further degradation in the supposed 1% accuracy results from voltage and temperature variations. Therefore, ST has graciously provided a method to trim the internal RC through a 5-bit calibration value. This can be accomplished programmatically by using the HSITRIM[4:0] bits of the RCC clock control register (RCC->CR).
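
Writing a trim value comes down to a read-modify-write of that register. The sketch below shows only the bit manipulation as a pure function so it can be tried on a PC. It assumes HSITRIM[4:0] occupies bits 7:3 of RCC->CR (check this against the reference manual for your part), and the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed position of HSITRIM[4:0] in RCC->CR: bits 7:3. */
#define HSITRIM_POS   3u
#define HSITRIM_MASK  (0x1Fu << HSITRIM_POS)

/* Return the register value with the trim field replaced, all other bits kept. */
uint32_t with_hsitrim(uint32_t cr, uint32_t trim) {
    return (cr & ~HSITRIM_MASK) | ((trim << HSITRIM_POS) & HSITRIM_MASK);
}
```

On the hardware this would be used along the lines of RCC->CR = with_hsitrim(RCC->CR, trim); with trim ranging from 0 to 31.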

The following program is designed to find the best HSITRIM calibration value given a 1Hz reference signal. I used the 1Hz signal generated by a Maxim DS3231 TCXO RTC (in the form of a ChronoDot) for my testing here.

Calibration Procedure
The calibration procedure consists of first measuring the HSI frequency, computing the error by reference to the DS3231 1Hz signal, and finally setting the HSITRIM bits in the RCC_CR register.

We do not measure the HSI frequency directly, but estimate it by counting the clock pulses using a timer. To perform this action, obviously a very accurate reference frequency must be available such as the signal provided by an external DS3231 RTC (see the ST App Note referenced at the end of this post for methods using either the 50/60Hz signal inherent in the mains power, or at lower accuracy, the internal 32kHz RTC crystal).

The following shows how the reference signal period is measured via a timer:
capture description

On each rising edge, two interrupts occur: a capture compare and an update interrupt. The latter is used to count the counter overflows over a reference period. It must be noted that, since both interrupts occur at the same time at the beginning of a new period, an extra overflow is counted. This is the reason we subtract 1 from the overflow counter. The number of counted clock pulses is given as follows:
• N is the number of timer overflows during one period of the reference frequency
• Capture1 is the value read from the timer CCR1 register.
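
The pulse count described by these two quantities can be sketched as a small C function. This is an illustration under stated assumptions rather than the formula from the original figure: the function name is invented, and the number of timer counts per overflow is passed in as a parameter (65536 is assumed here for a free-running 16-bit timer).

```c
#include <stdint.h>

/* Clock pulses counted during one reference period.
   overflows:           overflow interrupts counted (includes the extra one at the edge)
   capture:             value read from the capture register
   counts_per_overflow: timer counts between overflows (assumed 65536 for a 16-bit timer) */
uint32_t counted_pulses(uint32_t overflows, uint32_t capture,
                        uint32_t counts_per_overflow) {
    return (overflows - 1u) * counts_per_overflow + capture;
}
```

With a 1Hz reference the pulse count equals the timer clock frequency in Hz. For example, 123 counted overflows and a capture value of 4608 work out to exactly 8,000,000 pulses under the 65536 assumption.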

Since the timer is clocked by the internal RC, it is easy to compute the real frequency generated by the HSI and compare it to the reference frequency. The error (in Hz) is computed as the absolute value of the difference between the RC frequency and 8,000,000Hz (8MHz):

Error (Hz) = | RC_Frequency – 8000000 |

While iterating through all possible values for the HSITRIM bits of the RCC_CR register, the algorithm calculates the error, and determines the best calibration value. Timer #1 is used for measuring the period. After the calibration is complete, timer #2 is setup with a 1Hz timebase and the GPIOA.5 pin (Nucleo green LED) can be used to check the results. The I2C functions used by this program to command the DS3231 chip to output a 1Hz square wave are outlined in this prior post.
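
The search itself is a simple minimum-error scan. The following sketch assumes the frequency measured for each trim value has already been stored in an array indexed by trim value; the names and the array-based interface are hypothetical and are not taken from the program below.

```c
#include <stddef.h>
#include <stdint.h>

#define TARGET_HZ 8000000u  /* nominal timer clock used in the error formula */

/* Error (Hz) = | RC_Frequency - 8000000 | */
uint32_t freq_error(uint32_t measured_hz) {
    return measured_hz > TARGET_HZ ? measured_hz - TARGET_HZ
                                   : TARGET_HZ - measured_hz;
}

/* Return the index (trim value) whose measured frequency is closest to the target. */
size_t best_trim(const uint32_t *measured_hz, size_t count) {
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (freq_error(measured_hz[i]) < freq_error(measured_hz[best]))
            best = i;
    }
    return best;
}
```

Computing the error with a comparison rather than a signed subtraction keeps everything in unsigned arithmetic, which avoids surprises when the measured value is below the target.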

RCC CR Register and HSITRIM Bits
The RCC->CR register with HSITRIM bits is shown below:
cap 2

The description of the HSITRIM bits:
cap 3

Accuracy of Frequency Measurements
The accuracy of frequency measurement is dependent upon the accuracy and stability of the reference frequency. Since the measurement also depends on the finite resolution of the timer, a reference frequency not exceeding 3000Hz is recommended by ST. The following table gives an idea of the efficiency of calibration versus reference frequency accuracy:
cap 4

Utilizing this calibration program enabled me to trim the STM32F411 Nucleo frequency to 8,004,314Hz, an error of 4,314Hz, improving the RC accuracy to within 0.053925% (a 46 second error over a 24-hour period, a 22x improvement over my non-calibrated Nucleo board).
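
The percentage and seconds-per-day figures follow directly from the measured and nominal values. Here is a minimal sketch of that arithmetic, with made-up function names:

```c
/* Relative error between a measured and a nominal value, as a fraction. */
double relative_error(double measured, double nominal) {
    double diff = measured - nominal;
    if (diff < 0.0)
        diff = -diff;
    return diff / nominal;
}

/* The same error expressed in percent. */
double error_percent(double measured, double nominal) {
    return relative_error(measured, nominal) * 100.0;
}

/* The same error expressed as seconds of drift over 24 hours. */
double drift_seconds_per_day(double measured, double nominal) {
    return relative_error(measured, nominal) * 86400.0;
}
```

Feeding in 8,004,314 against 8,000,000 gives about 0.0539% and roughly 46.6 seconds per day. The uncalibrated 0.988 second period against a nominal 1 second gives about 1,037 seconds per day, which is where the 17 minute and 22x figures come from.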

Calibration Program

//input capture & i2c test
//1Hz SQW from Chronodot
#include "stdint.h"
#include "stm32f4xx_rcc.h"
#include "stm32f4xx_gpio.h"
#include "stm32f4xx_it.h"
#include "stm32f4xx_tim.h"
#include "stm32f4xx_i2c.h"
#include "string.h"
#include "stdio.h"
#include "stdlib.h"

//ChronoDot
//SQW  -----> PA9 (tied thru 10kR to VCC) 
//GND  -----> GND 
//VCC  -----> 5V
//I2C1 GPIO Configuration    
//PB6 ------> I2C1_SCL
//PB7 ------> I2C1_SDA 
//Virtual Comm Port
//PA2 ------> TX 
//PA3 ------> RX

#define DS3231_ADDRESS          0xd0 //I2C 7-bit slave address shifted for 1 bit to the left
#define DS3231_SECONDS          0x00
#define DS3231_CONTROL_REG      0x0e

//initialize vcomm UART pins, baudrate 
void ConfigureUSART2(void) {
  RCC->AHB1ENR |= (1ul<<0);                         //enable GPIOA clock               
  RCC->APB1ENR |= (1ul<<17);                        //enable USART#2 clock             
  //configure PA3 to USART2_RX, PA2 to USART2_TX 
  GPIOA->AFR[0] &= ~((15ul<<4*3) | (15ul<<4*2));
  GPIOA->AFR[0] |= ((7ul<<4*3) | (7ul<<4*2));
  GPIOA->MODER &= ~((3ul<<2*3) | (3ul<<2*2));
  GPIOA->MODER |= ((2ul<<2*3) | (2ul<<2*2));
  //115200 baud @8MHz APB1 peripheral clock (PCLK1)
  USART2->BRR = (8000000ul/115200ul);
  USART2->CR3 = 0x0000;                             //no flow control                 
  USART2->CR2 = 0x0000;                             //1 stop bit                      
  //enable RX, enable TX, 1 start bit, 8 data bits, enable USART                    
  USART2->CR1 = ((1ul<<2) | (1ul<<3) | (0ul<<12) | (1ul<<13));
}

//Write character to Serial Port
int USART2PutChar (int ch) {
  while (!(USART2->SR & 0x0080));
  USART2->DR = (ch & 0xFF);
  return (ch);
}

//Read character from Serial Port
int USART2GetChar(void) {
  if (USART2->SR & 0x0020)
    return (USART2->DR);
  return (-1);
}

void USART2OutString(char *s) {
  while (*s)
    USART2PutChar(*s++);
}  

//printf redirect
//#include "stdio.h"
//implement own __FILE struct 
struct __FILE {
  int dummy;
};
//struct FILE is implemented in stdio.h
FILE __stdout;
// 
int fputc(int ch, FILE *f) {
  //while (USART_GetFlagStatus(USART2, USART_FLAG_TXE) == RESET);
  //send byte to USART2 
  USART2PutChar(ch);
  //if all ok, must return char written 
  return ch;
  //if char not correct, can return EOF (-1) to stop 
  //return -1;
}

//Configures the different system clocks 
void ConfigureSystem(void) { 
  //enable HSI
  RCC->CR |= ((uint32_t)RCC_CR_HSION);                     
  while ((RCC->CR & RCC_CR_HSIRDY) == 0)
    ; //wait for HSI ready
  RCC->CFGR = RCC_CFGR_SW_HSI;
  while ((RCC->CFGR & RCC_CFGR_SWS) != RCC_CFGR_SWS_HSI)
    ; //wait for HSI used as system clock
  FLASH->ACR  = FLASH_ACR_PRFTEN;      //enable Prefetch Buffer
  FLASH->ACR |= FLASH_ACR_ICEN;        //instruction cache enable
  FLASH->ACR |= FLASH_ACR_DCEN;        //data cache enable
  FLASH->ACR |= FLASH_ACR_LATENCY_0WS; //flash 0 wait state
  //HCLK = SYSCLK/4
  RCC->CFGR |= RCC_CFGR_HPRE_DIV4;                         
  //APB1 = HCLK/1
  RCC->CFGR |= RCC_CFGR_PPRE1_DIV1;                        
  //APB2 = HCLK/1
  RCC->CFGR |= RCC_CFGR_PPRE2_DIV1;                        
  //disable PLL
  RCC->CR &= ~RCC_CR_PLLON;                                
  //PLL configuration: VCO=HSI/M*N, Sysclk=VCO/P
  //PLL_M=16, PLL_N=192, PLL_P=6, PLL_SRC=HSI, PLL_Q=4 for 32MHz SYSCLK/8MHz HCLK/8MHz APB1 & APB2 (PCLK1&2)
  RCC->PLLCFGR = (16ul | (192ul<<6) | (2ul<<16) | (RCC_PLLCFGR_PLLSRC_HSI) | (4ul<<24));
  //enable PLL
  RCC->CR |= RCC_CR_PLLON;                                 
  //Wait till PLL is ready
  while((RCC->CR & RCC_CR_PLLRDY) == 0) 
    __NOP();
  //select PLL as system clock source
  RCC->CFGR &= ~RCC_CFGR_SW;                               
  RCC->CFGR |=  RCC_CFGR_SW_PLL;
  while ((RCC->CFGR & RCC_CFGR_SWS) != RCC_CFGR_SWS_PLL)
    ; //wait till PLL is system clock src
}

//configure nucleo led (pa5) pin as output, push-pull, no pull-up/down 
void ConfigureNucleoGPIO(void) {
  RCC->AHB1ENR |= (1ul<<0);                      //enable GPIOA peripheral clock
  GPIOA->MODER &= ~((3ul<<2*5));                //clear both mode bits
  GPIOA->MODER |= ((GPIO_Mode_OUT<<2*5));       //set as general purpose output
  GPIOA->OTYPER &= ~((1ul<<5));                 //clear (push/pull)
  GPIOA->OSPEEDR &= ~((3ul<<2*5));              //clear both speed bits
  GPIOA->OSPEEDR |= ((GPIO_Medium_Speed<<2*5)); //set medium speed
  GPIOA->PUPDR &= ~((3ul<<2*5));                //clear both pull up/down status (none)
}

#define I2C_EVENT_MASTER_TRANSMITTER_MODE_SELECTED        ((uint32_t)0x00070082)  /* BUSY, MSL, ADDR, TXE and TRA flags */
#define I2C_EVENT_MASTER_RECEIVER_MODE_SELECTED           ((uint32_t)0x00030002)  /* BUSY, MSL and ADDR flags */
#define I2C_EVENT_MASTER_BYTE_RECEIVED                    ((uint32_t)0x00030040)  /* BUSY, MSL and RXNE flags */
#define I2C_TRANSMITTER_MODE   0
#define I2C_RECEIVER_MODE      1
#define I2C_ACK_ENABLE         1
#define I2C_ACK_DISABLE        0
#define FLAG_MASK ((uint32_t)0x00FFFFFF) //I2C FLAG mask

uint32_t I2CTimeout;

ErrorStatus I2CChkEvent(uint32_t I2C_EVENT) {
  uint32_t lastevent = 0;
  ErrorStatus status = ERROR;

  //get the last event value from I2C status register 
  lastevent = (I2C1->SR1 | (I2C1->SR2<<16))&(uint32_t)0x00FFFFFF;
  //check whether the last event contains the I2C_EVENT 
  if ((lastevent&I2C_EVENT) == I2C_EVENT)
    status = SUCCESS; //last event is equal to I2C_EVENT 
  else
    status = ERROR;   //last event is different from I2C_EVENT 
  //return status 
  return status;
}

int16_t I2CStart(uint8_t address, uint8_t direction, uint8_t ack) {
  //generate I2C start pulse 
  I2C1->CR1 |= I2C_CR1_START;
  //wait till the start condition has been generated (SB flag set) 
  I2CTimeout = 20000;
  while (!(I2C1->SR1&I2C_SR1_SB)) {
    if (--I2CTimeout == 0x00) 
      return 1;
  }
  //enable ack if we select it 
  if (ack) 
    I2C1->CR1 |= I2C_CR1_ACK;
  //send write/read bit 
  if (direction == I2C_TRANSMITTER_MODE) {
    //send address with zero last bit 
    I2C1->DR = address & ~I2C_OAR1_ADD0;
    //wait till finished 
    I2CTimeout = 20000;
    while (!(I2C1->SR1&I2C_SR1_ADDR)) {
      if (--I2CTimeout == 0x00) 
        return 1;
    }
  }
  if (direction == I2C_RECEIVER_MODE) {
    //send address with 1 last bit 
    I2C1->DR = address | I2C_OAR1_ADD0;
    //wait till finished 
    I2CTimeout = 20000;
    while (!I2CChkEvent(I2C_EVENT_MASTER_RECEIVER_MODE_SELECTED)) {
      if (--I2CTimeout == 0x00) 
        return 1;
    }
  }
  //read status register to clear ADDR flag 
  I2C1->SR2;
  //return 0, everything ok 
  return 0;
}

uint8_t I2CStop(void) {
  //wait till transmitter is empty and the byte transfer has finished 
  I2CTimeout = 20000;
  while (((!(I2C1->SR1&I2C_SR1_TXE)) || (!(I2C1->SR1&I2C_SR1_BTF)))) {
    if (--I2CTimeout == 0x00) 
      return 1;
  }
  //generate stop 
  I2C1->CR1 |= I2C_CR1_STOP;
  //return 0, everything ok 
  return 0;
}

void I2CWriteData(uint8_t data) {
  //wait till the data register is empty (TXE flag set) 
  I2CTimeout = 20000;
  while (!(I2C1->SR1 & I2C_SR1_TXE) && I2CTimeout) 
    I2CTimeout--;
  //send I2C data 
  I2C1->DR = data;
}

void I2CWrite(uint8_t address, uint8_t reg, uint8_t data) {
  I2CStart(address, I2C_TRANSMITTER_MODE, I2C_ACK_DISABLE);
  I2CWriteData(reg);
  I2CWriteData(data);
  I2CStop();
}

static __I uint8_t PrescTable[16] = { 0, 0, 0, 0, 1, 2, 3, 4, 1, 2, 3, 4, 6, 7, 8, 9 };

uint32_t GetPclk1Freq(void) {
  uint32_t tmp = 0, presc = 0, pllvco = 0, pllp = 2, pllsource = 0, pllm = 2;
  uint32_t sysclk, hclk;

  //PLL_VCO = (HSE_VALUE or HSI_VALUE /PLLM)*PLLN //SYSCLK = PLL_VCO/PLLP  
  pllsource = (RCC->PLLCFGR&RCC_PLLCFGR_PLLSRC)>>22;
  pllm = RCC->PLLCFGR&RCC_PLLCFGR_PLLM; 
  if (pllsource != 0)
    //HSE used as PLL clock source 
    pllvco = (16000000/pllm)*((RCC->PLLCFGR & RCC_PLLCFGR_PLLN)>>6);
  else
    //HSI used as PLL clock source 
    pllvco = (16000000/pllm)*((RCC->PLLCFGR&RCC_PLLCFGR_PLLN)>>6); 
  pllp = (((RCC->PLLCFGR&RCC_PLLCFGR_PLLP)>>16) + 1 )*2; 
  sysclk = pllvco/pllp;
  //get HCLK prescaler 
  tmp = RCC->CFGR&RCC_CFGR_HPRE;
  tmp = tmp>>4;
  presc = PrescTable[tmp];
  //HCLK clock frequency 
  hclk = sysclk>>presc;
  //get PCLK1 prescaler 
  tmp = RCC->CFGR&RCC_CFGR_PPRE1;
  tmp = tmp>>10;
  presc = PrescTable[tmp];
  //PCLK1 clock frequency 
  return (hclk>>presc);
}

//I2C_duty_cycle_in_fast_mode I2C duty cycle in fast mode  
#define I2C_DUTYCYCLE_2         ((uint32_t)0x00000000)
#define I2C_DUTYCYCLE_16_9      I2C_CCR_DUTY

uint32_t I2CSpeed(uint32_t pclk, uint32_t speed, uint32_t duty_cycle) {
  uint32_t i2c_speed, fast;

  if (duty_cycle == I2C_DUTYCYCLE_2) 
    fast = (pclk/(speed*3));
  else
    fast = ((pclk/(speed*25)) | I2C_DUTYCYCLE_16_9);
  if (speed <= 100000) {
    if (((pclk/(speed<<1))&I2C_CCR_CCR) < 4)
      i2c_speed = 4;
    else
      i2c_speed = (pclk/(speed<<1));
  } else if ((fast&I2C_CCR_CCR) == 0) 
    i2c_speed = 1;
  else
    i2c_speed = (fast | I2C_CCR_FS);
  return i2c_speed;
}

//setup i2c gpiob pins 6&7
void I2CGPIOInit(void) {
  uint32_t position;

  //enable gpiob peripheral clock
  RCC->AHB1ENR |= (1UL<<1);
  for (position=6; position<=7; position++) {
    //configure Alternate function mapped with the current IO 
    GPIOB->AFR[position>>3] &= ~((uint32_t)0xF<<((uint32_t)(position&(uint32_t)0x07)*4));
    GPIOB->AFR[position>>3] |= ((uint32_t)(GPIO_AF_I2C1)<<(((uint32_t)position&(uint32_t)0x07)*4));
    //configure IO Direction mode (Input, Output, Alternate or Analog) 
    GPIOB->MODER &= ~(((uint32_t)0x00000003)<<(position*2));
    GPIOB->MODER |= (GPIO_Mode_AF<<(position*2));
    //configure the IO Speed 
    GPIOB->OSPEEDR &= ~(((uint32_t)0x00000003)<<(position*2));
    GPIOB->OSPEEDR |= (GPIO_High_Speed<<(position*2));
    //configure the IO Output Type 
    GPIOB->OTYPER &= ~(((uint32_t)0x00000001)<<position) ;
    GPIOB->OTYPER |= (GPIO_OType_OD<<position);
    //activate the Pull-up or Pull down resistor for the current IO 
    GPIOB->PUPDR &= ~(((uint32_t)0x00000003)<<(position*2));
    GPIOB->PUPDR |= (GPIO_PuPd_UP<<(position*2));
  }
}

void ConfigureI2C(void) {
  uint32_t freqrange = 0;
  uint32_t pclk1 = 0;

  //setup i2c gpio pins
  I2CGPIOInit();
  //enable i2c1 clock
  RCC->APB1ENR |= (1UL<<21);
  //disable the selected I2C peripheral 
  I2C1->CR1 &= ~(uint32_t)0x01;
  //get PCLK1 frequency 
  pclk1 = GetPclk1Freq();
  //calculate frequency range 
  freqrange = (pclk1/1000000);
  //configure I2C1 frequency range 
  I2C1->CR2 = freqrange;
  //configure I2C1 rise time:  if(I2CClockSpeed <= 100000) 
  if (100000 <= 100000) 
    I2C1->TRISE = freqrange + 1; 
  else
    I2C1->TRISE = ((freqrange*300)/1000) + 1;
  //configure I2C1 speed 
  I2C1->CCR = I2CSpeed(pclk1, 100000, I2C_DUTYCYCLE_2);
  //configure I2C1 Generalcall and NoStretch mode 
  I2C1->CR1 = (uint32_t)0x00;
  //configure I2C1 Own Address1 and addressing mode 
  I2C1->OAR1 = (uint32_t)0x00004000;
  //configure I2C1 Dual mode and Own Address2 
  I2C1->OAR2 = (uint32_t)0x00;
  //enable the I2C peripheral 
  I2C1->CR1 |= (uint32_t)0x01;
}

//calibration values
volatile uint16_t Capture = 1;     //activate capture & update irq when==1
volatile uint8_t CapCount = 0;     //number of samples counter
volatile uint16_t OvrflwCount = 0; //tim1 counter overflow counter
volatile uint32_t Freq = 0;        //capture frequency
volatile uint32_t CumulFreq = 0;   //cumulative capture frequency
uint32_t FreqErr = 1000000;        //set initially very large
uint16_t CalVal;                   //hsitrim value
uint32_t CalFreq;                  //resultant frequency after calibration
#define SAMPLES 3                  //number of samples/captures

void TIM1_UP_TIM10_IRQHandler(void) { 
  if((TIM1->SR&TIM_IT_Update) && (Capture != 0)) {
    TIM1->SR = (uint16_t)~0x01;
    //number of overflows
    OvrflwCount++; 
  }
}

void TIM1_CC_IRQHandler(void) { 
  uint16_t CapVal = 0;
  
  if ((TIM1->SR&TIM_IT_CC2) && (Capture != 0)) {  
    TIM1->SR = (uint16_t)~0x04;
    CapVal = TIM1->CCR2;
    //note: (overflow - 1) eliminates simultaneous occurrence of update & ic irq
    Freq = CapVal + ( ((OvrflwCount - 1)*0xffff) );
    if (CapCount > 0) {
      //cumulative capture frequencies
      CumulFreq = CumulFreq + Freq;
      //average frequency of all captures
      Freq = CumulFreq/CapCount;
      if (CapCount == SAMPLES) 
        //terminate capturing
        Capture = 0;
    }
  }
  //reset for new capture
  OvrflwCount = 0;
  //increment sample count
  CapCount++;
}

//Configure TIM1
static void ConfigureTIM1(void) {
  uint8_t tmppriority = 0x00;
  uint8_t tmppre = 0x00;
  uint8_t tmpsub = 0x0F;

  //enable timer1 peripheral clock
  RCC->APB2ENR |= (1ul<<0);
  //enable gpioa peripheral clock
  RCC->AHB1ENR |= (1ul<<0);                      
  //TIM1 channel 2 pin (PA9) configuration 
  //configure alternate function 
  GPIOA->AFR[9>>3] &= ~((uint32_t)0xF<<((uint32_t)(9&(uint32_t)0x07)*4));
  GPIOA->AFR[9>>3] |= ((uint32_t)(GPIO_AF_TIM1)<<(((uint32_t)9&(uint32_t)0x07)*4));
  //configure IO Direction mode (Input, Output, Alternate or Analog) 
  GPIOA->MODER &= ~(((uint32_t)0x00000003)<<(9*2));
  GPIOA->MODER |= (GPIO_Mode_AF<<(9*2));
  //configure the IO Speed 
  GPIOA->OSPEEDR &= ~(((uint32_t)0x00000003)<<(9*2));
  GPIOA->OSPEEDR |= (GPIO_High_Speed<<(9*2));
  //configure the IO Output Type 
  GPIOA->OTYPER &= ~(((uint32_t)0x00000001)<<9) ;
  GPIOA->OTYPER |= (GPIO_OType_OD<<9);
  //activate the Pull-up or Pull down resistor for the current IO 
  GPIOA->PUPDR &= ~(((uint32_t)0x00000003)<<(9*2));
  GPIOA->PUPDR |= (GPIO_PuPd_UP<<(9*2));

  //enable the TIM1_CC_IRQn interrupt & set priority
  tmppriority = (0x700 - ((SCB->AIRCR)&(uint32_t)0x700))>>0x08;
  tmppre = (0x4 - tmppriority);
  tmpsub = tmpsub>>tmppriority;
  tmppriority = 0<<tmppre;
  tmppriority |= (uint8_t)(1&tmpsub);
  tmppriority = tmppriority<<0x04;
  NVIC->IP[TIM1_CC_IRQn] = tmppriority;
  NVIC->ISER[TIM1_CC_IRQn>>0x05] = (uint32_t)0x01<<(TIM1_CC_IRQn&(uint8_t)0x1F);
  //enable TIM1_UP_TIM10_IRQn interrupt & set priority
  tmppriority = (0x700 - ((SCB->AIRCR)&(uint32_t)0x700))>>0x08;
  tmppriority = 0<<tmppre;
  tmppriority |= (uint8_t)(2&tmpsub);
  tmppriority = tmppriority<<0x04;
  NVIC->IP[TIM1_UP_TIM10_IRQn] = tmppriority;
  NVIC->ISER[TIM1_UP_TIM10_IRQn>>0x05] = (uint32_t)0x01<<(TIM1_UP_TIM10_IRQn&(uint8_t)0x1F);

  //configure TIM1 in input capture mode 
  //the external signal is connected to the TIM1 CH2 pin (PA9)  
  //the rising edge is used 
  TIM1->CCER = (uint16_t)0;      //disable cc2 so we can set CCMR1 
  TIM1->CCMR1 = (uint16_t)0x100; //CC2S:01 (CC2 channel input, IC2 mapped to TI2)
  TIM1->CCMR2 = (uint16_t)0;
  TIM1->CCER = (uint16_t)0x10;   //CC2E set
  TIM1->SMCR = (uint16_t)0x64;   //TS:110 (filtered timer input 2, TI2FP2) & SMS:100 (reset mode)
  TIM1->EGR = (uint16_t)0;
  TIM1->PSC = (uint16_t)0;
  TIM1->SR = (uint16_t)0;        //reset status reg
  TIM1->CR2 = (uint16_t)0;
  //TIM1 CR & DIER are not set here
}

//calibrate internal RC and return frequency 
void Calibrate(void) {
  uint8_t Trim;

  //iterate through all possible hsitrim values
  for (Trim=0; Trim<32; Trim++) {
    uint32_t Error = 0;

    //toggle led
    GPIOA->ODR ^= GPIO_Pin_5;
    //set new hsitrim    
    RCC->CR &= ~RCC_CR_HSITRIM;
    RCC->CR |= (uint32_t)Trim<<3;
    //reset counters
    Freq = 0;
    CumulFreq = 0;
    CapCount = 0;
    //initiate new round of captures
    Capture = 1;
    //wait until all periods of current frequency get measured 
    while(Capture == 1);
    //compute frequency error corresponding to current hsitrim 
    if (Freq >= 8000000) 
      Error = Freq - 8000000;
    else
      Error = 8000000 - Freq;
    //retain best calibration value nearest to 8000000Hz 
    if (FreqErr > Error) {
      FreqErr = Error;
      CalVal = Trim;
      CalFreq = Freq;
    }
  }
  //set best hsitrim 
  RCC->CR &= ~RCC_CR_HSITRIM;
  RCC->CR |= (uint32_t)CalVal<<3;
}

//timer #2 global interrupt handler
void TIM2_IRQHandler(void) {
  if ((TIM2->SR&TIM_IT_Update) && (TIM2->DIER&TIM_IT_Update)) {
    //clear it pending bit
    TIM2->SR = (uint16_t)~TIM_IT_Update;
    //toggle led
    GPIOA->ODR ^= GPIO_Pin_5;
  }
}

//setup timer #2 interrupt
void ConfigureTIM2(void) {
  uint8_t tmppriority = 0x00;
  uint8_t tmpsub = 0x0F;
  
  //enable timer2 peripheral clock
  RCC->APB1ENR |= (1ul<<0);
  //timer #2 nvic priority
  tmppriority = (0x700 - ((SCB->AIRCR)&(uint32_t)0x700))>>0x08;
  tmpsub = tmpsub>>tmppriority;
  tmppriority = 0;
  tmppriority |= (uint8_t)(1&tmpsub);
  tmppriority = tmppriority<<0x04;
  NVIC->IP[TIM2_IRQn] = tmppriority;
  //enable IRQ channel
  NVIC->ISER[TIM2_IRQn>>0x05] = (uint32_t)0x01<<(TIM2_IRQn&(uint8_t)0x1F);
  TIM2->CR1 = (uint32_t)0;      //all off
  //for 1 second @ 8MHz = 1 * 8,000,000, or 8,000 * 1,000
  //prescale = (8,000 - 1), period = (1,000 - 1)
  TIM2->PSC = 7999;             //set prescaler 
  TIM2->ARR = 999;              //set autoreload  
  TIM2->EGR = (uint16_t)0x01;   //(TIM_PSCReloadMode_Immediate) generate update event to reload prescaler immediately 
  TIM2->CR1 = (uint32_t)0x01;   //(TIM_CR1_CEN) enable the counter 
  TIM2->DIER |= (uint16_t)0x01; //(TIM_IT_Update) enable interrupt
}

int main(void) {
  char s[64];
  
  //configure HSI as system clock
  ConfigureSystem();                              
  SystemCoreClockUpdate();
  //configure nucleo led
  ConfigureNucleoGPIO();
  //configure tim1
  ConfigureTIM1();
  //configure i2c
  ConfigureI2C();
  //init vcomm port
  ConfigureUSART2();
  //set DS3231 SQW to 1Hz
  I2CWrite(DS3231_ADDRESS, DS3231_CONTROL_REG, 0x00);

  USART2OutString("Starting Calibration: @1Hz this may take awhile...\n");
  TIM1->CR1 |= (uint16_t)0x05;    //URS & counter enabled
  TIM1->DIER |= (uint16_t)0x45;   //update & CC2 enabled
  Calibrate();
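  //format the results into s so USART2OutString() below has something to send
  //accuracy is reported against CalFreq, the frequency measured at the chosen trim
  snprintf(s, sizeof(s), "Freq: %lu Error: %lu HSITRIM: %u Accuracy: %0.4f\r\n", (unsigned long)CalFreq, (unsigned long)FreqErr, (unsigned)CalVal, 8000000./(double)CalFreq);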
  USART2OutString(s);
  TIM1->CR1 &= (uint16_t)~0x05;    //URS & counter disabled
  TIM1->DIER &= (uint16_t)~0x45;   //update & CC2 disabled

  //flash led @1Hz to signify complete
  ConfigureTIM2();
  while (1);
}

The technique used by this program is an adaptation of that found in the ST App Note AN4067.
