Arduino (AVR) Hex File Dissection (or, why is my hex file so big?)

If you compile the blinky example program for an Arduino Uno, you might notice that the Arduino IDE reports a program size of only 1,030 bytes:

Sketch uses 1,030 bytes (3%) of program storage space. Maximum is 32,256 bytes.
Global variables use 9 bytes (0%) of dynamic memory, leaving 2,039 bytes for local variables. Maximum is 2,048 bytes.

However, if you examine the hex file produced by the compilation, you’ll find that it is 2,918 bytes in size:

[Screenshot: blinky hex file size]

The compiled hex file is almost three times the size the IDE reports. A hex file is essentially the machine code that gets loaded into the Arduino, so does this mean the Arduino is loaded with a much larger program than the IDE claims?

Hex Editor
Let’s examine the contents of the hex file with a hex editor (such as HxD) to see if we can find the reason. Here is a screenshot of the beginning of the blinky hex file:

[Screenshot: the beginning of the blinky hex file in a hex editor]

Records
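The first thing we notice is a pattern that repeats every 45 bytes. The file is divided into records (one per line), and each record is 45 bytes long, except for the last data record, which holds fewer data bytes, and the even shorter end-of-file record. Here are the first 14 records of the file (the two dots at the end of each line are how the hex editor displays the carriage return and line feed characters):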


:100000000C945C000C946E000C946E000C946E00CA..
:100010000C946E000C946E000C946E000C946E00A8..
:100020000C946E000C946E000C946E000C946E0098..
:100030000C946E000C946E000C946E000C946E0088..
:100040000C9488000C946E000C946E000C946E005E..
:100050000C946E000C946E000C946E000C946E0068..
:100060000C946E000C946E00000000080002010069..
:100070000003040700000000000000000102040863..
:100080001020408001020408102001020408102002..
:10009000040404040404040402020202020203032E..
:1000A0000303030300000000250028002B000000CC..
:1000B0000000240027002A0011241FBECFEFD8E043..
:1000C000DEBFCDBF21E0A0E0B1E001C01D92A930AC..
:1000D000B207E1F70E94F1010C9401020C940000B8..
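If you want to confirm the 45-byte pattern without a hex editor, the following Python sketch splits a hex file’s raw bytes at each CR LF pair and reports each record’s length, line ending included. It assumes CR LF line endings (as seen in the hex editor); the function name `record_lengths` is made up for illustration, and the sample holds the first two records above followed by a standard end-of-file record.

```python
def record_lengths(raw: bytes) -> list[int]:
    """Return the size in bytes of each record, counting its CR LF terminator.

    Assumes every record, including the last one, ends with b"\r\n".
    """
    lengths = []
    for line in raw.split(b"\r\n"):
        if line:  # skip the empty string left after the final CR LF
            lengths.append(len(line) + 2)  # +2 for the CR LF pair
    return lengths


# Sample: the first two records of the blinky file plus an end-of-file record.
sample = (
    b":100000000C945C000C946E000C946E000C946E00CA\r\n"
    b":100010000C946E000C946E000C946E000C946E00A8\r\n"
    b":00000001FF\r\n"
)
print(record_lengths(sample))  # [45, 45, 13]
```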

Here is the first record with spaces added between the component parts, or fields:


: 10 0000 00 0C945C000C946E000C946E000C946E00 CA ..

Each record begins with a RECORD MARK field containing 3A (hex), the ASCII code for the colon (':') character.

The following two bytes form the RECLEN field, which specifies the number of data bytes in the record. The maximum value of the RECLEN field is hexadecimal ’FF’, or 255. Here, the length is hexadecimal 10, which is decimal 16. Since each data byte is represented by two ASCII characters, these 16 data bytes occupy 32 bytes in the file.

The next four bytes form the LOAD OFFSET field, which specifies the 16-bit starting address at which the data bytes are loaded. Since this is the first record in the file, the load offset is 0000. Because this record holds 10 (hex) data bytes, the following record has a load offset of 0010 (hex).

The next field, RECTYP, specifies the record type, which determines how the remaining information in the record is interpreted. The RECTYP of this record is “00”, which indicates a data record. Valid record types are:
’00’ Data Record
’01’ End of File Record
’02’ Extended Segment Address Record
’03’ Start Segment Address Record
’04’ Extended Linear Address Record
’05’ Start Linear Address Record

The next 32 bytes encode the record’s 16 data bytes, which here are the actual machine code of the program. This is the data that is loaded into the Arduino’s flash memory.
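The field layout described so far can be captured in a short parser. This is a minimal Python sketch, not a complete Intel HEX reader: the `HexRecord` class and `parse_record` function are hypothetical names, it expects a single record with its CR LF already stripped, and it does not verify the checksum.

```python
from dataclasses import dataclass


@dataclass
class HexRecord:
    reclen: int    # number of data bytes
    offset: int    # 16-bit load offset (stored big-endian in the file)
    rectyp: int    # record type, e.g. 0x00 = data, 0x01 = end of file
    data: bytes    # the decoded data bytes
    checksum: int  # checksum byte as stored in the record (not verified here)


def parse_record(line: str) -> HexRecord:
    """Split one Intel HEX record (without its CR LF) into its fields."""
    if not line.startswith(":"):
        raise ValueError("record must start with the ':' record mark")
    raw = bytes.fromhex(line[1:])  # every field after ':' is pairs of hex digits
    reclen = raw[0]
    # RECLEN (1) + LOAD OFFSET (2) + RECTYP (1) + data (reclen) + checksum (1)
    if len(raw) != reclen + 5:
        raise ValueError("RECLEN does not match the record length")
    return HexRecord(
        reclen=reclen,
        offset=int.from_bytes(raw[1:3], "big"),
        rectyp=raw[3],
        data=raw[4:4 + reclen],
        checksum=raw[4 + reclen],
    )


rec = parse_record(":100000000C945C000C946E000C946E000C946E00CA")
print(rec.reclen, hex(rec.offset), rec.rectyp, rec.data.hex().upper(), hex(rec.checksum))
# 16 0x0 0 0C945C000C946E000C946E000C946E00 0xca
```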

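The last field, ‘CA’, is a checksum: the two’s complement of the least significant byte of the sum of all the bytes from RECLEN through the last data byte. It is followed by the ASCII carriage return and line feed characters “0D 0A”.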
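The checksum rule is easy to express in code. Here is a minimal Python sketch with hypothetical function names: `intel_hex_checksum` computes the checksum for the RECLEN-through-data portion of a record, and `record_is_valid` uses the equivalent check that all of a record’s bytes, checksum included, add up to zero modulo 256.

```python
def intel_hex_checksum(record: str) -> int:
    """Compute the checksum for an Intel HEX record.

    `record` is the text from RECLEN through the last data byte, i.e. the
    record without its leading ':' and without its trailing checksum.
    """
    total = sum(bytes.fromhex(record))
    return (-total) & 0xFF  # two's complement of the low byte of the sum


def record_is_valid(line: str) -> bool:
    """A record is valid when all its bytes, checksum included, sum to 0 mod 256."""
    return sum(bytes.fromhex(line.lstrip(":"))) & 0xFF == 0


print(hex(intel_hex_checksum("100000000C945C000C946E000C946E000C946E00")))  # 0xca
print(record_is_valid(":100010000C946E000C946E000C946E000C946E00A8"))     # True
```

For the first record, the bytes from RECLEN through the data add up to 0x436; the low byte is 0x36, and 0x100 - 0x36 = 0xCA.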

Disassembly
Here is the data from the first record, which I divided into 4-byte chunks (each chunk is one jmp instruction), followed by its disassembly (this is the start of the interrupt vector table):

0C945C00 0C946E00 0C946E00 0C946E00

00000000 <__vectors>:
   0:	0c 94 5c 00 	jmp	0xb8	; 0xb8 <__ctors_end>
   4:	0c 94 6e 00 	jmp	0xdc	; 0xdc <__bad_interrupt>
   8:	0c 94 6e 00 	jmp	0xdc	; 0xdc <__bad_interrupt>
   c:	0c 94 6e 00 	jmp	0xdc	; 0xdc <__bad_interrupt>
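The jmp targets shown by the disassembler are byte addresses, while the instruction itself encodes a word address, and each 16-bit instruction word is stored little-endian (low byte first). That is why the bytes 5C 00 become 0xb8 (0x005C words * 2). The Python sketch below decodes only the 4-byte JMP instruction; `jmp_target` is a hypothetical name, and the bit layout (1001 010k kkkk 110k followed by the low 16 bits of k) is taken from the AVR instruction set documentation.

```python
def jmp_target(code: bytes) -> int:
    """Return the byte address targeted by a 4-byte AVR JMP instruction.

    Each 16-bit instruction word is stored little-endian. JMP is encoded as
    1001 010k kkkk 110k followed by a word holding the low 16 bits of k,
    where k is a 22-bit *word* address.
    """
    first = int.from_bytes(code[0:2], "little")
    second = int.from_bytes(code[2:4], "little")
    if first & 0xFE0E != 0x940C:
        raise ValueError("not a JMP instruction")
    # k21..k17 sit in bits 8..4 of the first word, k16 in bit 0.
    high = ((first >> 3) & 0x3E) | (first & 0x01)
    word_address = (high << 16) | second
    return word_address * 2  # convert the word address to a byte address


for chunk in ("0C945C00", "0C946E00"):
    print(chunk, hex(jmp_target(bytes.fromhex(chunk))))
# 0C945C00 0xb8
# 0C946E00 0xdc
```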

File Size Math
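So, each of the program’s 1,030 bytes is stored as two ASCII characters, which requires a total of 2,060 bytes. These 2,060 bytes are split into 32-byte sections (2060 / 32 = 64.375, rounded up to 65 records), and each record adds 9 bytes of header (colon, RECLEN, LOAD OFFSET, RECTYP) and 4 bytes of footer (checksum, CR LF), for 65 * 13 = 845 bytes. Finally, a 13-byte “end of file record” follows. Adding all of this together yields 2,918 bytes (2060 + 845 + 13), exactly the size of the hex file. The Arduino itself is loaded with only the 1,030 data bytes; everything else is Intel HEX formatting overhead.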
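The same arithmetic can be written as a small Python function. It is a sketch under the assumptions stated in its docstring (one contiguous block of data, 16 data bytes per record, CR LF line endings, and no extended address records, which are only needed for addresses above 64 KB); `hex_file_size` is a hypothetical name.

```python
HEADER = 1 + 2 + 4 + 2               # ':' + RECLEN + LOAD OFFSET + RECTYP = 9 bytes
FOOTER = 2 + 2                       # checksum + CR LF = 4 bytes
EOF_RECORD = len(":00000001FF") + 2  # end-of-file record plus CR LF = 13 bytes


def hex_file_size(program_bytes: int, bytes_per_record: int = 16) -> int:
    """Estimate the size of an Intel HEX file holding `program_bytes` bytes.

    Assumes one contiguous block of data, `bytes_per_record` data bytes per
    record, CR LF line endings, and no extended address records.
    """
    records = -(-program_bytes // bytes_per_record)  # ceiling division
    return 2 * program_bytes + records * (HEADER + FOOTER) + EOF_RECORD


print(hex_file_size(1030))  # 2918
```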

Other File Formats
Intel HEX (“ihex”) isn’t the only file format used with the AVR architecture. Many others can be read or written by the AVRDUDE program. A few notable ones are raw binary (little-endian byte order, in the case of the flash ROM data), Motorola S-record, and ELF, plus a “binary” text output format that writes each value with a 0b prefix. If you want to know more, Google is your friend.
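Going the other way, from Intel HEX to a raw binary image, simply keeps the data bytes and discards the record overhead. Here is a minimal Python sketch, assuming a file containing only data and end-of-file records (true of small images like blinky; larger files may contain extended address records, which this sketch rejects rather than misreads). The name `hex_to_binary` and the 0xFF gap fill are choices made for this example. Run over the whole blinky hex file, it should yield an image about the size the IDE reports rather than the size of the hex file.

```python
def hex_to_binary(lines: list[str], fill: int = 0xFF) -> bytes:
    """Rebuild a raw memory image from Intel HEX text lines.

    Only data (00) and end-of-file (01) records are handled; any other record
    type raises an error. Gaps between records are padded with `fill`
    (0xFF is the value of erased AVR flash).
    """
    image = bytearray()
    for line in lines:
        line = line.strip()  # drop CR LF and surrounding whitespace
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"missing record mark: {line!r}")
        raw = bytes.fromhex(line[1:])
        if sum(raw) & 0xFF:  # all bytes, checksum included, must sum to 0 mod 256
            raise ValueError(f"bad checksum: {line!r}")
        reclen, rectyp = raw[0], raw[3]
        offset = int.from_bytes(raw[1:3], "big")
        if rectyp == 0x01:  # end-of-file record
            break
        if rectyp != 0x00:
            raise ValueError(f"unsupported record type {rectyp:02X}")
        end = offset + reclen
        if len(image) < end:
            image.extend([fill] * (end - len(image)))
        image[offset:end] = raw[4:4 + reclen]
    return bytes(image)


hex_text = """\
:100000000C945C000C946E000C946E000C946E00CA
:100010000C946E000C946E000C946E000C946E00A8
:00000001FF
"""
image = hex_to_binary(hex_text.splitlines())
print(len(image), image[:4].hex().upper())  # 32 0C945C00
```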

About Jim Eli

µC experimenter