If you compile the blinky example program for an Arduino Uno, you might notice that the Arduino IDE reports a program size of only 1,030 bytes:
Sketch uses 1,030 bytes (3%) of program storage space. Maximum is 32,256 bytes.
Global variables use 9 bytes (0%) of dynamic memory, leaving 2,039 bytes for local variables. Maximum is 2,048 bytes.
However, if you examine the hex file produced after compilation, you’ll find a file that is 2,918 bytes in size:
The compiled hex file is almost 3 times larger than the size reported by the IDE. The hex file basically consists of the machine code that gets loaded into the Arduino. Does this mean the Arduino gets loaded with a much larger program than the IDE claims?
Let’s examine the hex file contents with a hex-editor program (like HxD) to see if we can find the reason. Here is a screenshot of the beginning of the blinky hex file:
The first thing we notice is a pattern that repeats every 45 bytes. The file is actually divided into records (one per line) that are 45 bytes long, except at the very end of the file, where the last data record is shorter and is followed by a short end-of-file record. Here are the first 14 records of the file:
Here is the first record with spaces added between the component parts, or fields:
: 10 0000 00 0C945C000C946E000C946E000C946E00 CA ..
Each record begins with a RECORD MARK field containing 3A, which is the ASCII code for the colon (’:’) character.
The next two characters form the RECLEN field, which specifies the number of data bytes in the record. The maximum value of the RECLEN field is hexadecimal ’FF’, or 255. Here, the length is hexadecimal 10, which is decimal 16. Note that each data byte is represented by two ASCII characters, so these 16 data bytes occupy 32 characters (bytes) in the file.
The next four characters represent the LOAD OFFSET field, which specifies a 16-bit starting offset at which to load the data bytes. Since this is the first record in the file, the load offset is 0000. This record carries 16 data bytes, so the following record has a load offset of 0010 (hex).
The next field specifies the record type. This RECTYP field is used to interpret the remaining information within the record. The RECTYP of this record is “00”, which indicates a data record. Valid record types are:
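To make the two-characters-per-byte encoding concrete, here is a small sketch in Python (my choice for illustration; it is not part of the Arduino toolchain). It decodes the RECLEN and data fields of the first record and compares the character count in the file with the byte count that ends up in the Arduino.

```python
# RECLEN field of the first record: two ASCII hex characters, one byte.
reclen = int("10", 16)

# Data field of the first record, exactly as it appears in the hex file.
data_field = "0C945C000C946E000C946E000C946E00"

# Every pair of ASCII hex characters decodes to one data byte.
data = bytes.fromhex(data_field)

print("characters in the file:", len(data_field))
print("data bytes (RECLEN):   ", reclen)
print("decoded bytes:         ", len(data))
```

The character count is always twice the value of RECLEN, which is the first hint about why the hex file is so much larger than the program.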
’00’ Data Record
’01’ End of File Record
’02’ Extended Segment Address Record
’03’ Start Segment Address Record
’04’ Extended Linear Address Record
’05’ Start Linear Address Record
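The next 32 characters encode the 16 machine code bytes of the program, two ASCII hex characters per byte. This is the data that is loaded into the Arduino’s memory.
The last field, ‘CA’, is a checksum: the two’s complement of the low byte of the sum of all the preceding bytes in the record (RECLEN through the last data byte). It is followed by the ASCII carriage return/line feed characters “0D 0A”.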
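The field layout described above can be sketched as a small Python parser. This is only an illustration under my own naming (`HexRecord`, `parse_record`, and `compute_checksum` are hypothetical helpers, not part of any Arduino tool), and it handles a single record at a time.

```python
from typing import NamedTuple


class HexRecord(NamedTuple):
    reclen: int    # number of data bytes
    offset: int    # 16-bit load offset
    rectyp: int    # record type (0x00 = data, 0x01 = end of file, ...)
    data: bytes    # decoded data bytes
    checksum: int  # checksum byte stored in the record


def compute_checksum(body: bytes) -> int:
    """Two's complement of the low byte of the sum of the given bytes."""
    return (-sum(body)) & 0xFF


def parse_record(line: str) -> HexRecord:
    line = line.strip()  # drop the trailing CR/LF
    if not line.startswith(":"):
        raise ValueError("missing record mark ':'")
    raw = bytes.fromhex(line[1:])  # two ASCII characters per byte
    if len(raw) < 5:
        raise ValueError("record too short")
    reclen = raw[0]
    if len(raw) != reclen + 5:  # RECLEN + offset (2) + RECTYP + data + checksum
        raise ValueError("RECLEN does not match the record length")
    if compute_checksum(raw[:-1]) != raw[-1]:
        raise ValueError("bad checksum")
    offset = int.from_bytes(raw[1:3], "big")
    return HexRecord(reclen, offset, raw[3], raw[4:4 + reclen], raw[-1])


first = parse_record(":100000000C945C000C946E000C946E000C946E00CA\r\n")
print(first)
```

Running it on the first record of the blinky file gives a record length of 16, a load offset of 0, record type 0, and a checksum that matches the stored ‘CA’.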
Here we see the data from the first record, which I divided up into 4-byte chunks, followed by a disassembly of the program (it’s the interrupt vector table):
0C945C00 0C946E00 0C946E00 0C946E00

00000000 <__vectors>:
   0:   0c 94 5c 00   jmp 0xb8   ; 0xb8 <__ctors_end>
   4:   0c 94 6e 00   jmp 0xdc   ; 0xdc <__bad_interrupt>
   8:   0c 94 6e 00   jmp 0xdc   ; 0xdc <__bad_interrupt>
   c:   0c 94 6e 00   jmp 0xdc   ; 0xdc <__bad_interrupt>
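To see how a 4-byte chunk such as 0C945C00 turns into jmp 0xb8, here is a rough Python sketch that decodes only the AVR JMP instruction. It assumes the JMP encoding from the AVR instruction set manual (two 16-bit words stored little-endian, with the target given as a word address); `decode_jmp` is a hypothetical helper of mine, not a real disassembler.

```python
def decode_jmp(raw: bytes) -> int:
    """Return the byte address targeted by a 4-byte AVR JMP instruction."""
    if len(raw) != 4:
        raise ValueError("JMP is a 4-byte instruction")
    # The two 16-bit instruction words are stored low byte first.
    word0 = int.from_bytes(raw[0:2], "little")
    word1 = int.from_bytes(raw[2:4], "little")
    # JMP opcode pattern: 1001 010k kkkk 110k
    if (word0 & 0xFE0E) != 0x940C:
        raise ValueError("not a JMP instruction")
    # The upper 6 address bits are scattered in the first word.
    high = ((word0 >> 3) & 0x3E) | (word0 & 0x01)
    word_address = (high << 16) | word1
    # Flash is addressed in 16-bit words; the listing shows byte addresses.
    return word_address * 2


for chunk in ("0C945C00", "0C946E00"):
    print(chunk, "-> jmp", hex(decode_jmp(bytes.fromhex(chunk))))
```

The word address 0x005C doubles to the byte address 0xb8, and 0x006E doubles to 0xdc, which matches the disassembly.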
File Size Math
So, each of the program’s 1,030 bytes is stored as two ASCII characters, which requires a total of 2,060 bytes. Each record holds 32 of these characters (16 program bytes), so 2060 / 32 = 64.375 rounds up to 65 records: 64 full ones and a final short one. Each record has 9 bytes of header (record mark, RECLEN, load offset, and RECTYP) and 4 bytes of footer (checksum and CR/LF), which adds 65 * 13 = 845 bytes. Finally, a 13-byte “end of file record” follows. Adding all of this together yields 2,918 (2060 + 845 + 13).
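The same arithmetic can be written as a short Python function. This is a sketch under a few assumptions: 16 data bytes per record, CR/LF line endings, and no extended address records (which a program this small does not need). The name `intel_hex_size` is my own.

```python
import math


def intel_hex_size(program_bytes: int, bytes_per_record: int = 16) -> int:
    """Estimate the size in bytes of an Intel HEX file for a program."""
    data_chars = 2 * program_bytes  # two ASCII characters per byte
    records = math.ceil(program_bytes / bytes_per_record)
    header = 1 + 2 + 4 + 2          # ':' + RECLEN + load offset + RECTYP
    footer = 2 + 2                  # checksum + CR/LF
    eof_record = header + footer    # ":00000001FF" plus CR/LF, no data
    return data_chars + records * (header + footer) + eof_record


print(intel_hex_size(1030))
```

For the 1,030-byte blinky program this gives 2,918, the size of the hex file on disk.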
Other File Formats
There are more file formats than just “ihex” in use with the AVR architecture. Many of these formats are produced or used by the AVRDUDE program. A few notable ones are raw binary (little-endian byte order, in the case of the flash ROM data), binary, Motorola S-record, and ELF. If you want to know more, Google is your friend.
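To illustrate how “ihex” relates to raw binary, here is a minimal Python sketch that turns Intel HEX text into a raw byte image. It is not how AVRDUDE does it; `ihex_to_raw` is a hypothetical helper that only understands data and end-of-file records, skips checksum verification, and fills any gaps with 0xFF (the value of erased flash, which is my assumption for the fill byte).

```python
def ihex_to_raw(text: str, fill: int = 0xFF) -> bytes:
    """Convert Intel HEX text (record types 00 and 01 only) to a raw image."""
    image = bytearray()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError("missing record mark ':'")
        raw = bytes.fromhex(line[1:])
        reclen = raw[0]
        offset = int.from_bytes(raw[1:3], "big")
        rectyp = raw[3]
        if rectyp == 0x01:   # end of file record
            break
        if rectyp != 0x00:   # extended/start address records not handled
            raise ValueError(f"unsupported record type {rectyp:02X}")
        end = offset + reclen
        if end > len(image):
            # Grow the image, padding any gap with the fill byte.
            image.extend(bytes([fill]) * (end - len(image)))
        image[offset:end] = raw[4:4 + reclen]
    return bytes(image)


sample = (
    ":100000000C945C000C946E000C946E000C946E00CA\r\n"
    ":00000001FF\r\n"
)
raw_image = ihex_to_raw(sample)
print(len(raw_image), raw_image.hex())
```

The 45-byte data record shrinks to the 16 bytes it actually carries, which is the same ratio that separates the 2,918-byte hex file from the 1,030-byte program.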