(If you want to understand why, see the later post on keeping the bootstrap code secure.)
First, your bootstrap OS has to be a bit more complete than BIOSes have tended to be in the past. Closer to Open Firmware, but usable by the average moderately-technical user and the average support guy at the local shop.
You have to have four images:
- Working image,
- Backup of latest working image,
- Archive image,
- Fallback, or fail-safe image.
The working image is the one you normally boot. It boots your normal operating system. If the hardware allows the working image to be re-programmed from the normal OS, the normal OS must only provide access to the re-programming features in a special administrator mode that can only be entered by rebooting into it.
(Getting or compiling an updated bootstrap image is a separate topic that I will try to rant about later.)
The backup of the latest working image is never booted. It's basically there because even a good cryptographic checksum cannot perfectly guarantee that something hasn't been inserted by a very clever, mathematically inclined attacker. The working image is checked against it before the bootstrap is allowed to continue.
(Yes, that means a pre-boot boot. Naturally.)
The backup is also checked against the current working image before re-programming is allowed.
Then, after you update the bootstrap code, run some integrity/security tests, reboot, and run some more integrity tests, the new bootstrap will copy itself to the backup before the normal OS is called.
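The checks described above, working image against checksum and against the backup, could be sketched roughly like this. It is only an illustration, not real firmware: the function names, the use of SHA-256, and the choice to drop to the fallback when a check fails are all my assumptions.

```python
import hashlib


def preboot_check(working: bytes, backup: bytes, expected_sha256: str) -> str:
    """Decide whether the pre-boot stage hands control to the working image.

    Returning "fallback" on failure is an assumption made for this sketch.
    """
    # First line of defense: the cryptographic checksum (SHA-256 assumed).
    if hashlib.sha256(working).hexdigest() != expected_sha256:
        return "fallback"
    # Second line: a full byte-for-byte comparison with the never-booted backup,
    # which does not depend on the strength of any hash.
    if working != backup:
        return "fallback"
    return "boot"


def may_reprogram(admin_mode: bool, working: bytes, backup: bytes) -> bool:
    """Allow re-programming only in administrator mode with matching images."""
    return admin_mode and working == backup
```

The full comparison is the point of keeping a whole backup copy: an attacker who manages to forge a matching checksum still has to get past a comparison against bytes that the normal OS cannot touch.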
The archive is also never booted. It must be physically impossible to write to it from either the normal bootstrap or the normal OS.
The administrator will set a period, a week or two, or a month, after which a grandfather backup will be scheduled, and the pre-boot bootstrap will copy the backup to the archive. (The waiting period is to leave enough time that the bootstrap can be assumed stable.)
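As a rough sketch of that grandfather step, assuming the pre-boot bootstrap keeps a timestamp for when the current backup was written (the `Slots` record and `promote_if_due` are names I made up for illustration):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Slots:
    backup: bytes
    archive: bytes
    backup_written: datetime  # when the current backup was stored (assumed field)


def promote_if_due(slots: Slots, now: datetime, waiting_period: timedelta) -> bool:
    """Copy the backup to the archive once it has aged past the waiting period."""
    if slots.backup == slots.archive:
        return False  # nothing new to archive
    if now - slots.backup_written < waiting_period:
        return False  # not old enough yet to be assumed stable
    slots.archive = slots.backup
    return True
```

In real hardware the copy would only be possible from the pre-boot bootstrap, since neither the normal bootstrap nor the normal OS can write to the archive.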
The fallback image, which includes the pre-boot bootstrap, must be physically impossible to write to, period. It's there for when all else has failed. It will include some command-line and (simple) menu-driven tools for testing, debugging, hunting for malware, etc.
There must be a physical button, switch, or electrical strap that will force booting to stop and wait at a command-line or menu instead of proceeding to the normal OS. In addition, an administrator tool should be provided for the normal OS, which directs the next boot to stop at the bootstrap level.
Another button, switch, or strap will direct bootup to the fallback.
Among the commands available will be one to get a new bootstrap (working) image from the manufacturer, over the network, or from some removable media. Another will provide for updating the kernel and lowest-level utilities of the normal OS without having to start any image of the normal OS.
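The boot-target decision implied by the two straps and the administrator's next-boot request might look something like the following. The precedence (fallback strap wins over the stop strap) and the rule that a failed working-image check also lands in the fallback are my assumptions; the names are hypothetical.

```python
from enum import Enum


class Target(Enum):
    NORMAL_OS = "normal_os"
    BOOTSTRAP_PROMPT = "bootstrap_prompt"  # stop at the command line or menu
    FALLBACK = "fallback"


def select_target(fallback_strap: bool, stop_strap: bool,
                  stop_requested: bool, working_ok: bool) -> Target:
    """Pick where this boot ends up, checking the strongest override first."""
    # Assumed: the fallback strap, or a failed image check, overrides everything.
    if fallback_strap or not working_ok:
        return Target.FALLBACK
    # Physical stop strap, or the one-shot request set from the normal OS.
    if stop_strap or stop_requested:
        return Target.BOOTSTRAP_PROMPT
    return Target.NORMAL_OS
```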
In a brand-new, fresh-from-the-factory motherboard or system, all four images will be identical.
So, what about the normal OS?
A similar approach might be useful in updating the normal OS and application code, as well.
Some code, such as the kernel, would do well to have full multiple copies for backup. Others, mostly end-user applications, might be okay with only good checksums, but I would be inclined to use full copy backup for any mission-critical application.
If four copies of every app are overkill, two copies and a good checksum would be the next-best alternative. (And preferably, don't let the application updater directly overwrite the checksums.)
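A minimal sketch of the two-copies-and-a-checksum idea, assuming SHA-256 and assuming the expected checksum is stored somewhere the application updater cannot write (`pick_good_copy` is a hypothetical helper):

```python
import hashlib
from typing import Optional


def pick_good_copy(primary: bytes, secondary: bytes,
                   expected_sha256: str) -> Optional[bytes]:
    """Return the first copy whose SHA-256 matches, or None if both are bad."""
    for copy in (primary, secondary):
        if hashlib.sha256(copy).hexdigest() == expected_sha256:
            return copy
    return None
```

If the primary copy has been damaged or tampered with, the secondary is used instead; if neither matches, the caller has to refuse to run the application and restore it some other way.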
[JMR201704211138: I had some further thoughts on the low-level boot process, which might be interesting: http://defining-computers.blogspot.com/2017/04/model-boot-up-process-description-with.html.]