Here is a link to a number of suggestions I have compiled for hardening of firmware.
I'm pretty sure that a lot can be said about the list, so please post coding tips or links to pages with good information on software hardening.
iapetus.neab.net/.../hardening.html
Hardening Code. Hmmm. This is where my Rule-book Against [Radical] Code-Monkeys comes into play. It's mine, and you can't have it...
* ----------------------------------------------------------------------
* WARNING - This document contains technical data whose export is
* restricted by the Arms Export Control Act (Title 22, U.S.C., Sec 2751
* et seq.) or the Export Administration Act of 1979, as amended, Title
* 50 U.S.C., App 2401 et seq. Violation of these export laws are subject
* to severe criminal penalties. Dissemination in accordance with
* provisions of DOD Directive 5230.25 (Reference C).
* ----------------------------------------------------------------------
... thus I can't give it to you anyway.
Unfortunately, too many code monkeys think they're good software engineers because they also think they're good at doing the code-monkey work.
Per, I know you know this stuff, but for those hobbyists out there, and embedded professionals who jump right in and code themselves up a mini-van (Delbert Reference there), coding is only a SMALL part of the hardening process.
If a failure occurs the following list of things should be re-visited:
The Application or Product Concept is wrong
- Use Cases, Alternate Courses of Action, Actors, Environment, pre- and post-conditions are in error
- The high-level expectations were wrong
The Translations were wrong
- Ambiguities in the language, or misinterpretations somewhere in the chain of design documents
- An incomplete set of information
- Requirements fell through the cracks
- Your UML or whatever modeling tool failed to capture the higher-level needs
The data-flows are wrong
- Data elements: range, units, update-rates, saturation, overflows
- Accessors/Mutators aren't vetted properly
- Source & Sink errors
- Model (UML) errors
- Interrupt information handling
- Semaphore/Task exchanges
The "SRS" does not capture the expected behaviors
- The SRS cross-checking and system requirements matrix is incomplete
- SRS to sub-document links are broken: SDDs, etc
The Algorithms are wrong
- "Code Review" style failed to catch the error
- "PC" simulations of the algorithms aren't right or were not translated into the proper format for the target application
Coding Errors (that is YOU Mr. Code-Monkey) (this is where most 'software engineers' live and breathe, without doing the above groundwork prior to 'hacking' the code together)
- Language Stupidity errors (e.g. violation of the standards, etc)
- Code does not comply with the requirements & algorithms
- Stupid Errors (the typical "human errors" like misspelling, forgetting, etc)
- [un] Clarity of code that leads to hiding artifacts
- Lack of experience (e.g. embedded techniques)
(and this ends most of the hobbyist's and semi-pro engineering effort)
Platform Errors
- System Architecture (whole schematic, or just CPU, or both, or more) does not support the requirements expected (e.g. marginal memory space, insufficient MIPS, etc)
- Code vs Platform misunderstandings on how it really works
Component Specification Errors
- Data-sheets are wrong
- Data-sheets are misinterpreted
- Vagueness in both data-sheets and your understanding of the component
Tool-Chain Errors
- IDE has errors in translation
- Code-Monkey misapplied the Tool-Chain
Configuration Management
- Revision-itis
- Stable Hardware-In-the-Loop (HIL): parts not swapped out with slightly different subsystems, etc.
Test-Methods Errors
- The chosen methods of testing did not capture the potential errors
- Failed to account for X, Y, or Z requirements or conditions
END OF Ad-Hoc LIST
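To make one entry from the data-flow group concrete (range, saturation, and overflow of data elements), here is a minimal sketch of a saturating 16-bit add. The function name `sat_add_i16` and the choice of a 16-bit signal are assumptions for illustration only; the limits that actually apply would come from the requirements documents.

```c
#include <stdint.h>

/* Hypothetical helper: add two 16-bit signed values and clamp the result
 * to the int16_t range instead of letting it wrap around. */
int16_t sat_add_i16(int16_t a, int16_t b)
{
    /* Do the arithmetic in a wider type so the sum itself cannot overflow. */
    int32_t sum = (int32_t)a + (int32_t)b;

    if (sum > INT16_MAX) {
        return INT16_MAX;   /* saturate at the upper limit */
    }
    if (sum < INT16_MIN) {
        return INT16_MIN;   /* saturate at the lower limit */
    }
    return (int16_t)sum;
}
```

Whether saturation, an error flag, or a reset is the right reaction to an out-of-range value is a design decision that belongs in the up-front documents, not in the helper.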
Much of this stuff is covered in a SQEM (Software Quality Engineering Manual) that your company has developed... typically as a result of Mil/Aero/FDA standards.
In a lean company, the full-boat Mil/Aero/FDA standards take too much time and too many resources, so a slimmed-down SQEM that captures the intent of the Mil/Aero/FDA standards is used. These SQEMs are usually reviewed by the contracting party for compliance, and are stamped as "Acceptable" or not.
After you do all of the up-front [paper] work, you will find that the Code-Monkey stage is actually very trivial and very easy.
The process is very iterative, so Mr. CodeMonkey must not only understand the documents and detect any problems, but also up-flow any deviations as required.
IF I MIGHT SUGGEST... If you haven't done a *real* embedded system with the full SQEM documentation, I think you should do it on a small project. Do something simple like your own version of Blinky. It would be good for you to fully spec the system before grabbing the bowl of plantains and banging on the keyboard.
Per, (BTW 404 on that link---but over the weekend I saw that the link worked, but didn't have the time to read it)
So when you speak of code "hardening" I think you mean the methods used during the Code-Monkey stages. And there is plenty out there to do in that respect.
The best piece of advice I can quickly offer is to think about what your code is doing, and be as CLEAR as possible when doing it... lots of white-space, lots of comments, lots of attention to detail.
Look into how your tool-chain is generating the code. We/I used the 0 vs 1 (TRUE vs FALSE) example: avoid encodings where one state is only a single bit away from the other. Your CPU will have instructions that depend upon single-bit states (I can't think of any reason to expect otherwise), but you have the ability to generate code that doesn't use the "JB Label" and uses the "CJNE R0,#A5,Label" instead... (but optimization may wipe that out). Either way, you need to look into what gets generated, and how/why it occurs that way, then figure out if it is solid for your needs. Sacrificing CPU/MIPS for the sake of safety is a good trade.
Good code-monkeys know what's going on, and the best 'hardening' at the code-monkey stage is based upon this knowledge.
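A minimal sketch of the TRUE vs FALSE idea in C follows. The type and function names, and the 0x5A value for FALSE, are assumptions for illustration (only the A5 pattern appears above). The two patterns differ in all eight bits, so a single flipped bit lands on neither and can be reported as corruption. Whether the compiler really emits a full-byte compare rather than a bit test still has to be checked in the generated listing.

```c
#include <stdint.h>

/* Hypothetical hardened boolean: the two legal values differ in every bit. */
typedef uint8_t hard_bool_t;

#define HARD_TRUE   ((hard_bool_t)0xA5u)
#define HARD_FALSE  ((hard_bool_t)0x5Au)   /* assumed complement pattern */

typedef enum { HB_FALSE = 0, HB_TRUE = 1, HB_CORRUPT = 2 } hb_state_t;

/* Classify a flag. Any value other than the two patterns is treated as
 * corruption instead of being silently folded into TRUE or FALSE. */
hb_state_t hard_bool_check(const volatile hard_bool_t *flag)
{
    hard_bool_t v = *flag;   /* read once so both compares see the same value */

    if (v == HARD_TRUE) {
        return HB_TRUE;
    }
    if (v == HARD_FALSE) {
        return HB_FALSE;
    }
    return HB_CORRUPT;
}
```

The caller then needs a defined reaction to `HB_CORRUPT` (safe state, reset, logging), which is again a requirements question.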
--Cpt. Vince Foster 2nd Cannon Place Fort Marcy Park, VA
When did you get the 404? That address is on a radio link, but I didn't know that it had suffered any disturbances. It normally has quite good availability.
An alternative address (different Internet provider) is: 89.160.33.66/.../hardening.html
Yes, I did mean the code-monkeying stage. As I wrote in the infinite thread, most links on Google are about the formal stages. Deciding what to build. How to figure out that what you build matches the intended blueprint. How to qualify the tools. How to decide how much - and what kind of - testing. How to document the code and the tests. How to cross-correlate requirements with implementation.
Less seems to be available when reaching the actual coding stage, with the exception of the "traditional" engineering practices of structuring etc. What I was thinking about was more the situations where a one-year uptime shouldn't be seen as a high score but as the normal condition, and the alternative is a failure with a good (?) potential for a customer support call on the way in.
Try checking out Jack Ganssle's articles...
http://www.ganssle.com
Although I don't agree with every single thing he espouses (for what I do), he is a good resource for embedded things. (I have not read his books).
I've exchanged information with him. He is a nice guy, and knows his stuff. ... just don't bother him if you're an idiot.
"Try checking out Jack Ganssle's articles..."
Vince,
That's an interesting looking link - Thanks, I might even contact him.
See - Your posts don't have to be large to be useful ;)
Oh, but maybe I should hold fire ... me being an idiot and having no experience blah, blah, blah :(
Go for it, to let us see if he is as dangerous as promised :)
Try mailing Jack and ask him how he garbage-collects the fragmented holes on an ordinary heap where the allocations are done with malloc().
It is often quite easy - with some implementations trivial - to merge adjacent free blocks, but to garbage-collect them is no fun, since the allocated and used memory blocks have to be moved to remove the holes in-between. malloc() doesn't allow a heap manager to find out what pointers to adjust if an allocated block is moved, unless the processor can send out handles instead of pointers. The segmented protected mode introduced in the 286 would probably be able to do it, but not without troubles. Not sure if there exists any memory manager making use of that feature, since most of the world dropped interest in the 16-bit segmented protected mode within the first weeks of the 386 introduction.
Suddenly, the new software stands there on a shelf in the shop. The shrink-wrapped box with the unreadable manual and all the legal notes about disclaimers and limitation of liabilities is the result of careful work and rigid quality control. This is what the development cycle looks like:
1) The developer creates what he thinks is bug-free code.
2) The program is tested and 20 bugs found.
3) The developer fixes 10 of the bugs, and declares that the remaining are not really bugs.
4) The test department realizes that 5 of the fixes do not work, and finds 15 new bugs.
5) See point 3.
6) See point 4.
7) See point 5.
8) See point 6.
9) See point 7.
10) See point 8.
11) After pressure from the marketing department, and a premature press release based on an overoptimistic time schedule, the program is released.
12) Early users find a further 137 bugs.
13) The developer, who has had his final invoice paid, can't be reached.
14) A quickly gathered team of developers fixes most of the bugs, while in the process introducing 456 new ones.
15) The test department gets a postcard from Bali from the first developer. All testers quit.
16) The company is bought by a competitor for the profit of their last program release, which had 783 bugs.
17) The board recruits a new president who employs a developer ready to develop a new program from scratch.
18) The developer creates what he thinks is bug-free code. See 2.
Let me guess his response: "I make sure I never get into such a situation".
The reason I did pick that example was because Jack has a pre-chewed policy document for how to develop software (where to store source, how to indent, ...) on his page. One of the things it mentions is to be careful about malloc() fragmentation and to check whether malloc() supports garbage collection of free space.
It can be done using dual indirection (a segmented MMU like in the 286, or explicit handles as used in several OSes from before MMUs) or by instrumenting the source code to allow the stack and data structures to be walked - 16-bit Windows, for example, has a special prologue/epilogue for all functions, so that Windows can patch the return addresses on the stack when a memory block has been moved.
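The explicit-handle variant of dual indirection can be sketched roughly as below. Everything here (the names `h_alloc`, `h_deref`, `h_compact`, the arena and table sizes, the bump-style allocation) is a hypothetical illustration and not taken from any particular OS. The application keeps only small integer handles, so the manager is free to slide live blocks together and update a single table entry per block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE   256u   /* assumed arena size for the sketch */
#define MAX_HANDLES  8      /* assumed handle table size */
#define NO_HANDLE    (-1)

typedef struct {
    size_t offset;   /* current position of the block inside the arena */
    size_t size;     /* 0 marks an unused table slot */
} slot_t;

static uint8_t arena[ARENA_SIZE];
static slot_t  slots[MAX_HANDLES];
static size_t  arena_top;   /* first free byte after the last placed block */

/* Allocate at the top of the arena; freed space below the top is only
 * reclaimed by h_compact(). */
int h_alloc(size_t size)
{
    if (size == 0 || size > ARENA_SIZE - arena_top) {
        return NO_HANDLE;
    }
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (slots[h].size == 0) {
            slots[h].offset = arena_top;
            slots[h].size   = size;
            arena_top      += size;
            return h;
        }
    }
    return NO_HANDLE;
}

void h_free(int h)
{
    if (h >= 0 && h < MAX_HANDLES) {
        slots[h].size = 0;   /* leaves a hole until the next compaction */
    }
}

/* Second level of indirection: the returned pointer is only valid until
 * the next h_compact() call, so it must not be cached. */
void *h_deref(int h)
{
    if (h < 0 || h >= MAX_HANDLES || slots[h].size == 0) {
        return NULL;
    }
    return &arena[slots[h].offset];
}

/* Slide live blocks towards the start of the arena, lowest address first,
 * fixing up only the handle table. */
void h_compact(void)
{
    size_t dst = 0;

    for (;;) {
        int next = NO_HANDLE;
        for (int h = 0; h < MAX_HANDLES; h++) {
            if (slots[h].size != 0 && slots[h].offset >= dst &&
                (next == NO_HANDLE || slots[h].offset < slots[next].offset)) {
                next = h;
            }
        }
        if (next == NO_HANDLE) {
            break;   /* every live block has been placed */
        }
        memmove(&arena[dst], &arena[slots[next].offset], slots[next].size);
        slots[next].offset = dst;
        dst += slots[next].size;
    }
    arena_top = dst;
}
```

The cost is visible in the sketch: every access goes through `h_deref()`, and the run time of `h_compact()` depends on how much data has to be moved.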
But I haven't seen anyone trying to take this to the embedded world, because of the big problems with deterministic response times. So I wonder if Jack did add that part by accident, or if he really had specific use cases with such memory managers in the embedded world.
In the end, it is normally way easier to write the application to use fixed-size memory blocks that can be allocated/released without any fragmentation. Most RTOSes have thread-safe allocation functions that can work with arrays of such memory blocks. And when the requirements get higher, or rewriting the code this way is judged to add too much complexity, then a processor with an MMU and paged memory will work well, as long as all dynamic allocations are made for a number of pages instead of using a variable-size scheme like malloc() normally uses.
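A minimal sketch of such a fixed-size block pool follows. The names `pool_init`, `pool_alloc`, `pool_free` and the block size and count are assumptions for illustration. This version is deliberately not thread-safe; an RTOS pool would wrap the list updates in a critical section or use its own primitives.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE   32u   /* assumed payload size per block */
#define BLOCK_COUNT  4u    /* assumed number of blocks in the pool */

/* While a block is free, its storage holds the free-list link. */
typedef union block {
    union block *next;
    uint8_t      payload[BLOCK_SIZE];
} block_t;

static block_t  pool[BLOCK_COUNT];
static block_t *free_list;

/* Put every block on the free list. Call once before first use. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next = free_list;
        free_list    = &pool[i];
    }
}

/* Constant-time allocation: pop the head of the free list.
 * Returns NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;
}

/* Constant-time release: push the block back on the free list.
 * The pointer must have come from pool_alloc(). */
void pool_free(void *p)
{
    if (p == NULL) {
        return;
    }
    block_t *b = (block_t *)p;
    b->next   = free_list;
    free_list = b;
}
```

Because every block has the same size, any free block satisfies any request, so no hole can ever be too small and both operations take constant time.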
Even worse.
I have been seeing a lot of people (not really a lot, but enough) who don't even know what an interrupt service routine or a Linux kernel module is, yet are "MODIFY"ing an interrupt service routine or a Linux kernel module.
And those people think that I (John Linq) am an idiot. (At least, they make me feel I am an idiot.)
Why so serious? It is human society getting weaker and weaker.
(I have never worked in an area where human safety is a concern.)
Per, You posted: The reason I did pick that example was because Jack has a pre-chewed policy document for how to develop software (where to store source, how to indent, ...) on his page
What "page"? Do you mean this forum or can you post a link...?
It was this link: http://www.ganssle.com/fsm.pdf
You can find it from Special Reports/A Firmware Standard.
What made me interested was page 6, "Stack and Heap Issues", third section. I was thinking that there might be an interesting story hidden there somewhere - the concept of garbage-collecting fragmented blocks is not something I would expect in any embedded world because of all the problems. And in a non-embedded world it would most probably not be needed.
Holy cow, I thought you meant "Jack Sprat" :-) That Jack I do know...
LOL :)
No, I was just "donating" a question for Silly Sausage to ask, and then let us see if he would become a breakfast sausage or if there would come something interesting out of it :)
I just have to ask: How could you even think about Jack Sprat, given how silent he is between the times when he decides to pick a fight?
Cpt. Vince said it just a few days ago:
"And of course I think of you: I'm sure we all do."