Posts Tagged ‘static syntax checking’
Bizarre switch statement in C
It’s been in my “Bizarre C” box for many years. Things you may want or need to know about, but never would want to duplicate.
Currently, I am researching for a new publication, maybe a PDF document, maybe a course, on the nooks and crannies of the C programming language. And I’m finding some abstruse samples, I can tell you.
But one example I found most bizarre at the time (around 1990), and which still is among my favorites to flabbergast experienced C programmers, I found in the best C programming reference I’ve known to date: “C – A Reference Manual” by Samuel P. Harbison and Guy L. Steele (see also the book’s website).
Imagine a calculation depending on certain numbers being prime or not. If the parameter is prime, the code calls one routine; if it is not prime, another. Like this:
if (is_prime(a))
  process_prime(a);
else
  process_nonprime(a);
The if-statement is just that, a statement, a composite statement.
A switch-statement is composed of a switch keyword, the value on which to operate, and a, possibly composite, statement. It could be our if-statement, if we so choose. However, without any case/default-labels, the switch-statement would just jump over the statement so specified, effectively doing nothing. To remedy that, we put the default-label before the if-statement:
switch (a)
  default:
    if (is_prime(a))
      process_prime(a);
    else
      process_nonprime(a);
This code already looks strange, but it is functionally equivalent to the if-statement by itself.
If we now imagine the function is_prime(a) to be very expensive, it would make sense to take a shortcut around that function wherever possible. And if 99% of the values the variable a can have lie between 2 and 10 inclusive, it definitely would make sense to circumvent the is_prime() function, since we know the primeness of those values without calculation:
switch (a)
  default:
    if (is_prime(a))
  case 2: case 3: case 5: case 7:
      process_prime(a);
    else
  case 4: case 6: case 8: case 9: case 10:
      process_nonprime(a);
To understand why this is correct C, we need to realize that case-labels belonging to a switch-statement can be positioned anywhere within the boundaries of the composite statement belonging to the switch. In our case, this means until the semicolon after the process_nonprime(a) statement.
It works as follows:
- If a is 3 upon execution of the switch, the label case 3: is where execution continues after determining the value of a to be three, jumping into the middle of the if-statement and calling the function process_prime with a parameter 3. After execution of that function, the if-statement is terminated, terminating the enclosing switch-statement at the same time.
- If a is 8 upon execution of the switch, the label case 8: is where execution continues after determining the value of a to be eight, jumping into the middle of the if-statement and calling the function process_nonprime with a parameter 8. After execution of that function, the if-statement is terminated, terminating the enclosing switch-statement at the same time.
- If a is 29 upon execution of the switch, the label default: is where execution continues after determining that no known value is to be processed, executing the if-statement from its beginning. Depending on the value of a (29 in this case) the if-condition determines which of the two alternative functions to call, just like a normal if-statement.
I can only agree with Sam Harbison and Guy Steele: “This is frankly the most bizarre switch statement we have ever seen that still has pretenses to being purposeful.”
Would you want the nuclear reactor around the corner being coded like this?
Happy coding!
Johan
Why 32768 isn’t always the same as 0x8000
Contrary to intuition, the C constants ‘32768’ and ‘0x8000’ have an
identical representation (0x8000), but possibly different types in C.
If you consider a processor with a 16-bit int type, and a 32-bit long
type, 32768 is considered long, whereas 0x8000 (and the octal variant
0100000) is considered ‘unsigned int’.
If you feel the need, check the C standard at
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf, at the bottom of
page 55 and the table on page 56, or look at Harbison & Steele, 5th edition, section 2.7.1, page 24ff.
Normally, there is no problem when using such values, since the
representations are identical.
However, consider this small example:
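#define C_DECIMAL 32768       /* type long when int is 16 bits wide */
#define C_HEXADECIMAL 0x8000  /* type unsigned int when int is 16 bits wide */
void main(int argc, char *argv[])
{
    /* volatile keeps the compiler from optimizing the stores away */
    volatile long long_dec = ((long)~C_DECIMAL);
    volatile long long_hex = ((long)~C_HEXADECIMAL);
    return;
}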
When C_DECIMAL is considered long, the bitwise complement (~) will invert
32 bits, resulting in a representation 0xFFFF7FFF with type ‘long’; the cast
is superfluous.
When C_HEXADECIMAL is considered ‘unsigned int’, the complement will invert
16 bits, resulting in a representation 0x7FFF with type ‘unsigned int’;
the cast will then zero-extend to a ‘long’ value of 0x00007FFF.
Checking with a 16-bit integer compiler (CW7.1 ColdFire using ‘-intsize 2’):
0x00000000                  _main:                 ; main:
0x00000000  0x4E560000      link    a6,#0
0x00000004  0x518F          subq.l  #8,a7
0x00000006  0x223CFFFF7FFF  move.l  #-32769,d1
0x0000000C  0x2D41FFF8      move.l  d1,-8(a6)
0x00000010  0x223C00007FFF  move.l  #32767,d1
0x00000016  0x2D41FFFC      move.l  d1,-4(a6)
0x0000001A  0x4E5E          unlk    a6
0x0000001C  0x4E75          rts
For those of you who do not know how to read assembler code: the differing values are the two immediate operands, #-32769 (0xFFFF7FFF) and #32767 (0x00007FFF). So the compiler confirms the difference in behavior, and this is not a compiler error.
Lucky you if you have Lint to warn you. (Yes, I know, other tools will too, if you let them…)
Happy coding!
PolySpace – pros and cons
I have some experience with a static (or quasi-dynamic) code-checker named PolySpace.
And since several requests have reached me to comment on the usefulness of PolySpace, sometimes in comparison with PC Lint, I decided to share some of my experiences with my (hopefully large) audience.
I used version 2.6 (IIRC) more than two years ago. However, the observations noted here were written down less than a year after that, and are backed up by a complete experience report written in parallel to my activities as a sort of diary. Memory lapses are therefore unlikely.
I have made the following observations on the deployment and use of PolySpace for embedded systems:
- PolySpace starts its operation with a GNU compiler run locally on the user’s machine, which is far more picky than most compilers:
  - Macro redefinitions (without #undef) are not allowed
  - An unused macro parameter is not allowed
  - A function call without parentheses is not allowed (this is a real error, resulting in a function pointer; however, some compilers just warn and “do the right thing”, like the GreenHills compiler)
  - A macro assert is not allowed, since this would redefine an ANSI macro/function
  - Too many braces in struct-initializers are not allowed, e.g. int myArr[2] = { { 5, 1 } };
  - Bit field unsigned long bf:32 is not allowed; just unsigned int bf:32; for widths less than 32 (long size?), no such limitation seems present
  - Include filenames with a trailing space are not allowed, e.g. #include "abc.h "
  - All prototypes must be available, correct and visible on use
  - Inlining can be tedious, especially when inline is implicitly taken to mean static, as is possible with some compilers.
  Be aware: PolySpace warnings/errors in these cases are in no way always to the point and useful. In several cases it has proven necessary to dig quite deep into PolySpace innards and/or to refer to PolySpace support in order to understand why processing wasn’t successful.
- Several concepts are tedious to accommodate with PolySpace, and may lead to red errors (errors stopping the processing at that point):
  - All variables must be initialized in a way recognizable for the GNU compiler
  - Backward goto’s must be instrumented with special PolySpace macros (changes in source code unavoidable!)
  - Some forms of assembler need to be stubbed, since just disabling the assembly code may lead to a missing or empty function
  - assert(false) or assert(0) is not allowed, not even in conditional code, and leads to red errors, stopping the processing of the rest of the function
  - Functions known not to terminate must be conscientiously indicated
  - All task entry points (in addition to main) must be indicated, as well as the ISR entry points
  - Only functions with the prototype void <function>(void) are allowed as entry points for PolySpace
  - Assignment of enum values to bit fields (even if the value range basically fits) can only be performed masked to the width of the bit field, even if the bit field has been declared with the enum-type as base
  - If specific input conditions are implicitly assumed, and violation of those could/would lead to red/abort errors in PolySpace, these conditions must be made explicit by using asserts
- Aside from the long running times required (more than four days on ‘Software Safety Analysis level 2’ with ‘-O2’), the welcome option “continue-with-red-errors”, which enables one to find multiple aborting errors in one PolySpace run, may also cause spurious red errors, and makes it tedious to find the source of a series of red errors
- Deployment: If the client version and the server version differ even in minor parts of the version number, finding the cause of inevitable failures is very difficult, since the only clear-text error message is deposited in a cryptic log file normally ignored as part of the PolySpace noise
- Deployment: PolySpace on Windows needs its own version of the Cygwin environment; one cannot install PolySpace without also installing Cygwin. Since it is problematic to have more than one Cygwin installation on one machine, other tools and environments cannot be used on the same machine. In my experience this includes the Standard Core development environment from BMW/3SOFT as well as the testing tool Tessy, among others. And since PolySpace will not run with a different version of Cygwin, this might limit its deployment options. Full runs on a (Linux?) server are possible only if and when the server has access to the original sources, through Samba, FTP, NFS, or whatever. Depending on the regular development environment, this may be tedious but possible.
- Most important in my opinion: Mutual exclusion primitives must always appear in pairs within the same scope, like e.g. DI/EI. However, with an operating system like OSEK or VxWorks (or similar), sometimes constructs exist that do not appear in pairs (like ‘osSaveDisableGlobal’ only uses ‘DI’, whereas ‘osRestoreEnableGlobal’ only uses ‘EI’), or one has three primitives, two for setting and one for clearing (like ‘setLocal’ and ‘setGlobal’ vs. ‘clearAll’). Such combinations cannot be accommodated in PolySpace. The idea was to add extra (empty) macros for use as PolySpace guards. However, only code reviews could ascertain that such guards had only been placed at the appropriate code lines, and nowhere else. Recommendation even from PolySpace support (!!) was to concentrate on the code reviews, and refrain from implementing such primitives in the code to have PolySpace check the multitasking accesses. Thus, I have no experience with the problematic cases PolySpace would be able to find.
I have worked with PolySpace for some three months on a project with roughly 50 kLOC of C code (counted as semicolons outside of comments and strings, which is fair enough). I cannot claim to have seen all potential road blocks using PolySpace, just as I cannot assume that all described road blocks will prove relevant for your C code. Using PolySpace for C++ was outside my scope. Therefore, these observations are just that: observations.
And please be aware: This was version 2.6, PolySpace has not halted development, and it might very well be that some of the issues mentioned here have been resolved already.
And on the other hand, the things that PolySpace finds can be amazing: If a function creates a divide-by-zero runtime error when called with a parameter value of, say, 32,
PolySpace will find that. It will find effectively unused code, even if the conditions are far from trivial, and many other things.
OK, Lint will find much for you. If your code is Lint-clean, and both PolySpace and Lint have been properly configured, PolySpace will find few (but probably serious) errors,
if any. So, if you have the money, resources and time to use PolySpace, use both: Lint on a daily basis, available to all developers, with a clear message that code shall be clean before being checked-in. And well in advance of the next release, do a full PolySpace check, and take the warnings seriously.
Happy debugging!