Archive for November, 2009


Programming in the C/C++ realm: Identifier names to avoid – or not?

The C family of languages has been around for a while, has evolved, grown, sprouted a few branches and now provides several paths for ‘growth’. Starting out in the seventies, standardized in the eighties, C++ in the nineties, standardized even more recently, industry-standard variants available for embedded (C++), and new standardization efforts on the way. Wonderful!
Currently, as so often before, I’m working on a project with fairly high safety requirements, programmed in C. Not C99, but C89 with a few compiler extensions for embedded programming. Not because C++ or at least C99 is unavailable, but because of the inertia of legacy code and the experience level of the (otherwise highly skilled) developers. You don’t turn a team of C experts into a team of C++ experts within a few weeks, and the project’s goals always take priority.
A few weeks ago I was looking at a piece of code like this:
ui32 calculate_difference(ui32 old, ui32 new) { ... }
That gave me pause. Yes, it’s perfectly OK for C89, and even C99 will not complain. But do I really want to use a C++ keyword in my C code? If I ever get to move to C++, this code will break!
Having been a brief and passing guest at the C++ WG21 July 2009 meeting in Frankfurt, I started off on some research. (This list is for illustration only; it is not complete.)

  • An interesting keyword from the old days has been forgotten since C89: entry
  • C89 introduced a new keyword void
  • AMD1 (Amendment 1 to the C standard) introduced some new semi-keywords: or_eq, and_eq. In C these are macros from <iso646.h>, not keywords in the pure sense, but you might still want to avoid them as identifier names
  • C99 added several more: restrict, inline, _Bool, _Complex, _Imaginary
  • C++ adds a whole army of keywords: static_cast, template, class
  • The coming C/C++ standards will add several more: C++0x, for example, is set to bring nullptr, constexpr, decltype and static_assert

So I’m now trying to answer the following question: which identifier names would I want to avoid in my code, considering the vast realm of C/C++ language variants and dialects? I started off with the keywords of all standards; then I looked into the coming standards; then I looked at the might-be keywords, packaged as macros, like complex, imaginary, decimal64, etc. The list grew, I lost my overview, however slim it had been, and I hadn’t even started on the various (Standard!) libraries!
I’m now starting on a more structured approach. A quick tweet didn’t get me any response. Many overviews on the net present parts, but nothing comprehensive as far as I’ve seen, so I guess I’ll have to roll my own.
If you happen to know a source of information on this, please do let me know, either in a comment or by email. I’d be more than interested.
And whether you do or don’t, you’re welcome to come back here occasionally, as I’ll be reporting my findings.
Happy coding!

Johan


Bizarre switch statement in C

It’s been in my “Bizarre C” box for many years. Things you may want or need to know about, but never would want to duplicate.

Currently, I am doing research for a new publication, maybe a PDF document, maybe a course, on the nooks and crannies of the C programming language. And I’m finding some abstruse samples, I can tell you.

But one example I found most bizarre at the time (around 1990), and which still is among my favorites to flabbergast experienced C programmers, I found in the best C programming reference I’ve known to date: “C – A Reference Manual” by Samuel P. Harbison and Guy L. Steele (see also the book’s website).

Imagine a calculation that depends on certain numbers being prime or not. If the parameter is prime, one routine is executed; if it is not prime, another. Like this:
if (is_prime(a))
  process_prime(a);
else
  process_nonprime(a);

The if-statement is just that: a single statement, albeit a composite one.
A switch-statement is composed of the switch keyword, the value on which to operate, and a (possibly composite) statement. That statement could be our if-statement, if we so choose. However, without any case/default-labels, the switch-statement would just jump over the statement so specified, effectively doing nothing. To remedy that, we put the default-label before the if-statement:
switch (a)
  default:
    if (is_prime(a))
      process_prime(a);
    else
      process_nonprime(a);

This code already looks strange, but it is functionally equivalent to the if-statement by itself.
If we now imagine the function is_prime(a) to be very expensive, it would make sense to take a shortcut around that function wherever possible. And if 99% of the values the variable a can have lie between 2 and 10 inclusive, it definitely would make sense to circumvent the is_prime() function, since we know the primeness of those values without calculation:
switch (a)
  default:
    if (is_prime(a))
  case 2: case 3: case 5: case 7:
      process_prime(a);
    else
  case 4: case 6: case 8: case 9: case 10:
      process_nonprime(a);

To understand why this is correct C, we need to realize that case-labels belonging to a switch-statement can be positioned anywhere within the boundaries of the composite statement belonging to the switch. In our case, this means until the semicolon after the process_nonprime(a) statement.
It works as follows:

  • If a is 3 upon execution of the switch, the label case 3: is where execution continues after determining the value of a to be three, jumping into the middle of the if-statement and calling the function process_prime with a parameter 3. After execution of that function, the if-statement is terminated, terminating the enclosing switch-statement at the same time.
  • If a is 8 upon execution of the switch, the label case 8: is where execution continues after determining the value of a to be eight, jumping into the middle of the if-statement and calling the function process_nonprime with a parameter 8. After execution of that function, the if-statement is terminated, terminating the enclosing switch-statement at the same time.
  • If a is 29 upon execution of the switch, the label default: is where execution continues after determining that no known value is to be processed, executing the if-statement from its beginning. Depending on the value of a (29 in this case) the if-condition determines which of the two alternative functions to call, just like a normal if-statement.

I can only agree with Sam Harbison and Guy Steele: “This is frankly the most bizarre switch statement we have ever seen that still has pretenses to being purposeful.”
Would you want the nuclear reactor around the corner to be coded like this?
Happy coding!

Johan