Archive for the ‘General development’ Category
Finally: A list of C (and C++) keywords including C11 (and C++11)
It’s been a while, but now I have some news: Since I didn’t receive much response to my requests on StackOverflow or on my blog, I decided to create such a list myself. The preliminary output can be found here, as a PDF as usual. It’s version 0.4 for now, and not fully complete, but I’ve added several sources, from pre-K&R to C11, and from C++ 1st edition (1985) to C++11.
If you find inaccuracies, please let me know, either as a comment, or by email. I intend to bring out updated editions whenever new info warrants it.
Programming in the C/C++ realm: Identifier names to avoid – or not?
The C family of languages has been around for a while, has evolved, grown, sprouted a few branches and now provides several paths for ‘growth’. Starting out in the seventies, standardized in the eighties, C++ in the nineties, standardized even more recently, industry-standard variants available for embedded (C++), and new standardization efforts on the way. Wonderful!
Currently, just as many times before, I’m working on a project with fairly high safety requirements, programmed in C. Not C99, but C89 with a few compiler extensions for embedded programming. Not because C++ or at least C99 is not available, but because of the inertia of legacy code, and the experience level of the (otherwise highly skilled) developers. You don’t turn a team of experts in C into a team of experts in C++ within a few weeks, and the project’s goals take priority always.
A few weeks ago I was looking at a piece of code like this:
ui32 calculate_difference(ui32 old, ui32 new) { ... }
That gave me pause. Yes, it’s perfectly OK for C89, even C99 will not complain. But do I really want to use a C++ keyword in my C code? If ever I get to move to C++, my code must break!
Having been a brief and passing guest at the C++ WG21 July 2009 meeting in Frankfurt, I started off on some research. (This list is just for illustration, not for completeness)
- An interesting keyword from the old days has been forgotten since C89: entry
- C89 introduced a new keyword: void
- AMD1 introduced some new semi-keywords: or_eq, and_eq. These are not keywords in the pure sense, but you might still want to avoid them as identifier names
- C99 added at least one more: restrict
- C++ adds a whole army of keywords: static_cast, template, class
- The coming C/C++ standards will add several more: _Complex, _Imaginary
So I’m now trying to find answers to the following question: What identifier names would I want to avoid in my code, considering the vast realm of C/C++ language variants and dialects? I started off with the keywords of all standards; then I looked into the coming standards, then I looked at the might-be keywords, packaged as macros, like complex, imaginary, decimal64, etc. The list grew, I lost my overview, however slim it had been, and I didn’t even start on the various (Standard!) libraries!
I’m now starting on a more structured approach. A quick tweet didn’t get me any response. Many overviews on the net present parts, but nothing comprehensive as far as I’ve seen, so I guess I’ll have to roll my own.
If you happen to know a source of information on this, please do let me know, either as comment, or by email. I’d be more than interested.
And whether you do or don’t, you’re welcome to come back here occasionally, as I’ll be reporting my findings.
Happy coding!
Johan
Bizarre switch statement in C
It’s been in my “Bizarre C” box for many years. Things you may want or need to know about, but never would want to duplicate.
Currently, I am researching for a new publication, maybe a PDF document, maybe a course, on the nooks and crannies of the C programming language. And I’m finding some abstruse samples, I can tell you.
But one example I found most bizarre at the time (around 1990), and which still is among my favorites to flabbergast experienced C programmers, I found in the best C programming reference I’ve known to date: “C – A Reference Manual” by Samuel P. Harbison and Guy L. Steele (see also the book’s website).
Imagine a calculation depending on certain numbers being prime or not. If the routine gets a prime parameter it executes one routine, if the parameter is not prime, another. Like this:
if (is_prime(a))
  process_prime(a);
else
  process_nonprime(a);
The if-statement is just that, a statement, a composite statement.
A switch-statement is composed of a switch keyword, the value on which to operate, and a, possibly composite, statement. It could be our if-statement, if we so choose. However, without any case/default-labels, the switch-statement would just jump over the statement so specified, effectively doing nothing. To remedy that, we put the default-label before the if-statement:
switch (a)
  default:
    if (is_prime(a))
      process_prime(a);
    else
      process_nonprime(a);
This code already looks strange, but it is functionally equivalent to the if-statement by itself.
If we now imagine the function is_prime(a) to be very expensive, it would make sense to take a shortcut around that function wherever possible. And if 99% of the values the variable a can have lie between 2 and 10 inclusive, it definitely would make sense to circumvent the is_prime() function, since we know the primeness of those values without calculation:
switch (a)
  default:
    if (is_prime(a))
  case 2: case 3: case 5: case 7:
      process_prime(a);
    else
  case 4: case 6: case 8: case 9: case 10:
      process_nonprime(a);
To understand why this is correct C, we need to realize that case-labels belonging to a switch-statement can be positioned anywhere within the boundaries of the composite statement belonging to the switch. In our case, this means until the semicolon after the process_nonprime(a) statement.
It works as follows:
- If a is 3 upon execution of the switch, the label case 3: is where execution continues after determining the value of a to be three, jumping into the middle of the if-statement and calling the function process_prime with a parameter 3. After execution of that function, the if-statement is terminated, terminating the enclosing switch-statement at the same time.
- If a is 8 upon execution of the switch, the label case 8: is where execution continues after determining the value of a to be eight, jumping into the middle of the if-statement and calling the function process_nonprime with a parameter 8. After execution of that function, the if-statement is terminated, terminating the enclosing switch-statement at the same time.
- If a is 29 upon execution of the switch, the label default: is where execution continues after determining that no known value is to be processed, executing the if-statement from its beginning. Depending on the value of a (29 in this case) the if-condition determines which of the two alternative functions to call, just like a normal if-statement.
I can only agree with Sam Harbison and Guy Steele: “This is frankly the most bizarre switch statement we have ever seen that still has pretenses to being purposeful.”
Would you want the nuclear reactor around the corner being coded like this?
Happy coding!
Johan
Why 32768 isn’t always the same as 0x8000
Contrary to intuition, the C constants ‘32768’ and ‘0x8000’ have an
identical representation (0x8000), but possibly different types in C.
If you consider a processor with a 16-bit int type, and a 32-bit long
type, 32768 is considered long, whereas 0x8000 (and the octal variant
0100000) is considered ‘unsigned int’.
If you feel the need, check the C standard at
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf, page 55 at the
bottom, and the table on page 56. (Alternatively, see Harbison & Steele, 5th edition, section 2.7.1, page 24ff.)
Normally, there is no problem when using such values, since the
representations are identical.
However, consider this small example:
#define C_DECIMAL 32768
#define C_HEXADECIMAL 0x8000
void main(int argc, char *argv[])
{
  volatile long long_dec = ((long)~C_DECIMAL);
  volatile long long_hex = ((long)~C_HEXADECIMAL);
  return;
}
When C_DECIMAL is considered long, the bitwise complement (~) will invert
32 bits, resulting in a representation 0xFFFF7FFF with type ‘long’; the
cast is superfluous.
When C_HEXADECIMAL is considered ‘unsigned int’, the bitwise complement
will invert 16 bits, resulting in a representation 0x7FFF with type
‘unsigned int’; the cast will then zero-extend to a ‘long’ value of
0x00007FFF.
Checking with a 16-bit integer compiler (CW7.1 ColdFire using ‘-intsize 2’):
0x00000000                   _main:                     ; main:
0x00000000  0x4E560000       link    a6,#0
0x00000004  0x518F           subq.l  #8,a7
0x00000006  0x223CFFFF7FFF   move.l  #-32769,d1
0x0000000C  0x2D41FFF8       move.l  d1,-8(a6)
0x00000010  0x223C00007FFF   move.l  #32767,d1
0x00000016  0x2D41FFFC       move.l  d1,-4(a6)
0x0000001A  0x4E5E           unlk    a6
0x0000001C  0x4E75           rts
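For those of you who do not know how to read assembler code: the values that differ are the immediate operands of the two move.l instructions that load d1, namely #-32769 (0xFFFF7FFF) and #32767 (0x00007FFF). So the compiler confirms the difference in behavior, and this is not a compiler error.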
Lucky you if you have Lint to warn you. (Yes, I know, other tools will too, if you let them…)
Happy coding!
Two new articles: Physical Architecture and Coding Conventions
I just added two articles, again, the basics of which have been written some time ago, but now I revamped them and put them on the website.
And since the number of articles (or: white papers) has grown, I have dedicated a new page to these “white papers”.
I now express my opinion on physical architecture (in contrast to the logical software architecture everyone learns when trying to learn programming), and on the almost religious topic of coding conventions. Have a look at the new white paper page!
Johan