Finally: A list of C (and C++) keywords including C11 (and C++11)

It’s been a while, but now I have some news: since I didn’t receive much response to my requests on StackOverflow or on my blog, I decided to create such a list myself. The preliminary output can be found here, as a PDF as usual. It’s version 0.4 for now, and not fully complete, but I’ve added several sources, from pre-K&R to C11, and from the C++ 1st edition (1985) to C++11.
If you find inaccuracies, please let me know, either as a comment, or by email. I intend to bring out updated editions whenever new info warrants it.


Programming in the C/C++ realm: Identifier names to avoid – or not?

The C family of languages has been around for a while; it has evolved, grown, sprouted a few branches and now provides several paths for ‘growth’. C started out in the seventies and was standardized in the eighties; C++ followed and was standardized even more recently; industry-standard variants are available for embedded systems (Embedded C++); and new standardization efforts are on the way. Wonderful!
Currently, as so many times before, I’m working on a project with fairly high safety requirements, programmed in C. Not C99, but C89 with a few compiler extensions for embedded programming. Not because C++, or at least C99, is unavailable, but because of the inertia of legacy code and the experience level of the (otherwise highly skilled) developers. You don’t turn a team of C experts into a team of C++ experts within a few weeks, and the project’s goals always take priority.
A few weeks ago I was looking at a piece of code like this:
ui32 calculate_difference(ui32 old, ui32 new) { ... }
That gave me pause. Yes, it’s perfectly valid C89, and even C99 will not complain. But do I really want to use a C++ keyword (new) in my C code? If I ever move to C++, this code will break!
Having been a brief and passing guest at the C++ WG21 July 2009 meeting in Frankfurt, I started off on some research. (The following list is for illustration only, not for completeness.)

  • An interesting keyword from the old days has been forgotten since C89: entry
  • C89 introduced a new keyword void
  • AMD1 (Amendment 1 to C90, 1995) introduced some new semi-keywords as macros in <iso646.h>: or_eq, and_eq. These are not keywords in the strict sense, but you might still want to avoid them as identifier names
  • C99 added five more: inline, restrict, _Bool, _Complex, _Imaginary
  • C++ adds a whole army of keywords: static_cast, template, class
  • The coming C/C++ standards will add several more, e.g. nullptr, constexpr and static_assert in C++0x

So I’m now trying to answer the following question: which identifier names should I avoid in my code, considering the vast realm of C/C++ language variants and dialects? I started off with the keywords of all standards; then I looked into the coming standards, and then at the might-be keywords packaged as macros, like complex, imaginary, decimal64, etc. The list grew, I lost my overview, however slim it had been, and I hadn’t even started on the various (Standard!) libraries!
I’m now starting on a more structured approach. A quick tweet didn’t get me any response. Many overviews on the net cover parts of the picture, but none I’ve seen is comprehensive, so I guess I’ll have to roll my own.
If you happen to know a source of information on this, please do let me know, either as a comment or by email. I’d be more than interested.
And whether you do or don’t, you’re welcome to come back here occasionally, as I’ll be reporting my findings.
Happy coding!

Johan


Bizarre switch statement in C

It’s been in my “Bizarre C” box for many years. Things you may want or need to know about, but would never want to duplicate.

Currently, I am researching a new publication, maybe a PDF document, maybe a course, on the nooks and crannies of the C programming language. And I’m finding some abstruse samples, I can tell you.

But one example that I found most bizarre at the time (around 1990), and which is still among my favorites for flabbergasting experienced C programmers, comes from the best C programming reference I’ve known to date: “C – A Reference Manual” by Samuel P. Harbison and Guy L. Steele (see also the book’s website).

Imagine a calculation that depends on whether certain numbers are prime. If the parameter is prime, one routine is executed; if not, another. Like this:
if (is_prime(a))
  process_prime(a);
else
  process_nonprime(a);

The if-statement is just that: a statement, albeit one composed of other statements.
A switch-statement consists of the switch keyword, the controlling expression in parentheses, and a statement, usually (but not necessarily) a compound one. That statement could be our if-statement, if we so choose. However, without any case or default labels, the switch-statement would simply jump over that statement, effectively doing nothing. To remedy that, we put the default label before the if-statement:
switch (a)
  default:
    if (is_prime(a))
      process_prime(a);
    else
      process_nonprime(a);

This code already looks strange, but it is functionally equivalent to the if-statement by itself.
Now imagine that the function is_prime(a) is very expensive; then it makes sense to bypass it wherever possible. And if 99% of the values the variable a actually takes lie between 2 and 10 inclusive, it definitely makes sense to circumvent is_prime(), since we know the primality of those values without any calculation:
switch (a)
  default:
    if (is_prime(a))
  case 2: case 3: case 5: case 7:
      process_prime(a);
    else
  case 4: case 6: case 8: case 9: case 10:
      process_nonprime(a);

To understand why this is correct C, we need to realize that the case labels belonging to a switch-statement may be placed anywhere within the statement that forms the body of the switch. In our case, that body is the entire if-statement, up to the semicolon after process_nonprime(a).
It works as follows:

  • If a is 3 upon execution of the switch, the label case 3: is where execution continues after determining the value of a to be three, jumping into the middle of the if-statement and calling the function process_prime with a parameter 3. After execution of that function, the if-statement is terminated, terminating the enclosing switch-statement at the same time.
  • If a is 8 upon execution of the switch, the label case 8: is where execution continues after determining the value of a to be eight, jumping into the middle of the if-statement and calling the function process_nonprime with a parameter 8. After execution of that function, the if-statement is terminated, terminating the enclosing switch-statement at the same time.
  • If a is 29 upon execution of the switch, no case label matches, so execution continues at the label default:, executing the if-statement from its beginning. Depending on the value of a (29 in this case), the if-condition determines which of the two alternative functions to call, just like a normal if-statement.

I can only agree with Sam Harbison and Guy Steele: “This is frankly the most bizarre switch statement we have ever seen that still has pretenses to being purposeful.”
Would you want the nuclear reactor around the corner to be coded like this?
Happy coding!

Johan


Using PC Lint in Eclipse – no plug-ins required

I’ve been tinkering with Eclipse to see if I can make PC Lint run from inside Eclipse. Of course, running Lint as an external tool is always an option, but I couldn’t get Eclipse to recognize the Lint warnings and display them in the ‘Problems’ tab. From my testing and what I could find on the Internet, it seems that the output of external tools configured this way is not passed through the installed error parsers.

What I found on the Internet suggested that I have to use a special error parser, configure it, and then it would work. Now, I have nothing against using extra tools if that’s the way to achieve what I want, but if I can do without, I consider that a better solution, since every additional tool is another tool to learn. And when I deliver configurations to my customers, making myself indispensable is not my business model; on the contrary.

So, after experimenting some more, I have found a way to use PC Lint with Eclipse (I used Ganymede for testing) without resorting to additional tools or plug-ins. Be aware, I’m not an Eclipse expert in any sense, so there may be many more and possibly better ways to achieve this. I’d be happy to get some comments with further tips.

The key is to add a new ‘build target’ for running PC Lint:

  1. On the project’s properties dialog (right-click the project and select ‘Properties’), in the ‘Builders’ section, check that the “CDT Builder” is selected.
  2. In the ‘C/C++ Build’ section, click on “Manage Configurations…” and add a configuration for running Lint.
  3. Select the configuration from the drop-down box.
  4. On the ‘Builder Settings’ tab, deselect “Use default build command” and instead provide the path to the Lint executable, e.g. “C:\Lint90\LINT-NT.EXE”. Under ‘Build location’, specify the directory from which you want Lint to run, either as an absolute path or using the buttons offered.
  5. On the ‘Behaviour’ tab, deselect “clean”, and provide the parameters to Lint in the (selected) ‘Build (Incremental build)’ text box.
  6. Be aware that these labels all assume a general build or make environment, so the names themselves do not suggest that you can run a single program this way. If you run Lint via a Makefile, you may use that as well, and ‘clean’ might even make sense.

  7. In the ‘Settings’ section of ‘C/C++ Build’, make sure that all CDT parsers (or at least the “CDT GNU C/C++ Error Parser”) are selected.

And that’s it.

OK, one final thing remains. We now need to coerce Lint into emitting warnings in a format similar to that of the GNU C/C++ compiler. This can be achieved with a few Lint options in your option file. I use:

// Output options: One line, file info always; Use full path names
-hF1
+ffn

// Normally my format is defined as follows:
//-"format=%(\q%f\q %l %C%) %t %n: %m"
// For eclipse-usage, the GCC error format is necessary,
// since we have only the default eclipse error parser available.
-"format=%(%f:%l:%C:%) %t %n: %m"

// And also for eclipse, the reference locations provided by
// Lint, put into square brackets "[Reference: File: ... Line: ...]"
// are not correctly handled, therefore we switch them off.
// Enable warning 831 if you are interested.
-frl

// Do not break lines
-width(0)

// And make sure no foreign includes change the format
+flm

If you want to know more than my comments are telling you, check the Lint manual for details.

Now, don’t get me wrong: I will not switch from my trusted SlickEdit to Eclipse. But, as a consultant, I cannot always pick and choose. And running Lint from Eclipse, having a way to jump from warning to warning, definitely beats manual navigation.

Happy Linting!


Virtualisation, VMWare, SysPrep – not as easy as it seems

If you know me, you know I am a Windows power-user, but certainly not an experienced administrator. If you are then confronted with a potential roll-out for a VMWare-based development environment to be legally cloned from a master environment using XP, things seem easy at first: Install, copy, distribute.
If you then look into things like hostnames, WinXP activation and simultaneous network access, things become more complicated. And if you then start looking into something like SysPrep (not for Vista), things can easily overwhelm you.
So, more for my own benefit than for yours :-), I’ve collected a set of links to posts on remyservices.wordpress.com (by David Remy) that walk through many of the necessary details of using SysPrep. Thanks, David!

And in case more is added to the series in the future, check out the Google link.

Happy administering!


Why 32768 isn’t always the same as 0x8000

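Contrary to intuition, the C constants 32768 and 0x8000, although they
have an identical representation (0x8000), may have different types in C.
Consider a processor with a 16-bit int type and a 32-bit long type: there,
32768 has type long, whereas 0x8000 (and the octal variant 0100000) has
type ‘unsigned int’. The reason is that an unsuffixed decimal constant
takes the first of int, long, ... that can represent its value, while for
hexadecimal and octal constants unsigned int is tried between int and long.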

If you feel the need, check the C standard (C99 working draft N1256) at
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf, section 6.4.4.1,
page 55 at the bottom and the table on page 56, or see Harbison & Steele,
5th edition, section 2.7.1, pages 24ff.

Normally, there is no problem when using such values, since the
representations are identical.
However, consider this small example:

#define C_DECIMAL     32768
#define C_HEXADECIMAL 0x8000

void main(int argc, char *argv[])
{
  volatile long long_dec = ((long)~C_DECIMAL);
  volatile long long_hex = ((long)~C_HEXADECIMAL);

  return;
}

When C_DECIMAL has type long, the bitwise complement (~) inverts 32 bits,
resulting in the representation 0xFFFF7FFF with type ‘long’; the cast is
superfluous.
When C_HEXADECIMAL has type ‘unsigned int’, the complement inverts only
16 bits, resulting in the representation 0x7FFF with type ‘unsigned int’;
the cast then zero-extends it to the ‘long’ value 0x00007FFF.

Checking with a compiler configured for 16-bit int (CodeWarrior 7.1 for ColdFire, using ‘-intsize 2’):

0x00000000                    _main:
;                             main:
0x00000000  0x4E560000               link     a6,#0
0x00000004  0x518F                   subq.l   #8,a7
0x00000006  0x223CFFFF7FFF           move.l   #-32769,d1
0x0000000C  0x2D41FFF8               move.l   d1,-8(a6)
0x00000010  0x223C00007FFF           move.l   #32767,d1
0x00000016  0x2D41FFFC               move.l   d1,-4(a6)
0x0000001A  0x4E5E                   unlk     a6
0x0000001C  0x4E75                   rts

For those of you who do not read assembler code: the differing values are the immediate operands of the two move.l #...,d1 instructions, #-32769 (0xFFFF7FFF) and #32767 (0x00007FFF). So the compiler confirms the difference in behavior, and this is not a compiler error.

Lucky you if you have Lint to warn you. (Yes, I know, other tools will too, if you let them…)

Happy coding!


Use of in-line assembler using ‘asm’ variants

Just a “small” post to expand on my most recent tweet. In-line assembler is quite a difficult topic, but unavoidable in most embedded environments. And the syntactic variants are more numerous than the bugs in your code.

For that reason, here is a piece of advice: for each distinct syntactic form of in-line assembler, use a different macro. For an assembler function, use ASM_FN like this:
ASM_FN int MyAssemblerFunction(void)
{
...assembler instructions...
}
For an in-line assembler block, use ASM_BLOCK like this:
...
a += 4;
ASM_BLOCK volatile {
...assembler instructions...
};
And for single-line in-line assembler instructions use ASM_LINE like this:
...
a += 4;
ASM_LINE movb ax,0b00000010;
...
Now, all these macros need to expand into the non-standard keyword asm for the compiler to process everything correctly. Many compilers accept several spellings of the keyword, such as asm, __asm and __asm__. If you want to, you can use these three spellings in place of ASM_FN, ASM_BLOCK and ASM_LINE as I suggested.
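A minimal sketch of the central header this implies, assuming (hypothetically) a compiler that accepts plain asm in all three positions; the header name asm_macros.h is also an assumption. Substitute __asm or __asm__ as your compiler requires. The point is that all three macros expand identically for the compiler, yet remain separate names that Lint can redefine individually.

```c
/* asm_macros.h: hypothetical central header for in-line assembler macros.
   Assumes a compiler that accepts 'asm' in all three positions. */
#ifndef ASM_MACROS_H
#define ASM_MACROS_H

#define ASM_FN    asm   /* whole function body is assembler       */
#define ASM_BLOCK asm   /* braced assembler block inside a C body  */
#define ASM_LINE  asm   /* single instruction up to the semicolon  */

#endif /* ASM_MACROS_H */
```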

The idea is to enable Lint to expand each of these three forms differently: the function containing only assembler instructions should be ignored by Lint, but its prototype needs to be known. Therefore we need to enable the Lint keyword _ignore_init (the body of the function is seen as a form of “initialization”), and provide the options:
+rw(_ignore_init)
+dASM_FN=_ignore_init
The plus sign in the second option prevents the definition in our code (to asm or one of its variants) from overriding the Lint-specific definition. However, for ASM_BLOCK this replacement will not work, so we need a different replacement:
+rw(_up_to_brackets)
+dASM_BLOCK=_up_to_brackets
And the third form again needs a different replacement, since in this case, no brackets need be present at all:
+rw(_to_semi)
+dASM_LINE=_to_semi
With some other form of in-line assembler definition you might even need _to_eol, or one of the other gobblers. But make sure to use different macros for different syntactic usages of asm, so you have the chance to use different gobblers for all situations.

Happy Linting!


Twitterfeed on PC Lint

I’m going to experiment a little with Twitter. I created a Twitter account called LintTweets, where I intend to collect some Lint wisdom in 140 characters. In some cases I may expand on those quite limited 140 characters in my blog here. You’re welcome to read here and/or follow me on Twitter. The frequency of my tweets there will probably not exceed once a week, so don’t expect too much.

Happy Linting!


Two new articles: Physical Architecture and Coding Conventions

I just added two articles. Again, their basics were written some time ago, but I have now revamped them and put them on the website.
And since the number of articles (or: white papers) has grown, I have dedicated a new page to these “white papers”.

I now express my opinion on physical architecture (as opposed to the logical software architecture everyone learns when learning to program), and on the almost religious topic of coding conventions. Have a look at the new white paper page!

Johan


New version of “How To Wield PC-Lint”

I finally got around to updating my document for the most recent Lint version, 9.0b. I have replaced the original document, but if you just want to see where the changes have been incorporated, a version with change bars is available.

Happy linting!