Archive for March, 2009


Why 32768 isn’t always the same as 0x8000

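Contrary to intuition, the C constants ‘32768’ and ‘0x8000’ have an
identical representation (0x8000), but possibly different types in C.
An unsuffixed decimal constant gets the first of int, long, long long that
can hold its value; a hexadecimal or octal constant also tries the
unsigned type at each step. So on a processor with a 16-bit int type and
a 32-bit long type, 32768 does not fit into int and has type long, whereas
0x8000 (and the octal variant 0100000) has type ‘unsigned int’.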
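
Most desktop compilers use a 32-bit int, so the 16-bit case cannot be reproduced there directly. The following is a minimal sketch that moves the same experiment up to the 32-bit boundary, using 2147483648 and 0x80000000 instead. It assumes a C11 compiler (for _Generic) and a 32-bit int; TYPE_NAME, type_of_decimal, type_of_hex and print_constant_types are names made up for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Assumption of this sketch: int is 32 bits wide, as on most desktop targets. */
static_assert(INT_MAX == 2147483647, "sketch assumes a 32-bit int");

/* Map the type of an expression to a readable name (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x),              \
    int:                "int",                  \
    unsigned int:       "unsigned int",         \
    long:               "long",                 \
    unsigned long:      "unsigned long",        \
    long long:          "long long",            \
    unsigned long long: "unsigned long long",   \
    default:            "other")

/* 2147483648 == INT_MAX + 1: the decimal form moves on to a wider signed type. */
const char *type_of_decimal(void) { return TYPE_NAME(2147483648); }

/* Same value in hex: unsigned int is tried before long, and the value fits. */
const char *type_of_hex(void) { return TYPE_NAME(0x80000000); }

/* Hypothetical helper that prints both results. */
void print_constant_types(void)
{
    printf("2147483648 has type %s\n", type_of_decimal());
    printf("0x80000000 has type %s\n", type_of_hex());
}
```

Calling print_constant_types() from main should report a signed type (long, or long long where long is only 32 bits) for the decimal form and unsigned int for the hex form, which is the same split as 32768 versus 0x8000 on a 16-bit int target.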

If you feel the need, check the C standard at
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf, page 55 at the
bottom, and the table at page 56. (Alternatively, see Harbison & Steele, 5th edition, section 2.7.1, page 24ff.)

Normally, there is no problem when using such values, since the
representations are identical.
However, consider this small example:

#define C_DECIMAL     32768
#define C_HEXADECIMAL 0x8000

void main(int argc, char *argv[])
{
  volatile long long_dec = ((long)~C_DECIMAL);     /* 0xFFFF7FFF with a 16-bit int */
  volatile long long_hex = ((long)~C_HEXADECIMAL); /* 0x00007FFF with a 16-bit int */

  return;
}

When C_DECIMAL has type long, the bitwise complement (~) inverts 32 bits,
resulting in a representation 0xFFFF7FFF with type ‘long’; the cast is
superfluous.
When C_HEXADECIMAL has type ‘unsigned int’, the complement inverts only
16 bits, resulting in a representation 0x7FFF with type ‘unsigned int’;
the cast then zero-extends this to a ‘long’ value of 0x00007FFF.
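
To try the two cases on a machine whose int is wider than 16 bits, the widths can be spelled out with fixed-width types. This is only a sketch of the arithmetic described above, not the original test program: int32_t stands in for the 32-bit long and uint16_t for the 16-bit unsigned int of the target, and both function names are invented here.

```c
#include <stdint.h>

/* Decimal case: the constant has type long (32 bits) on the 16-bit target. */
int32_t complement_as_long(void)
{
    int32_t value = INT32_C(32768);
    return ~value;                          /* all 32 bits inverted: 0xFFFF7FFF */
}

/* Hex case: the constant has type unsigned int (16 bits) on the 16-bit target. */
int32_t complement_as_unsigned_int(void)
{
    uint16_t value = UINT16_C(0x8000);
    uint16_t inverted = (uint16_t)~value;   /* only 16 bits are kept: 0x7FFF */
    return (int32_t)inverted;               /* zero-extended to 0x00007FFF */
}
```

The first function yields -32769 and the second 32767. Writing the constants with an explicit suffix (32768L and 0x8000L) gives both forms type long on such a target, so the complement then inverts 32 bits in either spelling.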

Checking with a 16-bit integer compiler (CW7.1 ColdFire using ‘-intsize 2’):

0x00000000                    _main:
;                             main:
0x00000000  0x4E560000               link     a6,#0
0x00000004  0x518F                   subq.l   #8,a7
0x00000006  0x223CFFFF7FFF           move.l   #-32769,d1
0x0000000C  0x2D41FFF8               move.l   d1,-8(a6)
0x00000010  0x223C00007FFF           move.l   #32767,d1
0x00000016  0x2D41FFFC               move.l   d1,-4(a6)
0x0000001A  0x4E5E                   unlk     a6
0x0000001C  0x4E75                   rts

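For those of you who do not know how to read assembler code: the differing values are the immediate operands of the two move.l instructions, #-32769 (0xFFFF7FFF) for long_dec and #32767 (0x00007FFF) for long_hex. So the compiler confirms the difference in behavior, and this is not a compiler error.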

Lucky you if you have Lint to warn you. (Yes, I know, other tools will too, if you let them…)

Happy coding!


Use of in-line assembler using ‘asm’ variants

Just a “small” post to expand on my most recent tweet. In-line assembler is quite a difficult topic, but unavoidable in most embedded environments. And the syntactic variants are more numerous than the bugs in your code.

For that reason I give a piece of advice: For each different assembler semantic, use a different macro. For an assembler function, use ASM_FN like this:
ASM_FN int MyAssemblerFunction(void)
{
...assembler instructions...
}
For an in-line assembler block, use ASM_BLOCK like this:
...
a += 4;
ASM_BLOCK volatile {
...assembler instructions...
};
And for single-line in-line assembler instructions use ASM_LINE like this:
...
a += 4;
ASM_LINE movb ax,0b00000010;
...
Now, all these uses will need to expand into the non-standard keyword asm for the compiler to process everything correctly. Many compilers accept different forms of the keyword, so you may use asm, __asm and __asm__ interchangeably. If you want to, you can take these three variants instead of the ASM_(FN|BLOCK|LINE) as I suggested.
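
On the compiler side, the three macros can all map to the same keyword. The following is a rough sketch of such a header, assuming a GCC- or Clang-style toolchain where __asm__ is available; ASM_KEYWORD and add_four_with_barrier are hypothetical names, and the statement uses GCC extended-asm syntax rather than the movb form shown above, which belongs to a different compiler.

```c
/* Pick one spelling of the keyword for the whole project. */
#if defined(__GNUC__) || defined(__clang__)
#  define ASM_KEYWORD __asm__
#else
#  define ASM_KEYWORD asm      /* assumption: the toolchain accepts plain asm */
#endif

/* One macro per syntactic use, even though all expand to the same keyword. */
#define ASM_FN    ASM_KEYWORD
#define ASM_BLOCK ASM_KEYWORD
#define ASM_LINE  ASM_KEYWORD

/* Single-statement use: an empty instruction that only acts as a
   compiler-level memory barrier (GCC extended-asm syntax). */
int add_four_with_barrier(int a)
{
    a += 4;
    ASM_LINE volatile ("" ::: "memory");
    return a;
}
```

Keeping three separate names costs nothing for the compiler, and it leaves room to give each name a different meaning for a static analysis tool.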

The idea is to enable Lint to expand each of these three forms differently: The function containing only assembler instructions shall be ignored by Lint, but its prototype needs to be known. Therefore we need to enable the Lint keyword _ignore_init (the body of the function is seen as a form of “initialization”), and provide the options:
+rw(_ignore_init)
+dASM_FN=_ignore_init
The plus sign in the second option prevents the definition in our code (to asm or one of its variants) from overriding the Lint-specific definition. However, for ASM_BLOCK this replacement will not work, so we need a different replacement:
+rw(_up_to_brackets)
+dASM_BLOCK=_up_to_brackets
And the third form again needs a different replacement, since in this case, no brackets need be present at all:
+rw(_to_semi)
+dASM_LINE=_to_semi
With some other form of in-line assembler definition you might even need _to_eol, or one of the other gobblers. But make sure to use different macros for different syntactic usages of asm, so you have the chance to use different gobblers for all situations.

Happy Linting!