C Programming Traps and Gotchas

Except, put that curly after the conditional to save a line.

Both Linux and BSD kernel coding styles prohibit curly braces around a single statement, which I think is a short-sighted coding standard.

Disagree. I have wasted lots of time trying to match up braces when they are way out there.


Get a better editor :slight_smile:

https://www.gnu.org/software/indent/manual/indent.html

Change code to whatever style you prefer. I do this when I adopt code. I have found that it can even highlight bugs when the original style is changed. Such as the lack of braces in the if case above.

I disagree with that when working on multi-person projects. Makes seeing what changed between revisions difficult.

Following @Bill's convention from the second post. :wink:

On the formatting thing, you can just run a tool that re-normalizes all the code to whatever standard you prefer when you check it in and out of the repo.

Using Atom lately I've been preferring if (...) { because when you collapse the braces it collapses to the relevant code instead of a hanging brace:

-> if (...) { ...

instead of

if (...)
-> { ...

When using GCC on different platforms, don't assume that an ambiguous order of operations will mean the same thing on different target architectures. I once ran into an issue where a statement like "int x = a+b*c" would evaluate differently on big-endian and little-endian systems. Always be really specific with the syntax, even if it doesn't look as pretty.


Oh, and alignment. If you take the following struct:

typedef struct {
        uint8_t a;
        uint16_t b;
        uint16_t c;
} teststr;

and do a sizeof(teststr), you might think you would get a size of 5 bytes, but what you get is 6. Memory access has to be aligned, so a padding byte is added. The padding depends on the target architecture.


You're going to need to explain that one to me... because a+b*c is not ambiguous!

The alignment itself is implementation-specific, and you should not assume that you can derive the location of a, b, or c by inspection. On an 8-bit machine that structure may well take up 5 bytes, but on a 32-bit machine it might take up 8. The compiler is free to pad the structure however it likes, though the C standard does require the members to be laid out in declaration order, with a at offset zero.

Overlaying structures on hardware, directly, is perilous, but it is done with some frequency. GCC is generally good about having a sane implementation of static memory allocation, so doing:

struct mystruct_s {
  uint32_t control_reg;
  uint32_t status_reg;
  uint16_t data_a, data_b;
};

will generally result in sizeof(struct mystruct_s) == 12.

If you really want to be sure, write an assembly-language function which uses the exact opcodes you desire to access hardware explicitly. Then call that function from your C code.

http://c-faq.com/struct/padding.html

I am interested in understanding why it did this as well. Could it be that the values were read from a file and that byte-swapping was not done on the platform where that was needed?

I remember encountering this issue. It wasn't related to the endianness, but rather an all-too-common bug in evaluating operator precedence. Many compilers in the '70s and '80s had variations of this bug.


Many compilers default to aligning members of structures for the fastest access instead of the tightest packing. Some architectures can directly access unaligned data, possibly with a speed penalty, while others require accessing a byte at a time and reassembling it in a register. There may be magic chants in the form of compiler options or pragmas to specify how structures are packed.

A coworker and I had a rather lively debate over this once. We were implementing a simple "four banger" calculator as a desk accessory. I insisted that multiplication and division are to be done first; he said that the cheap, commonly available calculators performed operations as they are entered, as if everything was the same precedence level. To my surprise, he was right with the several units we tried. We speculated whether implementing precedence was too complicated or expensive, or whether the people who would buy the simple calculator do not understand precedence and would only be confused by it. Consider the modern Internet meme in which people evaluate a complex arithmetic expression and come up with wildly different results.

My wife had the great displeasure of working with a compiler from MASSCOMP. When faced with a moderately complex expression, it stopped outputting machine code.

In the case of GCC, some of the target compilers would produce one output and some would produce another. I found this in an argument with a TA I had in college: on my PowerPC laptop the code produced a different output than their x86 laptop. I took it to work and tested on PA-RISC and got the same result as PowerPC. I submitted it as a bug to GCC, and the response was that the breakdown happened in the target-specific code-generation step; since that was written differently for each architecture, it produced different results. It was eventually fixed for the case I submitted, but it taught me to be careful about assuming the compiler would behave the same across architectures.


The problem is that algebraic modes require lookahead parsing to properly evaluate the expression. For instance, in the example above you have to parse to the end of the expression to determine the proper operator precedence.

This requires more resources and a parser designed for it, and it took a while for proper parsing techniques to become common. Calculators were the best examples of the problem: high-end algebraic calculators required the operator to use parentheses or other tricks so the calculation fit within the stack memory of the calculator. HP used RPN to avoid the issue, and it turned out that RPN proved faster for operators as well as for the machine.

Oh, so you ran into a compiler bug.

Parse to the first operator. Remember what it is. Parse the rest of the expression. When the next operator is encountered, compare its precedence with the previous one. If the previous one is of higher or equal precedence, evaluate the pending operation; otherwise remember the new operator and keep parsing. Upon reaching the end of the expression, evaluate the pending operators in reverse order.

This can be written more naturally recursively, though you will be violating Zach's Second Rule of Embedded Programming. It can also be done iteratively by managing the stack directly, but the code is messier.

The cost of "algebraic" entry is slightly higher than RPN: you have to stack values in addition to the identity of pending operators, plus return addresses for the recursive implementation. RPN is hard for most people to understand, which is why RPN calculators are rare today and why FORTH is not a popular language despite its advantages.