Programming for Tomorrow
Writing Maintainable Code

Written by
Steven P Valliere

of

Electronic Visions, Inc.
Rockledge, Florida, USA
June 15, 2004


Foreword

Just about twenty years ago, I arrived at my first (and only) assignment in the USAF. That job also marked my introduction to maintaining code written by others. Although the USAF (and some college) had taught me the basics of FORTRAN, COBOL, BASIC, PL/I and Assembler, the first thing I needed to do was to learn a new language: C. I tore through the first edition of Kernighan & Ritchie's famous book, The C Programming Language, in about a month and was promptly dumped into my first maintenance project. That was my primary job for 3½ years, with a few original programs and quite a few ‘regular’ USAF jobs (like base trash detail, etc.) thrown in for variety.

While I don't remember the specifics of any of the programs I had to maintain, I do remember that the first thing I did was, almost invariably, reformat all of the source code into “my style” (i.e. the style I learned first). Then I suffered through strange variables; language constructs used apparently only because they seemed “neat” at the time; missing comments or worse, comments from a couple who were having an affair while the code was originally written and left love notes throughout the system code; and far too many attempts at minimizing the amount of work the compiler had to do, strangely labeled as “more efficient” code (too often, the assembler output would dispute that assertion).

This document is my humble attempt to offer some guidelines that may help the people who come after you. Then again, if you keep your job long enough (or move in relatively small circles) you just might be helping your future self.


Chapter 1 – Overview

Many of the things I say in this text may be clichés, but keep in mind what a movie character I saw the other night said, “Have you ever noticed that all clichés are true?” I'm going to be presenting a set of guidelines that will hopefully help lead you to writing more maintainable code. As with most guidelines, one of the tricks that is most difficult to teach is knowing when to bend them, alter them or even when to ignore them completely. I'm going to use the C language partly because it is very widely used and partly because some C programmers seem to delight in using the most obtuse and difficult to maintain styles I've ever seen – sometimes even on par with uncommented, machine generated assembler.

I'm going to cover the following topics and then attempt to combine them into a meaningful whole for my conclusion:

As I progress through the topics, I'll present both good and bad examples to demonstrate my points. As you progress through all of the information here, remember that while I'm a strong advocate of everything here, there are still times when it makes sense to violate one (or more) of the ideas presented. Knowing when to violate the rules is one of the things that must, unfortunately, be learned by experience. Even though some examples may be provided, they cannot take the place of the experience gained by actually writing and (more importantly) maintaining code.


Chapter 2 – Maintainable Code

Where do we draw the line between maintainable and unmaintainable code? In fact, can such a line be drawn clearly enough to see? Unfortunately, the answer is most likely, “No.” The difficulty lies in the fact that every programmer has different levels of training, experience, skill sets and ways of thinking. The complexity of some programming problems can cause programmers to reach into every part of their personality, experience and skill set in order to solve a problem in the best way they can imagine. Despite all of that, I am still going to attempt to describe some basic concepts of what is and, from time to time, what is not maintainable code.

The second most important attribute of maintainable code is that programmers looking at it for the first time may readily understand it. The most important attribute is, of course, functionality: if the code doesn't work, it won't be used and probably won't be maintained. But… There have been times in my career when I've been given an old program and told to make it do new function ‘X’. After I started, I discovered that the program had never worked before, but the customer wanted me to add a new feature so they would be able to use it. The new feature turned out to be making the program work for the first time!

By necessity, we will assume that the maintenance programmers' experience level is adequate to the task at hand. It is unrealistic to expect a novice to jump directly into a project with millions of lines of source, but if all of that code was written to be maintainable, it could provide a very good education to the novice!

Most of the following chapters cover different aspects of keeping source code clear, readable and comprehensible.

Why write unmaintainable code?

There are many answers to that question. The least defensible (in my opinion) are:

K&R: Far too many programmers think that the coding style used through much of Kernighan & Ritchie's The C Programming Language is the “one true way” to lay out their code. The book may have been updated since the volume I have, but it used a number of different styles throughout, hopefully to demonstrate that the language itself didn't depend on any particular style or layout. I say hopefully because the other alternatives are that the coders who wrote the examples for the book were either sloppy or, worse, intentionally varying the style to confuse readers.

Job Security: Oftentimes this will be done by contractors hoping to extend their contract with maintenance and support work. If they are the only ones who can comprehend the code, who is in a better position to get the job? Other times, programmers at large companies see layoffs in other areas and get nervous, so their code gets harder and harder for anyone else to understand. Then, if they come up for a layoff review, the fact that they are the only ones who can work with the code they've written often assures their job will continue.

Laziness: Sadly, some programmers just don't bother to try to write their code clearly. They're in a hurry, and often coding on the fly with no time to keep things organized. And if they had the time available, they would waste it on something like locating the perfect shades of blue for their gradient background instead of cleaning up their code. Many lazy programmers offer K&R as an excuse, because their inattention to good style guidelines (and consistency) tends to make their code resemble the examples in the K&R book.

Novelty: Some people feel a need to use every feature of their tools, whether they are needed or not. Imagine a child with a Swiss Army Knife at the dinner table: at some point during the meal, every single one of the knife's blades, tools and attachments will be used (despite the fact that the child probably didn't have any real need for the corkscrew or divot repair tool at dinner).

Inexperience: People who have never worked as maintenance programmers often don't seem to be able to grasp the importance of clear, understandable code. All of the code they've ever seen was perfectly clear to them (of course, they wrote all of it).

Indifference (a.k.a. Elitism): Rather an obvious one, but all too common. Although I'll probably be lynched for saying this, many open source projects written in C are guilty of this one. I say that because they are often stuck with difficult to read formatting rules where the programmer's intent is not always obvious, resulting in code that is relatively easy to break [by accident] when trying to fix a bug/add a feature/make an adjustment.

I consider elitism to be a form of indifference, too. When a programmer writes convoluted, overly complex and difficult to read/understand code because he (or she) expects others to be intimately familiar with his style, the programmer is really being indifferent to the reality of his work. Another reason for indifference through elitism is the quality, type and number of tools available to the developer. Those who have access to the best tools often forget that others may not have those tools (or may not have the experience necessary to use the tools) and then write “ugly” code because the tools make it all seem clear and understandable to them.

Why Write Maintainable Code?

Aside from the most obvious answer, writing code with maintainability in mind from the outset has a tendency to result in more robust software (in both reliability and security) as well as generally more correct code. The simple act of paying more attention to how the code looks, how clearly the steps, tests and calculations are presented, how appropriate the variable and constant names are, etc. will generally lead to a much better program overall.

The bottom line is that, except for very rare instances, the lifetime cost of code written to be maintainable from the start will always be less than the cost of code written with little or no thought to maintainability. That is because clear, well-constructed source code is always easier to understand than “the other stuff.” When the reliability and security of the running program are factored into the lifetime cost, the benefits gained with maintainable source increase, not only because the program will generally be more solid, but also because when the inevitable flaw is uncovered, finding and fixing it will be far, far easier.


Chapter 3 – Development Tools

At the time I write this, programmers have such a large choice of development tools that it boggles the mind. Beyond the individual tools lies an ever-growing collection of integrated development environments (or IDEs). Often an IDE will be a single program (or collection of programs) from a single source all united under a single interface allowing access to the various editors, compilers, debuggers, and other tools used for development these days. Since the purpose of this text is writing maintainable source code, we're only really interested in the source editor, whether as part of an IDE, as a stand alone utility, or even as an integrator of other tools itself.

First and foremost let me state my belief that one's favorite source editor is often more like a religious conviction than anything else. Some programmers are still quite happy using editors that have none (or virtually none) of the features I'll be discussing. Whether your editor supports a feature or not, the person who must maintain your code will almost certainly use a different editor and one shouldn't penalize him (or her) by avoiding things that an editor may use to make maintenance easier.

Second, I am absolutely positive that I will leave out many features from many editors that their advocates use all the time to make their lives easier. My purpose is to convince you to pay attention to the possibilities when creating your code. To that end I'll be describing these features:

  1. Automatic Formatting
  2. Tabs vs. Spaces
  3. Syntax Coloring
  4. Variable Expansion/Lookup
  5. Function Parameter Expansion
  6. Symbol Browser

I will not be covering any macro facilities because they vary greatly among editors.

Automatic Formatting

Using a standard format for your code is a very important first step in making it maintainable. Many editors can help by using (generally adjustable) formatting rules to both format existing code and to format new code as it is entered. Some are even capable of “learning” the format of some code and applying it to the remainder.

It is important to pay attention to the way your code is structured so that the formatting rules are applicable by automated tools. If you notice a section of your code that the automatic formatting cannot get right, you should look very closely at your reasons for that confusing piece of code, lest a future programmer curse your name for writing it.

Tabs vs. Spaces

There's no easy answer to this one; there are some very good reasons to use tabs (although saving disk space is not often one of them any longer). Tabs have the somewhat magical ability to vary in width, unlike any other character. That can allow people to see code indented as much as they prefer (some people like 3 or 4 spaces/tab, some like 6 or 8).

There are two main problems with tabs:

  1. Many programmers forget to mention the original tab size in the file header comments. That simple addition to your file headers can make a world of difference to others trying to read your code on their own systems.
  2. Embedded tabs (those that are not at the beginning of a line) will almost never work (i.e. align the text that follows them) correctly at any setting except the original. So you might find, for example, that at your preferred tab setting, the data types for a set of variables align perfectly, but the variable names are all over the place because the tabs between the type and the name didn't expand as expected.

There is another tabs related issue, too: some editors automatically convert tabs to spaces, thus freezing the indentation of the code to the setting they were using.

I recommend only using tabs at the beginning of a line. Embedding tabs within a line of code will inevitably make the code harder to read at some time in the future.

If you choose to use spaces only (as I do) then you will be in complete control of your code (barring a code formatting tool like Unix's indent utility). The main drawback is that (depending on your editor's capabilities) it may be a bit more work to change the indentation level of a block of code that was moved or copied than it would be with tabs.

Syntax Coloring

This feature is very useful when trying to understand existing code because it very clearly marks all of the different parts of the code: keywords, statements, functions, variables, numbers, strings, comments, etc. There are times when clever coding can defeat the syntax coloring function however, because the syntax coloring function is written to be very fast and generally correct instead of perfect and probably slow. If you notice that you've managed to write some code that the syntax coloring gets wrong, it is a good bet that the person maintaining that piece of code in five or ten years will be confused by it, too.

Variable Expansion/Lookup

Editors are now often capable of automatically building a database of all of the data objects visible in the scope where you are editing. When you enter the dot (or pointer) to connect a structure name with a member, you may be presented with a list of member variables. Sometimes, the list will even show the comment immediately following the member declaration. These lists are your friends. They help minimize the confusion and profusion of global variables when one simple ‘trick’ is used: collect all global variables into a single struct that is visible to all. Then, any time a global is needed, type the name of the struct, then a dot and, voilà! A list of all global variables that currently exist is presented.

The trick fails at times when all of the modules have a private global struct with the same name because the editors' browse databases are not nearly as sophisticated as the compiler's. If you are presented with the wrong list of members for a struct, then you probably have two different structs with the same name. While the compiler may keep them separate, the linker may try to connect them and your maintenance programmer almost certainly will.

Function Parameter Expansion

Just like the struct member database, editors often maintain a function database containing the return values and parameters of all of the functions visible in the current scope. Also like the struct members, if the function declaration has one parameter per line, followed by a comment, the comments will often be visible when the parameter list is displayed.
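As a small sketch of that layout, here is a function written with one commented parameter per line. The name copy_upper and its parameters are invented for this illustration, and how much of the comment text an editor actually displays will vary from editor to editor.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: copy src into dst, converting to upper case.
   Returns the number of characters copied (not counting the NUL). */
size_t
copy_upper( char       *dst,       /* out: destination buffer            */
            const char *src,       /* in:  NUL-terminated source string  */
            size_t      dst_size   /* in:  size of dst, including NUL    */
          )
{
   size_t n = 0;

   if( (NULL == dst) || (NULL == src) || (0 == dst_size) )
   {
      return 0;
   }

   /* Stop one short of dst_size so there is always room for the NUL. */
   while( (n < (dst_size - 1)) && ('\0' != src[n]) )
   {
      dst[n] = (char)toupper( (unsigned char)src[n] );
      n++;
   }
   dst[n] = '\0';

   return n;
}
```

With the comments sitting right beside each parameter, anyone calling the function can see which arguments are inputs and which are outputs without opening the source file.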

Symbol Browser

Finally, all of the program's symbols are kept in a database, normally called a symbol browser (or something similar). This database will almost certainly allow you to look up function declarations and will usually allow you to see a list of locations where a function is referenced as well. Some of these are capable of displaying call (and called by) trees, too.

If you're wondering why this is here, it is for the maintenance programmers who might be reading. They are trying to figure out some strange legacy code that they inherited and I thought I'd point this out in case they missed it.

Another reason for mentioning symbol browsers is that some programmers use them as an excuse to write difficult to read code because, “you can use the browser to find what you need.” That is the kind of logic I associate with people who wrote subroutines to add a number of days to a date and return the new date that work like this (in pseudo-code):

New = AddDays( “June 5, 2004”, 30 days ); 

Then, when we print the new date, this comes out:

June 29, 2004 

Although this sounds like a stupid, minor mistake, how would you like your creditors to calculate your payment due dates like this or accrue your interest this way? (In fact, this specific example was something I experienced just a few days ago when renewing an anti-virus subscription: I was told I had 30 days left on June 5, 2004 and that my subscription expired on June 29, 2004! And that was from one of the biggest AV software companies! But I digress…)
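For contrast, here is a minimal sketch of one way the date arithmetic could be done correctly in C. The function name add_days and its parameter list are my own invention for this example; the only library behavior it relies on is that the standard mktime() function normalizes out-of-range fields, so that “June 35” becomes July 5.

```c
#include <time.h>

/* Hypothetical helper (not from any particular library): advance a
   calendar date by a number of days. Returns 0 on success, -1 if the
   resulting date cannot be represented. */
int
add_days( int *year,    /* in/out: full year, e.g. 2004    */
          int *month,   /* in/out: month, 1 through 12     */
          int *day,     /* in/out: day of month, 1 to 31   */
          int  days     /* in:     number of days to add   */
        )
{
   struct tm t = { 0 };

   t.tm_year  = *year - 1900;
   t.tm_mon   = *month - 1;
   t.tm_mday  = *day + days;   /* may exceed the month; mktime() fixes it */
   t.tm_hour  = 12;            /* midday keeps clear of daylight saving shifts */
   t.tm_isdst = -1;            /* let the library decide about DST */

   if( (time_t)-1 == mktime( &t ) )
   {
      return -1;
   }

   *year  = t.tm_year + 1900;
   *month = t.tm_mon + 1;
   *day   = t.tm_mday;

   return 0;
}
```

Starting from June 5, 2004 and adding 30 days, this produces July 5, 2004, since June has 30 days.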


Chapter 4 – Code Layout and Formatting

The most obvious parts of programming style, these are very important parts of creating maintainable code. I'm sure to have some religious objectors here, but in my defense, the observations I'll be presenting have been collected over the course of 20 years of programming (both original and maintenance) and talking with the people who have inherited my code, sometimes years after I wrote it. Without exception, I've been told that the style I've used (and, I think, improved) over the years has been very maintainable, even by relative novices. That is the layout and formatting style that I will be describing here.

The C language, like many other languages, is extremely tolerant of the source code format. As long as there is white space where it belongs, and the individual lines are not longer than the compiler can read, you can do what you want. My favorite example is an entry from the Obfuscated C contest that played Tic-Tac-Toe using C source code in the shape of a Tic-Tac-Toe board.

However, just because you can do a thing, there is still the question of should you do that thing? I believe the answer is no. If you really want to write code that others can understand and maintain, then you need a very clear and consistent layout and format for your code.

For a simple example of how braces can help, take a quick look at the following two versions of the same function. When your deadline is rapidly approaching and this is someone else's source, they may seem to do exactly the same thing, and although both should compile fine, they will produce very different results.

First version:

#include <ctype.h>   /* for toupper() */

void
afunction( int p1, char *p2)
{
   int z;
   for (z=0;z<p1;z++)
      if (*p2==0)
         break;
      *p2=toupper(*p2);
      p2++;
   return;
}

Second version:

#include <ctype.h>   /* for toupper() */

void afunction( int p1, char *p2 )
{
   int z;
   for( z=0; z<p1; z++ )
   {
      if( 0 == *p2 )
      {
         break;
      }
      *p2 = toupper(*p2);
      p2++;
   }
   return;
}

The first version is representative of code that was originally written poorly (but only a little poorly), and then modified in an [apparently untested] attempt to fix a problem. Because the loop has no braces, only the if statement belongs to it; the two lines indented beneath it run just once, after the loop has finished. Poor formatting rules and an inattentive maintainer led to more bugs instead of a fix.

The second version shows how the same code may have appeared if stronger formatting rules had been used. Adding code to the loop meant adding new lines between the braces, producing the desired effect.

Note the placement of the braces in the second block of code above. By definition, the braces enclose a block of code that, for all intents and purposes, is a single statement at the level of the surrounding code. By placing the open/close in line (vertically) with each other and at the level of the surrounding code, we have a very clear marker of our block. Should a block become extremely long (not a generally good practice, but sometimes unavoidable) one could use a long ruler to connect distant braces on a printed listing to help visualize the code's structure.

Placement of braces and other indentation styles are as religious a topic as which source editor is best. I'm not interested in arguing, only in presenting my reasons for the style I'm recommending and reminding the reader that this is the only brace format of which I've not ever heard a complaint (discounting the, “it just isn't right” kind of complaint, since those don't hold water for me).

You might also notice the different layout for the function definition itself. As long as it is kept reasonably clear, I've no objections to either format. My preference is generally for the format used in the second block of code, but there are times when I will put one parameter per line so that I may comment their purpose (and often in that case, the way they are changed, if at all).

General Formatting Rules

switch( cond1 )
{
case 1 :
   do_something();
   // Now fall into case 2 and do that, too.
case 2 :
   do_something_else();
   break;
default :
   break;
} 

Also notice that the default condition is present, even though it does nothing. It is good practice to always have a default condition to show your intent. Because C allows the flow of execution to continue from one case into the next, it is also important to either have a break statement to indicate that you wish to leave the switch entirely, or to comment that you intend to flow from one case to the next.

Lately, I've also become fond of adding an extra blank line before each of the case lines (except the first, because it has a natural space on the brace line). This helps make each case line (and the default) stand out just a little more, making the code that much more readable.

if( condition )
{
}
else
{
   if( another_condition )
   {
   }
   else
   {
   }
} 

When a sequence of if() / else if() / else if() / else has more of the sense of a switch (but with variable tests instead of the constants the case requires), then everything should align vertically in the same column, similar to a switch:

if( cond1 )
{
}
else if( cond2 )
{
}
else if( cond3 )
{
}
else
{
} 

Notice that braces are always used, even when a condition has only one line of code to execute (perhaps even no lines if you are leaving “hooks” for future features, but then a comment is generally required). This eliminates any possibility of misunderstanding between the original programmer and the maintainer. It also makes it far easier to copy new lines of code into a particular condition without requiring one to remember to add braces after the fact.

Examples:

xpos = (ypos * something)
     + (zpos / sumthinelse)
     - 17;

if( ((xpos <= 0) || (80 < xpos)) ||
    ((ypos <= 0) || (25 < ypos)) )

x = SomeFunction( dwVeryLongVariable,
                  dwAnotherLongName,
                  dwMoreOfSame );

rc = do_this_function( onvar1, onvar2, &onvr3 );

is easier to read than

rc = do_this_function(onvar1,onvar2,&onvr3); 

When you have syntax coloring available (or even very clear (and/or large) printing) you may be tempted to dispute this. The first time you see a monocolor listing on a display or printout with small, difficult to read characters, this one will make more sense.

This is true around most other operators as well, except for the unary operators. The unary operators are normally “snuggled” right up next to the variable to which they are applied. This helps prevent them from being mistaken for binary operators.

if( condition ) 
if (condition) 
if ( condition ) 
if(condition) 
if( CONST_SYMBOL == myVar ) 
if( myVar < testVar ) 
if( (myVar1 < testVar) && (testVar < myVar2) )
if( ((myVar1 < testVar) && (testVar < myVar2)) || (other1 == other2)) 

In the first of the last two examples, we have two tests that together check to see if testVar is between, but not equal to, myVar1 and myVar2. The second example adds a third condition that by itself may cause the test to result in the TRUE branch being followed. Note that additional parentheses were added to show that we consider the first two conditions together at a similar level to the third condition. These particular parentheses are unnecessary for the compiler, since in this instance the standard operator precedence rules should be adequate. However, the extra set of parens does help to show the coder's mental concept that the first two tests are somehow related, possibly providing a critical hint to a future programmer.


Potential Criticisms

I am fully aware that not everyone will be fond of this code format and layout. Generally, that will be because people are more used to seeing some other layout rather than because of any concrete, well thought out objections. That said, there is at least one issue that should be addressed: vertical white space (blank, or nearly blank, lines).

The formatting guidelines presented here tend to stretch source code vertically, so printouts use more paper and fewer lines of executable code are visible in your editor's display window. Well, there's no denying that, but remember that the extra space helps segregate things within a program. Generally, breaking things down into smaller, more easily understood chunks is a good thing. In fact, I've been on projects where no subroutine could have more than 100 lines of code (a silly artificial restriction if ever there was one).

The thing is, if you find yourself jumping more than a screen or so back and forth between two closely related lines of code in the same routine, you should be asking yourself, “Why aren't these lines closer together? What kind of mistake might I have made that caused me to put these related things too far apart? Is there anything I can do to keep things working yet have the lines closer together?”


Chapter 5 – Variables

It almost goes without saying that variables play a significant part in maintainable code. In general, huge data structures that contain all sorts of unrelated data are a bad idea, but sometimes there will be other reasons that override that sort of common sense.

Naming

The most important feature of variables when first looking at a new piece of code is their names. As trite and cliché as this is, I'll say it anyway: Meaningful names are very important. At the same time, meaningless (or nearly so) names have their place, too. Every important variable, like parameters, structure members, and anything related to the functions being performed by the program should have a meaningful name.

On the other hand, it is often preferable to pick a single letter (or two) to use [almost] exclusively as loop indexes and similar things. In fact, when used consistently, a few single letter variable names can be as meaningful as their much longer brethren. For example, imagine a program that reads real world data from a set of scales. The scales are known by the customer as ‘position 1’, ‘position 2’, etc. Everything about how the scales are handled is the same, so you declare a data structure to contain all of the information necessary for reading the scale, computing the necessary values, and displaying them somewhere. Then, everywhere you access the array of scale structures, you use a variable named ‘p’ (for position). That will work as long as you are careful to never use ‘p’ for any other purpose (except, maybe, in a place having nothing to do with the scales). The short index will help make the structure member names more prominent, and after all, that is what you're actually working with, so that's what the maintenance programmer will need to see most.
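Here is a rough sketch of the scales example. The structure members, the NUM_POSITIONS constant and the weight calculation are all assumptions made up for illustration; the point is only that ‘p’ is used for nothing except indexing the scale array, which lets the member names carry the meaning.

```c
#define NUM_POSITIONS 4   /* assumed number of scales */

/* Hypothetical data for one scale position. */
typedef struct
{
   long raw_counts;      /* latest raw reading from the scale    */
   long tare_counts;     /* reading with nothing on the scale    */
   long counts_per_kg;   /* calibration factor (0 = uncalibrated) */
   long net_grams;       /* computed net weight                  */
} SCALE;

SCALE scale[NUM_POSITIONS];

void
compute_net_weights( void )
{
   int p;   /* position index: used only for indexing scale[] */

   for( p = 0; p < NUM_POSITIONS; p++ )
   {
      scale[p].net_grams = 0;

      if( 0 < scale[p].counts_per_kg )
      {
         scale[p].net_grams =
            ((scale[p].raw_counts - scale[p].tare_counts) * 1000)
            / scale[p].counts_per_kg;
      }
   }
}
```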

Good, clear naming rules (and strong adherence to them) will help lead to a quick understanding of the code. During the first part of the 1990s, when Windows was first exploding out into the market, many programmers were first exposed to an idea from Mr. Charles Simonyi, of Microsoft, that was nicknamed “Hungarian Notation.” You may read the full description on Microsoft's web site at:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnvsgen/html/HungaNotat.asp

If the above URL doesn't work any more (after all, things get moved around from time to time), you might want to search for “hungarian notation” at www.microsoft.com and check out the articles you find there.

Hungarian Notation

Mr. Simonyi's idea was to come up with a set of short prefixes to indicate the data type of each variable. Most of the Windows headers and documentation used this notation, so Windows programmers allowed it to creep into their code, too, to keep things looking relatively consistent. Some examples:

Machine Level Type         Windows Type   Prefix   Example
Unsigned 16 bit integer    WORD           w        wLength
Unsigned 32 bit integer    DWORD          dw       dwSize
Signed 16 bit integer      short          s        sVar
n/a                        RECT           r        rClient
n/a                        POINT          pt       ptMouse
Signed 64 bit integer      LONGLONG       ll       llFileTime

Hungarian Notation – Good Things

Hungarian Notation – Bad Things

Data Types

Creating your own data types can be very helpful in creating portable code. Defining them such that it is very difficult to mix them with other types (as Microsoft tried to do with many of the types in the Windows headers) will help ensure that you don't get caught in a processor-specific trap.
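One way to make types hard to mix in C is to wrap each value in its own struct, so that the compiler refuses to accept one where the other is expected. The sketch below is only an illustration of that general technique; the type names, the TICKS_PER_MS rate and the conversion function are all invented, and I am not claiming this is how the Windows headers do it.

```c
#include <stdint.h>

/* Hypothetical types: two 32 bit quantities that must never be mixed. */
typedef struct { uint32_t ms; }    MILLISECONDS;
typedef struct { uint32_t ticks; } TIMER_TICKS;

#define TICKS_PER_MS 8u   /* assumed hardware timer rate */

TIMER_TICKS
ms_to_ticks( MILLISECONDS delay )
{
   TIMER_TICKS t;

   t.ticks = delay.ms * TICKS_PER_MS;

   return t;
}
```

Passing a TIMER_TICKS value to ms_to_ticks() is now a compile error instead of a silent unit mix-up, and the fixed-width uint32_t keeps the size the same on every processor.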

Depending on the type of application, an abundance of custom data types may aid or hinder the maintenance effort. In machine communications, trying to abstract the data being transferred too far usually makes it much more difficult to troubleshoot and maintain. In an end-user application the same level of abstraction can make the program easier to understand/maintain as well as easier to port to other architectures.

One thing that is often helpful is to group related data into structures. The structure names in the source code (plus the structure declaration) help the maintenance programmer quickly see what things are related. Nesting structures so deep that you are tempted to make macros to access the deepest members may indicate a design problem. And try to avoid those macros unless you want to be stalked by the programmers who must maintain your code…

Global vs Local

Global variables are often disparaged as dangerous, unnecessary and, in general, problems waiting to happen. But, as with most things in programming or, for that matter, any sort of tool, the veracity of that statement is inversely proportional to the experience of the user and the care used when working with the tool.

It is important, however, to have a naming convention that is special and restricted to global variables. A simple prefix of ‘g_’ will do, but feel free to choose your own; just be very strict in application of the rule. It is important to ensure that all global variables comply and that no local variables comply with the global naming rule. That helps avoid accidental conflicts.

Because of one of the features in some newer source editors, we've taken to collecting all of the global variables in a single struct and naming it something beginning with a small ‘g’. In some editors, this causes the editor to display a list of structure members when the ‘.’ is entered, allowing us to pick and choose which global we need. And because all of the global variables are members of the same struct, we also have a naming rule that is easy to follow even when the fancy editor features aren't available. (And one extra benefit is that we can save/restore all of our global variables with a single write/read using the address and size of the global struct, should the need arise.)
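A minimal sketch of the idea follows. The members of the struct and the function names save_globals and restore_globals are hypothetical. Note that the single write/read only makes sense when the struct holds no pointers and the file is read back by the same build of the program, since structure padding and byte order can differ between compilers and machines.

```c
#include <stdio.h>

/* Hypothetical globals for illustration, collected in one struct. */
typedef struct
{
   int  unit_count;       /* number of units currently configured */
   long total_readings;   /* readings taken since start-up        */
   char site_name[32];    /* name shown on reports                */
} GLOBALS;

GLOBALS g;   /* the one and only global block */

/* Save every global with a single write. Returns 0 on success. */
int
save_globals( const char *path )
{
   FILE *fp = fopen( path, "wb" );
   int   rc = -1;

   if( NULL != fp )
   {
      if( 1 == fwrite( &g, sizeof( g ), 1, fp ) )
      {
         rc = 0;
      }
      if( 0 != fclose( fp ) )
      {
         rc = -1;
      }
   }

   return rc;
}

/* Restore every global with a single read. Returns 0 on success. */
int
restore_globals( const char *path )
{
   FILE *fp = fopen( path, "rb" );
   int   rc = -1;

   if( NULL != fp )
   {
      if( 1 == fread( &g, sizeof( g ), 1, fp ) )
      {
         rc = 0;
      }
      fclose( fp );
   }

   return rc;
}
```

Everywhere else in the program the globals are written as g.unit_count, g.total_readings and so on, which makes them impossible to confuse with local variables.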

Tramp Variables

Tramp is a name for a variable that is passed from one subroutine to another without being otherwise used by those routines. Tramps often result from an obsession with eliminating global variables. They occur when a variable at one level is needed only at a much lower level, but not at the levels in between.

For example, in a Windows program, the top-level routines have the window handle and pass it to the second level routines. The second level performs some form of data collection and passes the data and window handle to the third level. The third level determines the necessary fonts, colors, screen positions, etc. and uses the window handle to access a drawing surface, where it displays some information.

In the example, the tramp only visits one routine that doesn't need it. In the real world, some tramp variables may visit many, many more routines, and pushing/popping all of those tramps on and off the stack definitely doesn't add to a program's efficiency.

One way to hide tramps is to collect them into a single data structure and pass its address all over the place. Then lower level routines can access the parts they need.

But…

If you think carefully about that solution, you will realize that the data structure being used is itself a form of a global data block, because every single routine that gets the pointer has access to the entire block, whether it needs it or not.

When you get to this point, it is time to decide if there is ever any chance that your program might be required to operate on more than one of the objects described by the “tramp” data structure. If that is the case, then the structure would no longer be a tramp as described above; instead, it would be everything unique about the object it describes.
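A minimal sketch of that second case, assuming a made-up object (a weighing scale) with hypothetical member and function names. Because each routine receives a pointer to the object it should work on, the same code can serve any number of scales:

```c
// Everything unique about one scale (members are illustrative)
struct Scale_s
{
   double rdWeight;      // Current weight
   double rdFullWeight;  // Weight considered "full"
};

// Lowest level: the only routine that actually reads the members
static double scale_Fraction( const struct Scale_s *ps )
{
   return ps->rdWeight / ps->rdFullWeight;
}

// Middle level: does its own work and hands the object pointer down
static double scale_PercentFull( const struct Scale_s *ps )
{
   return 100.0 * scale_Fraction( ps );
}
```

Calling scale_PercentFull() with the address of two different Scale_s variables gives two independent answers, which is exactly what a single global block cannot do.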

However, if you are working on an embedded application that will never have more than one object and performance is important, then moving the data block from the stack to global variable space might be the way to go.

Most importantly, you should always try to think along the line that global variables are a privilege, not a right. If you misuse them you will almost certainly get burned by software bugs, or maintenance programmers (in effigy, one hopes) who are trying to figure out why one of your global variables is changing unexpectedly.


Chapter 6 – Comments

Comments are an often mentioned but rarely done (well, at least) component of maintainable code. In general, every source file should have a header comment block describing the purpose of the file. Often company policies will dictate additional information such as author name, date, company name, address, copyright information, etc. As mentioned above, it is also a good idea to include the original tab expansion value and any other variable formatting rules that may be applicable so that others may view your source as intended.

I, personally, like to include a comment showing the top (and one at the end) of the file. This is a holdover from the days when we shared code over modems and it was helpful to have visible evidence that you have a complete file (or not).

Another personal preference is to put a separator comment line between functions and also between different sections of the header files (like #includes, #defines, typedefs, function prototypes, etc.). Once again, these don't serve any purpose other than making each function stand out from its neighbors. Of course, when present, those separator lines are a great place to start a function header comment that describes what the function does, what it takes for inputs, what it returns and what, if any, side effects it may have.
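For illustration only, here is one possible layout for a separator line that doubles as the start of a function header comment. The function u_ClampInt() and the field labels are hypothetical; any consistent layout that answers the same four questions will do.

```c
//---------------------------------------------------------------------------
// u_ClampInt
//
// Purpose      : Force a value into the inclusive range [lo, hi].
// Inputs       : value - number to clamp
//                lo    - smallest result allowed
//                hi    - largest result allowed (assumed to be >= lo)
// Returns      : value, lo or hi, whichever lies within the range
// Side effects : none
//---------------------------------------------------------------------------
int u_ClampInt( int value, int lo, int hi )
{
   if( value < lo )
   {
      return lo;
   }
   if( hi < value )
   {
      return hi;
   }
   return value;
}
```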

The following subroutine comes from a real world application and is included to provide an example of good comments. Notice that the comments assume a level of familiarity with the function and purpose of the application. Trying to document every function, variable and statement for a complete novice would make the code (and more importantly, the comments) nearly impossible to maintain.

void acp_MakeEstimates( HWND hwnd, enum Position_e p )
{
   int        s  = g.pos[p].graph.nSamples - 1; // Index of most recent sample
   int        s0 = s - g.nTimeSamples;          // Index of earlier time sample
   SYSTEMTIME st;
   double     T0; // Time of earliest sample to use
   double     T1; // Time of current sample
   double     Tf; // Estimated completion time
   double     W0; // Weight at earliest sample to use
   double     W1; // Weight at current sample
   double     Wf; // Full Weight
   double     A1; // Assay used for current sample (minus bias)
   double     At; // Assay target
   double     V1; // aVerage assay at current sample
   
   W1 = g.pos[p].graph.s[s].rdWeight;
   Wf = g.pos[p].rdFullWeight;
   A1 = g.pos[p].graph.s[s].rdAssay[g.pos[p].graph.s[s].iAssay]
      + g.pos[p].graph.s[s].rdBias;
   At = g.pos[p].rdTargetAssay;
   V1 = g.pos[p].graph.s[s].rdAssayA;
   
   if( g.nTimeSamples < s )
   {
      // Estimate completion time
      // (T1 - T0) * (Wf - W0)
      // --------------------- + T0 = Tf
      //       (W1 - W0)
      T0 = g.pos[p].graph.s[s0].rdTime;
      GetLocalTime( &st );
      T1 = SystemTimeToDouble( &st );
      W0 = g.pos[p].graph.s[s0].rdWeight;
      Tf = (((T1 - T0) * (Wf - W0)) / (W1 - W0)) + T0;
      DoubleToSystemTime( Tf, &g.pos[p].stFullAt );
      u_SetDlgItemSTime( hwnd, fid[TX_FULLAT][p], &g.pos[p].stFullAt );
   }
   else
   {
      SetDlgItemText( hwnd, fid[TX_FULLAT][p], "??:??" );
   }
   
   if( START_SAMPLE < s )
   {
      // Estimate assay at full
      // ((Wf - W1) * A1) + (V1 * W1)
      // ---------------------------- = Vf
      //              Wf
      g.pos[p].rdEstAAF = (((Wf - W1) * A1) + (V1 * W1)) / Wf;
      u_SetDlgItemDouble( hwnd, fid[TX_EASYF][p], g.szAssayFormat, g.pos[p].rdEstAAF );
      // Estimate assay necessary to reach target
      // (At * Wf) - (V1 * W1)
      // --------------------- = Ar
      //       (Wf - W1)
      g.pos[p].rdToReach = ((At * Wf) - (V1 * W1)) / (Wf - W1);
      u_SetDlgItemDouble( hwnd, fid[TX_TARGET][p], g.szAssayFormat, g.pos[p].rdToReach );
   }
   else
   {
      SetDlgItemText( hwnd, fid[TX_EASYF ][p], "?.????" );
      SetDlgItemText( hwnd, fid[TX_TARGET][p], "?.????" );
   }
   return;
} 

One of the first things some programmers are sure to point out is that the routine is somewhat inefficient because it copies all of the working values into temporary variables that are used in the computation. This was done as a way of having the code document itself. By using temporary variable names that match the names used in the comments (and describing what they contain where they are declared), the initial block of assignments serves to document exactly where all of the things used in the various equations are to be found.

Another thing to notice is that while many of the variables (or at least the structure members) have either long, meaningful names or are documented temporaries for the equations, a couple of them have single letter names which seem to violate good naming conventions. The rationale behind those very short names was that they are intended for use as indexes into arrays and very long names would make the array accesses more difficult to understand. Also, these variables exist only within this routine, so the maintenance programmer need not look too far to find their meaning.

You should also notice that every equation is fully parenthesized, even though knowing the operator precedence rules could eliminate many of the parentheses. The benefit of having all of the parentheses present is that the intent of the code is immediately obvious and any alterations to the equations that may become necessary will be easier to make.

If you noticed the funny capitalization in some of the comments, take a look at the variable they are commenting. You'll see that the letter that was capitalized is the first letter of the variable. That was a not-so-subtle way of showing why the temporary variables were named as they were.


Chapter 7 – Substitution Macros

Macros can be a good thing, but sometimes they can hide too much of the underlying code, making it very hard to maintain. For example (true story), there was a programmer who created what looks like an entirely new language for part of a C program. It turns out that the file containing the “new language” is actually just a C source file that is doing compile time initialization of a large, complex data structure, using a dizzyingly complex set of macros. We've had numerous people try to comprehend it over the past several years and to date, only the original programmer ever claimed to understand it, but since the thing never quite worked right all the time, we doubt his claim.

Using macros to fill in default parameters can also be useful (for languages that don't support them directly), but always be aware that the macros hide the actual functions doing the work, making it more difficult to locate them. Failing to document (at the point the macro is defined, at least) what the defaults are and why they were chosen also lowers the value of the macro.
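A small sketch of a default-parameter macro that tries to avoid both pitfalls. The function u_FormatDouble(), the macro FormatAssay() and the constant DEFAULT_DECIMALS are all hypothetical names; the point is only that the default and its rationale are documented right where the macro is defined, and that the real function is named in that same comment.

```c
#include <stdio.h>

// Does the real work: formats value into buf with the given number of
// decimal places; returns the number of characters written
static int u_FormatDouble( char *buf, size_t size, double value, int decimals )
{
   return snprintf( buf, size, "%.*f", decimals, value );
}

// Default: 4 decimal places, chosen (in this sketch) to match a display
// field that is laid out as "?.????"
#define DEFAULT_DECIMALS 4

// Convenience wrapper for u_FormatDouble() that supplies the default above
#define FormatAssay(buf,size,value) \
   u_FormatDouble( (buf), (size), (value), DEFAULT_DECIMALS )
```

A call such as FormatAssay( text, sizeof( text ), 1.5 ) would leave "1.5000" in text.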

Another issue to be acutely aware of is that there may be unexpected side effects to macros. For example, the following macro can cause some very strange behavior at times:

#define poly(x,y) ((x)*(x) + 2*(x)*(y) - (y)*(y))

The macro just defines a polynomial equation that may be used in many places in a program, and may be subject to some form of adjustment over the years, so making it a macro seems like a good idea. However, think about the confusion that might result from a usage like this:

pv = poly( x++, y-- ); 

Not only will pv end up with an incorrect value, but x and y will be altered more than the statement above seems to indicate. That is because the macro will replace every x with x++ and every y with y--, causing each of those to be evaluated THREE TIMES instead of the once that was expected. (Strictly speaking, modifying the same variable more than once within a single expression like this is undefined behavior in C, so the exact result can vary from one compiler to the next.) This is an issue that should be taught during entry level C programming, but it is surprising how many people forget about it in practice.
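The repeated evaluation can be observed safely by passing a function call with a countable side effect instead of x++. In the sketch below, poly_fn(), counted() and the two calls_Via...() routines are hypothetical helpers added for the demonstration; only the macro comes from the text.

```c
// Macro from the text
#define poly(x,y) ((x)*(x) + 2*(x)*(y) - (y)*(y))

// Function form: each argument is evaluated exactly once
static int poly_fn( int x, int y )
{
   return ( (x * x) + (2 * x * y) - (y * y) );
}

static int g_nCalls = 0;   // Counts how often counted() runs

// Helper whose only side effect is bumping the counter
static int counted( int v )
{
   g_nCalls++;
   return v;
}

// Number of times the macro evaluates its first argument
static int calls_ViaMacro( void )
{
   g_nCalls = 0;
   (void)poly( counted( 3 ), 2 );
   return g_nCalls;
}

// Number of times the function evaluates its first argument
static int calls_ViaFunction( void )
{
   g_nCalls = 0;
   (void)poly_fn( counted( 3 ), 2 );
   return g_nCalls;
}
```

The macro runs counted() three times while the function runs it once, even though both compute the same polynomial. Where the compiler supports it, a small function is often the safer choice.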

Macros must be very carefully constructed so as to minimize the number of unintentional side effects they may cause. Repetitive use of a macro parameter, as shown above, is one possible problem. Use of variables that are not parameters to a macro is another maintenance issue. Variables external to a macro might have their value unexpectedly altered at an inopportune time by a new line of code added during maintenance, and nothing in the immediate vicinity of the new line will show the problem, because it is hidden in a macro. Or the macro may reference an external global xyzzy, but a maintenance programmer adds a local xyzzy which the compiler then uses instead. These are the things programmers' nightmares are made of.
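The xyzzy problem is easy to reproduce. In this sketch the macro scaled() and the two scale_...() routines are hypothetical; the second routine shows what happens after a maintenance programmer innocently adds a local with the same name as the global the macro depends on.

```c
static int xyzzy = 10;              // Global the macro silently depends on

#define scaled(v) ((v) * xyzzy)     // 'xyzzy' is not a macro parameter

// Before maintenance: the macro picks up the global
static int scale_UsingGlobal( void )
{
   return scaled( 2 );              // 2 * 10
}

// After maintenance: a new local hides the global inside the macro
static int scale_AfterMaintenance( void )
{
   int xyzzy = 3;                   // New local added during maintenance
   return scaled( 2 );              // 2 * 3
}
```

Nothing on the line that calls scaled() hints that its result has changed, which is what makes this kind of bug so hard to find.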

It is good practice to develop a naming rule that is special for (and restricted to) macros. For example, simply deciding that all of your macros will begin with the prefix ‘m_' (maybe end with the suffix ‘_m' instead?) could save hours, days or even weeks of chin scratching, teeth gnashing and hair pulling.


Chapter 8 – Language Constructs

Strange constructs aren't always bad, but poorly thought out constructs make code harder to understand, not easier. What follows are just some of the odd constructs I've used or encountered over the years.

Number 1 – Funky if/do/while

if( condition ) do
{
}while( another_condition ); 

I told myself this was the most obvious way to code a loop that had slightly different entry and exit conditions, for example when splitting a string into tokens. The first test would see if there was anything to work on, the second would see if there was anything left. In fact, in C, the strtok() function must be called in two different ways, one to start processing a string (the if test) and another to continue processing the same string (the while test). Unfortunately, this is a good example of something that made sense to its creator, but not to many of the people who followed him. Therefore, I have stopped using it in favor of clarity for my coworkers.
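For comparison, one conventional way to write the same two-call strtok() pattern is a plain while loop with the first call made before the loop. The function count_Tokens() is a hypothetical example; keep in mind that strtok() modifies the buffer it is given and keeps hidden state between calls.

```c
#include <string.h>

// Count the tokens in text that are separated by any character in delims.
// NOTE: strtok() overwrites delimiters in text with '\0' as it goes.
static int count_Tokens( char *text, const char *delims )
{
   int   count = 0;
   char *tok   = strtok( text, delims );   // First call: start a new string

   while( NULL != tok )
   {
      count++;
      tok = strtok( NULL, delims );        // Later calls: continue the same string
   }
   return count;
}
```

The entry test and the continuation test are still different calls, but each one sits on its own line where a maintenance programmer expects to find it.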

Number 2 – Funky if/switch

if( condition ) switch( testvar )
{
case VALUE1 :
   break;
case VALUE2 :
   break;
default :
   break;
} 

Another instance of something that seemed sensible but was not. Other programmers seemed to assume that they lacked some “special knowledge” about the inner workings of the if and switch statements when they saw them combined this way. The reality was that this example just uses strange indentation, but because it made maintenance programmers hit the books trying to locate something new (that really wasn't) about the language, this, too, has dropped out of favor for maintainable code.

Number 3 – Obsessively Avoiding goto

switch( testvar )
{
case 1 :
   xyz = abc;
   if( 0 )
   {
case 2 :
      xyz = 789;
   }
   abc = 123;
   break;
default :
   break;
} 

I found this example in a program by someone who seemed to have an irrational fear (or total lack of knowledge) of the goto statement. The odd placement of the if(0) allows case 1 and case 2 to share the same terminal code, but each assigns a different value to xyz. When the common code is only a few lines and is only referenced within this one switch, it is much clearer to use a label and a goto statement, like this:

switch( testvar )
{
case 1 :
   xyz = abc;
   goto Set_abc;
case 2 :
   xyz = 789;
   goto Set_abc;
Set_abc:
   abc = 123;
   break;
default :
   break;
} 

Notice that case 2 has a goto , even though it is only going to the next line. If the code is modified in the future, a case 3 may be added between case 2 and the label. Coding as above (a) makes your intent clearer, and (b) helps an overworked, inattentive maintenance programmer to avoid a possible mistake (perhaps without ever noticing how close he came to making one).

Number 4 – New Data Anywhere

Example 1:

switch( testvar )
{
   int local_var = 0;
case 1 :
   printf("local_var is %d\n",local_var);
   break;
case 2 :
   break;
default :
   break;
} 

Here is another good example of doing something that is probably not necessary just because the compiler allows it. C allows new variables to be declared at the beginning of any block. In fact, the variable might even be initialized, but the behavior is strange enough that it should be avoided.

For a test, what do you think the printf will output when testvar is 1? If you chose any definite value, then you are incorrect. Because the flow of execution didn't include the initialization, the value of local_var is whatever happens to be on the stack when the printf is called. That is caused by the fact that the switch effectively performs a goto directly to the appropriate case, bypassing the block initialization. Of course, some (non-ANSI C compliant) compilers may implement it differently.
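The straightforward repair is to declare and initialize the variable before the switch, where the flow of execution is guaranteed to pass through it. A minimal sketch, using a hypothetical function switch_Demo() that returns a value instead of printing so the effect is easy to check:

```c
// Returns the value each case sees, or -1 for values with no handler
static int switch_Demo( int testvar )
{
   int local_var = 0;   // Initialized before the switch, so every case sees 0
   int result    = -1;

   switch( testvar )
   {
   case 1 :
      result = local_var;
      break;
   case 2 :
      result = local_var + 2;
      break;
   default :
      break;
   }
   return result;
}
```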

Example 2:

switch( x )
{
case VALUE_0 :
   a = b + 1;
   break;
case VALUE_1 :
   {
      int t = a + b;
      t = t + 1;
      break;
case VALUE_2 :
      t = t * 2;
   }
   break;
default:
   break;
} 

Once again, our programmer may be getting a little too creative for his (or our) own good. This time, he's used a block that begins within one case handler and ends within another! One point of interest on this one: a very skilled C programmer would notice that t will be initialized to (a+b) in case VALUE_1, but t will actually be uninitialized in case VALUE_2 because execution in that case will not pass through the initialization step.

This is exactly the kind of thing that can lead to extremely difficult to locate and diagnose problems. And the difficulties will only increase with the distance in time between initial coding and later maintenance.

There may be specialized processors and/or compilers where some of these (or other similar) constructs utilize special features of the local hardware. If (when?) that is the case, things similar to the examples above should be clearly commented and the local processor (or compiler) benefit should be explained. When maintaining a piece of code that contains something like these examples, one must be sure to keep the comments up-to-date, should the construct be altered (or removed) such that the benefit is eliminated or changed.


Chapter 9 – Efficiency and Novelty

Far too often, a programmer will confuse extremely terse source code with efficiency. C programmers are notorious for this. I think that is one of the reasons C is so often called a write-only language. Given the ever increasing speed of computers, and adding in the fact that a program is compiled only rarely over the entire course of its useful lifetime, there is generally no excuse for trying to make the compiler's job easier.

In fact, almost every attempt to make the compiler's job easier will result in code that is harder to understand – no matter how good the comments may be. A very simple example may be constructed using the language's precedence rules. While it might be “obvious” and “efficient” for the original programmer to write something like:

x=y+7*z/35.-11%x; 

It will be much clearer to the maintenance programmer who finds this instead (notice that I'm not mentioning the issue of mixing integers and floats without using casts to show what the programmer expects to happen):

x = y + ((7 * z) / 35.0) - (11 % x);

Even though the parentheses used seem to do nothing except confirm that we know the language precedence rules, they also serve the purpose of showing the maintenance programmer exactly what we think those rules are and/or what we want our equation to do.
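To see why the grouping is worth spelling out, consider an all-integer variant of the same sub-expression (this variant and the three scale_...() functions are hypothetical, made up for the demonstration). Multiplication and division have equal precedence and group left to right, so the unparenthesized form means (7 * z) / 35, and a reader who assumes the other grouping gets a different answer.

```c
// As a terse programmer might write it: groups as (7 * z) / 35
static int scale_AsWritten( int z )
{
   return 7 * z / 35;
}

// The grouping the compiler actually uses, made explicit
static int scale_LeftFirst( int z )
{
   return (7 * z) / 35;
}

// The grouping a hurried reader might assume; integer division truncates
// (z / 35) first, so the result can differ
static int scale_RightFirst( int z )
{
   return 7 * (z / 35);
}
```

With z equal to 10, the first two functions return 2 and the third returns 0.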

Another popular form of “efficiency” is to perform an assignment within some other statement, often a conditional, for example:

if( a = function( x, y, z ) ) 

Modern optimizing compilers don't need a hint like the sample above; testing the result of an assignment inside the conditional is unnecessary. It is better to write the example above like this:

a = function( x, y, z );
if( 0 != a ) 

Another perfectly legitimate feature of C that should be avoided unless absolutely necessary (and very well commented) is short circuit conditional evaluation which can cause part of a conditional to not be evaluated. When that part has a side effect that may or may not occur, the code should be much more obvious. For example, the function sendmsg() below sends a message to another program, but it will not even be called if the part of the condition to the left of the && evaluates to FALSE.

if( (a = function( x, y, z )) && !sendmsg(partner) ) 
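One way to make the side effect obvious is to give it a statement of its own. In the sketch below, partner_Send() is a hypothetical stand-in for the sendmsg() call above (assumed, for this example only, to return 0 on success), and the two report_...() routines take the already computed value of a as a parameter.

```c
static int g_nSent = 0;   // Counts messages sent to the partner

// Hypothetical stand-in for sendmsg(); returns 0 on success
static int partner_Send( void )
{
   g_nSent++;
   return 0;
}

// Terse form: whether a message is sent is buried in the condition
static int report_Terse( int a )
{
   if( a && !partner_Send() )
   {
      return 1;
   }
   return 0;
}

// Explicit form: the decision to send is a visible statement of its own
static int report_Explicit( int a )
{
   int failed;

   if( 0 == a )
   {
      return 0;              // Nothing to report, partner is not contacted
   }
   failed = partner_Send();  // Side effect happens here, in plain sight
   return ( 0 == failed );
}
```

Both routines behave identically; the second simply shows the reader when the partner is contacted and when it is not.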

C also has a ternary (conditional) operator in the form condition ? trueValue : falseValue that is very, very useful when used for short simple selections. However, this construct was never intended to completely replace an if()/else (or worse, to replace a series of if()/else if()/else) and yet, some programmers seem to think that is the case.
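A brief sketch of the difference, using hypothetical helper functions. The first is the kind of short selection the operator suits well; the second and third compute the same result, once as a chained conditional and once as an if()/else if()/else.

```c
// Short, simple selection: a good fit for the conditional operator
static const char *u_PluralSuffix( int n )
{
   return ( 1 == n ) ? "" : "s";
}

// Chained selection written tersely; legal, but hard to scan
static int u_SignTerse( int v )
{
   return ( v < 0 ) ? -1 : ( 0 < v ) ? 1 : 0;
}

// Same logic as an if()/else if()/else; longer, but each case is obvious
static int u_SignClear( int v )
{
   if( v < 0 )
   {
      return -1;
   }
   else if( 0 < v )
   {
      return 1;
   }
   else
   {
      return 0;
   }
}
```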

Worse yet, some lunatics combine other ‘interesting' features of the C language, like …


Chapter 10 – Conclusion

Sloppy, disorganized and otherwise difficult to read (and understand) source code will generally lead to lower quality software. It will almost certainly lead to the software being very difficult to repair and/or enhance in the future. It may (and often does) lead to the software being completely abandoned in the future, when the maintenance programmer determines (after wasting quite a bit of time trying) that it will be quicker to do a rewrite than to continue trying to determine a way to fix (and/or enhance) the existing code.

On the other hand, clear, well organized and generally easy to read (and understand) source code will often lead to higher quality software. Later, when an enhancement is necessary or when the inevitable bug (or security hole) is discovered, the code will be much easier to work with and the enhancement (or fix) will be ready all the sooner.