Comments are not documentation

It’s no secret to my friends that I love programming. Moreover, to my colleagues at work, it’s no secret that I am very passionate about it, both as a hobby and as something I do (semi-)professionally. Unfortunately, or perhaps not-so-unfortunately, I learned to code while I was going through my Bachelor’s in Engineering. While my courses, and eventually my own self-directed learning, stuck with me, I never quite got the same education as somebody who studied Software Engineering or Computer Science. Nonetheless, I have worked through some hellish codebases before, and while I may not be an enterprise coder, I do think that I have enough experience thus far to discuss an opinion that I’ve received a lot of flak for recently. If you haven’t guessed, I’m talking about code comments.

The main issue I have with comments is how they’re taught, and thus how many programmers end up treating them. If the title of this post doesn’t give away my opinion on the matter, I’ll put it succinctly: I believe comments are overused, and are often the wrong way to express intent or information to other developers. This all started when a colleague at the university said something along the lines of:

All code should be approximately 50% comments if you’re doing it right. Otherwise your code will be unintelligible and unmaintainable.

Of course, I’m recalling this off the top of my head, so the conversation may not have been worded exactly this way, but the sentiment is not lost in translation. At the time I didn’t articulate my misgivings with this attitude very clearly, which is the reason I’m writing this today. Many instructors in introductory courses try to drill this in early on: you should comment your code as much as possible, to make it very clear what you intend to do at each step. In my first programming class, my own instructor left me with much the same sentiment, and for a long time this is exactly what I did when I wrote code. I had comments everywhere. However, comments are NOT documentation, and when half of your code is comments, the code becomes very difficult to read and interpret, and the comments detract from understanding it.

I’m positive there are plenty of pages out there similar to this one which argue over the merits of comments, either as documentation or otherwise. I’m not here to write a Code Comments Considered Harmful essay, nor am I going to say you need to use them everywhere. There’s definitely a balance, but I believe the balance is closer to “no comments” than it is to “half of your code is comments.”

Documentation

So, if I’m so against comments, what is documentation, and how should it look? First and foremost, I am not against coupling documentation and code. I am against using comments as a means to document code. What does this mean? To start with, I hate seeing code like this:

int64_t foo(int64_t N)
{
    /*
     * foo is a static function that computes factorials.
     * It does so by means of a for loop and an accumulator
     * from 1 up to N. If N is less than or equal to 1,
     * then the function will return 1.
     *
     * N : Takes a number N as input -> must be an integer
     *
     * Returns a 64-bit integer
     */
    if (N <= 1) { return 1; }
    int64_t acc = 1;
    for (int64_t i = 1; i <= N; ++i) {
        acc *= i;
    }
    return acc;
}

First of all, the function is obviously named wrong. This was intentional in the example, but it crops up more than one might expect. The first line of the comment can easily be removed if the function is given a saner name. From then on, the ad-hoc documentation in the comment tends towards 1) being very verbose, and 2) repeating itself incessantly. This is a toy example, but these kinds of comments really do drive me batty. Why do you need to specify types in the comment? Why do you need to mention that the function uses a for loop and an accumulator? These are details that you can see plain as day if you take the time to read the 7 lines of code above. In many cases English can be nicer to read than obscure code; however, the idea that code is so foreign and unreadable that we need to duplicate our intent everywhere is ridiculous.

Of course, this is a contrived example, but bear in mind the words I used to describe it above: ad-hoc, verbose, duplicating our intent. I could go on. In any case, the main problem here is that the comment is largely unnecessary: it doesn’t really provide new information, and it actually gets in the way of reading the code beneath it. This doesn’t even begin to address what could happen if the code were to change but the comment were left in place. That is a separate issue, however, so I’ll leave it for the time being.

Where next?

Documentation is a subtly different beast. It should be clean, and should (ideally) be a first-class construct of the language, or at least easy enough to extend using IDEs. Furthermore, tooling should accompany documentation in such a way that it is easy for consumers of your code to read while developing on or with your code. Take the same example above in Python, which looks as follows:

from functools import reduce  # Python 3 moved reduce out of the builtins

def factorial(N):
    """
    @brief Computes the factorial of N.
    @param N An integer.
    """
    if N <= 1:
        return 1
    else:
        return reduce(lambda k, x: k * x, range(1, N+1))

Notice the following issues are solved:

  1. The function is appropriately named.
  2. Implementation details (how the factorial is computed) are left out.
  3. Documentation isn’t left in the form of comments, but as first-class features (docstrings).
  4. The documentation isn’t a few cobbled-together lines; it can be parsed by tools such as Doxygen.
  5. Types are mentioned in the docstring, but this information is not repeated.

Overall I believe this to be a far more successful strategy in the long run than comments as seen in the initial example code. Now, the seasoned programmer may object, “But Doxygen works with C as well, and there it parses comments to provide documentation.” Certainly it does; however, I believe this to be a very specific and special case where this is necessary. Raw comments such as the one in the first example cannot be parsed in any sane fashion by tools, and beyond that they miss the point of comments in the first place.

Comments by their nature disappear once the code is compiled. The added benefit of using strategies like Doxygen, JavaDoc, Python / Guile’s docstrings, CHICKEN Scheme’s Hahn egg, or whatever other equivalent, is that these strategies are consistent and mean you don’t have to write your documentation twice. Moreover, since your documentation sits next to your code, and since it often does get noticed when the behaviour of the code changes and the docstrings do not, it is much easier to maintain this style of documentation as a project evolves.
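To make that concrete, here is a minimal sketch in Python. It assumes the standard library’s doctest module as the checking tool, which is my choice for illustration, and check_docs is a hypothetical helper name. The docstring stays attached to the function as its __doc__ attribute, the comment does not, and the examples embedded in the docstring can be run so that a mismatch between the documentation and the behaviour gets noticed:

```python
import doctest
from functools import reduce


def factorial(n):
    """Compute the factorial of n.

    >>> factorial(0)
    1
    >>> factorial(5)
    120
    """
    # This comment is thrown away by the interpreter; the docstring above
    # stays attached to the function object as factorial.__doc__.
    if n <= 1:
        return 1
    return reduce(lambda acc, k: acc * k, range(1, n + 1))


def check_docs():
    """Run the examples in factorial's docstring.

    Returns a (failed, attempted) tuple. check_docs is a hypothetical
    helper written for this sketch, not part of any library.
    """
    test = doctest.DocTestParser().get_doctest(
        factorial.__doc__, {"factorial": factorial}, "factorial", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

If somebody later changes factorial without updating the docstring examples, check_docs() reports a failure instead of letting the documentation silently rot. An ordinary comment offers no such hook.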

What’s left for comments?

Recall that I didn’t claim that comments should never be used, rather that they should be as limited as possible. Most of the time, if you want to use a comment, you really want better names, better abstractions, or a more formal means of documentation. Code comments, however, are better left for explaining why certain decisions are made in the code if the choice is not idiomatic or is obscured by syntax / optimizations. Effectively, use comments when you want to comment on why you, the developer, made a design decision. Do not repeat implementation details, do not repeat the code, and only do this when a design decision does not make sense in the context of the rest of the code base.
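As a rough illustration of the kind of comment I mean, consider the sketch below. The function is_power_of_two is a hypothetical helper, not taken from any real codebase. The docstring says what the function does, and the single comment explains why the non-obvious expression was chosen and why the guard is there, rather than restating the code:

```python
def is_power_of_two(n):
    """Return True if n is a positive power of two."""
    # Why the bit trick instead of a division loop: a power of two has
    # exactly one bit set, so clearing the lowest set bit with n & (n - 1)
    # must leave zero. The n > 0 guard is needed because 0 would
    # otherwise pass the bit test.
    return n > 0 and (n & (n - 1)) == 0
```

Without the comment, the next reader has to rediscover both the trick and the reason for the guard. With it, the design decision is recorded exactly where it matters.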

Other uses of comments are varied and depend largely on the organization of the project, but I’ve found the following useful: demarcating FIXMEs and TODOs, and flagging workarounds for bugs from external code.

Conclusion

Hopefully this articulates some of my hatred for the dogmatism behind “comment all the things.” Comments should be used sparingly, if at all, and first-class documentation features (such as JavaDoc, or Python docstrings) should be used if they are available. In cases where these are not available, third-party structured documentation systems such as Doxygen or Hahn should be preferred, as the additional tooling to support this form of documentation is invaluable to 1) consumers of your code and 2) any contributors to the code.

Furthermore, comments should only be used to explain the why behind design decisions, or used sparingly to demarcate FIXMEs, TODOs, and workarounds for bugs from external code. When comments are used in this way, you’ll often notice that the size of your source files drops, and you’ll likewise have a much richer ecosystem for understanding the code.

And for the love of anything sacred, never commit comments to a project that contain “temporary” code or leftover code that was removed but not deleted. Code changes should be managed by a version control system, not the damn project maintainer.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.