May 11, 2009

This morning I finally got around to fixing numerous problems with the Gecko binary stream documentation. The articles for both nsIBinaryInputStream and nsIBinaryOutputStream were pretty whacked, with totally incorrect descriptions for a number of methods, plus methods completely missing in some places.

These two articles were converted from automatically generated documents, and the result exposes some of the inherent limitations of automatic documentation generation. If you look at the IDL for these two interfaces, you’ll see that the comments are not very thorough. Individual methods have no comments; instead, the comments are for groups of methods (at best). In addition, not all parameters have comments describing them.

The result is that the importer tool has to make guesses in a lot of cases, and the results are often sub-optimal. In this case, the last available method-describing comment was duplicated for each method added to the document, resulting in every method in nsIBinaryInputStream being listed as returning a Boolean value read from a single byte from the stream.
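To make that failure mode concrete, here is a minimal Python sketch. It is not the actual importer, and the IDL fragment, the function names, and the parsing rules are all simplified assumptions made up for illustration. It contrasts a naive pass that reuses the last comment it saw with a more cautious pass that attaches a comment only to the method directly following it and flags everything else for a human.

```python
import re

# Hypothetical, heavily simplified IDL fragment (not the real interface file):
# one comment, followed by several methods that have no comment of their own.
SAMPLE_IDL = """\
/**
 * Read a boolean value from a single byte in the stream.
 */
boolean readBoolean();
octet read8();
unsigned short read16();
"""

# Either a /** ... */ comment or a declaration ending in a semicolon.
TOKEN = re.compile(r"/\*\*(?P<comment>.*?)\*/|(?P<method>[^;/]+;)", re.S)


def clean_comment(raw):
    """Strip the leading '*' decoration and join the comment into one line."""
    lines = [line.strip().lstrip("*").strip() for line in raw.splitlines()]
    return " ".join(line for line in lines if line)


def method_name(declaration):
    """Return the identifier that appears just before the parameter list."""
    return re.search(r"(\w+)\s*\(", declaration).group(1)


def naive_import(idl):
    """Mimic the failure mode: reuse the last comment seen for every method."""
    docs, last_comment = {}, None
    for match in TOKEN.finditer(idl):
        if match.group("comment") is not None:
            last_comment = clean_comment(match.group("comment"))
        else:
            docs[method_name(match.group("method"))] = last_comment
    return docs


def cautious_import(idl):
    """Attach a comment only to the next method; flag the rest for review."""
    docs, needs_review, pending = {}, [], None
    for match in TOKEN.finditer(idl):
        if match.group("comment") is not None:
            pending = clean_comment(match.group("comment"))
        else:
            name = method_name(match.group("method"))
            if pending is None:
                needs_review.append(name)
            docs[name] = pending
            pending = None  # a comment is consumed by exactly one method
    return docs, needs_review


if __name__ == "__main__":
    print(naive_import(SAMPLE_IDL))
    print(cautious_import(SAMPLE_IDL))
```

Neither pass produces good documentation from a thin source; the cautious one just makes the gaps visible instead of papering over them.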

While there’s certainly a place for automated tools in documentation, there always needs to be a human reviewing the output, because unless you can guarantee that the comments and markup in the IDL files are correct, the output won’t be. It’s clear that these two articles, for example, were never really reviewed.

This is why I prefer to take a slow and measured pace as we investigate automation of documentation tasks. It’s very easy to automatically generate bad documentation. It’s very, very hard to automatically generate good, reliable documentation.  In my opinion, it’s not worth doing if you can’t do it right.

That’s why I feel that using tools to produce an initial, automatically generated rendition of a document is fine, but from then on, it should be maintained by human writers and editors. That way, we can better control quality. And every article needs to be reviewed by at least one human editor as well (preferably both a writer and an engineer) in order to ensure accuracy.

I’d like to have a tool that can take IDL input and generate output formatted following our interface reference style guidelines. Then we can clean up the output and post it. That would help us get our content filled out faster. We just can’t blindly post everything it generates.

 Posted at 4:51 PM

  9 Responses to “Thoughts on automated documentation generation”

  1. I think we could improve the situation by modifying the tools to complain more about invalid inline documentation, i.e. docs that fail to mention a return value or that lie about it. It doesn’t seem right if most of the editing effort is shifted onto one person’s shoulders.

  2. Automatic docs will only work well when they’re displayed as the programmer edits the source file. Any post-processing makes them extra work you do for the next person without any immediate benefit to you.

    It seems like this is doable in modern IDEs. Re-run doxygen on every keystroke in comments and show the results in a separate window. But why can’t an editing window like Bespin display a file the way does, complete with hyperlinking? Better still, the moment your cursor leaves a comment block it should become attractive paragraphs of HTML, not monospace.

    Ideally should be the same source file, just rendered differently with the code blocks initially hidden. Provide the transform to programmers along with a single keystroke to preview it, and you’ll get better documentation.

    Literate programming is too ambitious, and pretty-printing isn’t ambitious enough.

  3. The problem with actually generating the documentation from the source, and only from the source (as opposed to a one-time import then editing by hand from then on) is that because only very select people can directly edit the source and check in changes, it would pretty much completely destroy productivity on the documentation.

    I for one would not want to have to submit patches for every copyedit I make every day, get them reviewed and checked into the tree, and so forth.

    That would be time-consuming and highly inefficient, in my opinion. I think the result would be that nobody would work on the documentation at all anymore, except for the engineers, and frankly, that’s a sure road to long term ruin when it comes to documentation. :)

  4. Have you thought about the possibility of setting up a process to occasionally feed the latest version of the human-edited text back into the IDL itself? It feels like there’s a huge amount of value in having some sort of feedback loop here…


  5. To elaborate on that a bit, I completely agree with your assertion that attempting to use the current code-review processes for this continuously would simply fail. I suspect, though, that the right combination of processes (only attempt the feedback periodically, figure out how to get dev input in a timely fashion instead of the usual reviews, etc) and tools (make it easy to see what has changed between MDC and the IDL) could be made to work. It would certainly be a sizeable project, and would require multiple iterations to get to a good spot, but it seems like it could be a great long-term goal.

  6. Dan – that’s exactly the sort of thing I’m thinking about. Just not sure exactly how it would work. :)

  7. It’s important to note that what you described in your post is not a problem of *automatically generating/extracting* docs from the IDLs but of *bad/missing in-source docs*. Whatever tooling you have, when the source lacks style/docs, you cannot automatically extract anything meaningful. Of course, writing outside-of-source documentation is no easier in that case either.

  8. Thanks for responding.

    only very select people can directly edit the source and check in changes… I for one would not want to have to submit patches for every copyedit I make every day, get them reviewed and checked into the tree, and so forth.

    I think you should chat with the mega-brained coders working on Oink/Pork/Elsa/Hydra and also heavy users of Mercurial. You should be able to assert that a commit only changes comments and have systems that verify this, then others reviewing and accepting your edits becomes much less stressful because there’s no fear they might have changed something.

    Through the magic of DVCS you should also be able to work on doc in source code separate from developers. As they update you can merge with tip and a) resolve conflicts and b) notice changes to code that might make the documentation obsolete. They can merge the improved doc with trunk when they’re ready.

    Have some faith that technical solutions are available!
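The "this commit only changes comments" assertion suggested in the last response could be checked, very roughly, along the following lines. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, it handles only C-style `/* */` and `//` comments, and it uses regular expressions rather than a real parser, so comment markers inside string literals would confuse it. It is not how any of the tools mentioned above work.

```python
import re

# Matches C-style block comments and // line comments.
# Simplification: comment markers inside string literals are not recognised.
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.S)


def strip_comments(source):
    """Remove comments, then collapse all whitespace to single spaces."""
    without_comments = COMMENT.sub(" ", source)
    return " ".join(without_comments.split())


def is_comment_only_change(before, after):
    """True if the two versions differ only in comments and whitespace."""
    return strip_comments(before) == strip_comments(after)


if __name__ == "__main__":
    # Hypothetical before/after versions of a one-method IDL fragment.
    before = "/** Reads a value. */\nboolean readBoolean();\n"
    after = (
        "/**\n"
        " * Read a boolean value from a single byte in the stream.\n"
        " */\n"
        "boolean readBoolean();\n"
    )
    print(is_comment_only_change(before, after))
```

A check like this errs on the side of caution: any difference in the non-comment text, however small, makes it report that the change is not comment-only.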