Rationale

We have around 60 "projects" (code repositories) that we actively develop, not counting forks. More than 40 of them have some sort of documentation, from the simple README-like "How to install" page to medium-sized mixes of narrative functional and technical documentation, automatically-generated API documentation and third-party PDF documents about external web services, banking file formats [1], etc. We also have a "Developer's guide" that contains an ever-growing set of documents about internal and external tools, our Git/GitHub workflow, coding rules, new team members' setup, thematic cookbooks, etc.

Our documentation is written in reStructuredText (reST), a text format that is similar to Markdown, AsciiDoc or Creole. We use Sphinx to build the documentation. Sphinx is a great tool that can generate a "nicely-organized arrangement of HTML files" from reST files (as its own web site puts it) [2]. It also provides a client-side search engine, backed by a JavaScript search index that it generates at build time.

Because we started from a single main project (or almost) that was then split, the documentation of a particular feature or technique was sometimes hard to find. Was it in the documentation of project A or was it in project B? We needed a tool that would search across the documentation of all our projects.

Rejected options

Single repository

Since our goal was to centralize the documentation of all our projects, we thought about merging them in a single place, building everything with Sphinx and benefiting from Sphinx's search features. Concretely, it meant setting up a single Git repository with the documentation of each of our projects as Git submodules or subtrees. We would then set up a global home page that would link to each documentation index page. And we would let Sphinx build everything.

This solution did not really fit our needs. We use Autodoc, a Sphinx extension that generates documentation from docstrings (a special kind of comment in Python code). For this extension to work, all code must be importable by Sphinx. Unfortunately, that could not work with our mix of Python projects, which have varying and conflicting requirements. We could not easily install a single set of requirements that would match the requirements of all projects.
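
To make the "importable" constraint concrete, here is a minimal sketch of the relevant part of a Sphinx conf.py. The directory layout (a "src" directory next to the documentation directory) is an assumption made for illustration, not a description of our projects.

```python
# conf.py (excerpt). Hypothetical layout: docs/conf.py next to a src/ directory.
import os
import sys

# Autodoc imports the modules it documents, so they must be found on sys.path,
# and so must every third-party package that those modules import.
sys.path.insert(0, os.path.abspath(os.path.join("..", "src")))

# Enable the Autodoc extension that ships with Sphinx.
extensions = ["sphinx.ext.autodoc"]
```

With a single Sphinx build covering every project, one Python environment would have to satisfy the imports of all projects at once, which is exactly what conflicting requirements prevent.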

Readthedocs

Readthedocs is a documentation hosting tool (and free service) based on Sphinx. The "repository" part of Dokang is similar to what Readthedocs provides, although Readthedocs obviously does a lot more than what we needed in that area. However, at the time Dokang was written (and as of July 2015), there was no global full-text search feature in Readthedocs to our knowledge, and contributing one seemed more complex than writing our own tool. Moreover, using the public Readthedocs.org service was not an option for our internal projects (the Readthedocs.com private business service was not available yet) and maintaining our own installation of Readthedocs was deemed too much for what we needed.

Dokang

Since we did not find a proper, simple tool that would fit our needs, we wrote Dokang, a very simple web application based on the Pyramid web framework and Whoosh, a search engine written in pure Python. Even though everything is in Python, we get decent performance and almost immediate search results for our set of 1,000 documents.

Dokang is a web application that:

  1. Provides an endpoint for clients to upload their documentation.

    Sending documentation to Dokang is as simple as issuing a POST request such as [3]:

    $ curl \
      -X POST \
      --form name=project_name \
      -F ":action=doc_upload" \
      -F content=@../documentation.zip \
      https://dokang:my-secret-token@dokang.example.com/upload
    
  2. Serves a home page with a list of all documentations and a simple search form that lets users search in HTML, text and PDF files. Other formats can be handled through the use of extensions.

  3. Serves all documentations (although you could make your web server handle that part if you prefer).
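
For clients that cannot shell out to curl, the same upload can be sketched in Python with only the standard library. This is an illustration, not code taken from Dokang: the function names are hypothetical, and the form fields, credentials scheme and URL simply mirror the curl command above.

```python
import base64
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields and one file as a multipart/form-data body.

    Returns a (body, content_type) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        'Content-Type: application/zip\r\n\r\n'
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # Closing delimiter that ends the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_docs(url, user, token, project_name, zip_path):
    """POST a zipped documentation to the upload endpoint; returns the HTTP status."""
    with open(zip_path, "rb") as archive:
        payload = archive.read()
    body, content_type = build_multipart(
        {"name": project_name, ":action": "doc_upload"},
        "content",
        os.path.basename(zip_path),
        payload,
    )
    # curl turns "user:password@" in the URL into HTTP Basic authentication.
    credentials = base64.b64encode(f"{user}:{token}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Authorization": f"Basic {credentials}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call equivalent to the curl command would look like upload_docs("https://dokang.example.com/upload", "dokang", "my-secret-token", "project_name", "../documentation.zip").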

Continuous documentation

Since we already had an installation of Jenkins that continuously runs our tests, it seemed obvious to hook the building of our documentation there as well. For each project, we now run its tests, then build the documentation with Sphinx and send it to Dokang.
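
The build-and-package part of such a job can be sketched as follows. This is not our actual Jenkins configuration: the function names are hypothetical, and placing the files at the root of the archive is an assumption about what the upload endpoint expects.

```python
import subprocess
import zipfile
from pathlib import Path


def build_html(source_dir, output_dir):
    """Run the Sphinx HTML builder; raises CalledProcessError if the build fails."""
    subprocess.run(
        ["sphinx-build", "-b", "html", str(source_dir), str(output_dir)],
        check=True,
    )


def zip_directory(directory, archive_path):
    """Zip every file under `directory`, with paths stored relative to it."""
    directory = Path(directory)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(directory.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(directory).as_posix())
    return archive_path
```

A job step would call build_html("docs", "build/html"), then zip_directory("build/html", "documentation.zip"), and finally send the archive to Dokang. The archive should be written outside the directory being zipped so that it does not end up including itself.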

Limitations and future plans

Dokang serves us well but there is always some room for improvement:

  1. Additional search-related features could be implemented with Whoosh: auto-completion, spelling correction, etc.
  2. One of the many features of Readthedocs is that it can serve multiple versions of a documentation: different languages and different versions of the project. The latter option could be useful for us. Today, we only serve the documentation of the latest version of the code, even though partners may be more interested in what is currently deployed, which may be lagging a bit.

Further details

You can find more about Dokang on its web site. Dokang is written by Polyconseil and is licensed under the 3-clause BSD license. It is available on GitHub. Feel free to contribute!


[1] Including two rather dry banking file format reference documents written in Comic Sans, which goes to show that people in finance do have a sense of humour, after all.
[2] Sphinx can also generate a single-file HTML documentation, a PDF file, man pages, Texinfo files, etc.
[3] We tried to make uploads even simpler for Python projects through the upload_docs setuptools command (that is run with something like python setup.py upload_docs). Unfortunately, we did not succeed. The details have been forgotten but it may have involved some oddity in (a possibly now old version of) setuptools with respect to authentication or an SSL-related issue with Python 2.6 (that we still use for some projects because we are that old school).