GSOC 2010 Idea - Language Bindings Documentation Extractor

I've just added an idea for a Google Summer of Code project to the wiki: a Language Bindings Documentation Extractor/Generator tool.

For last year's GSOC, Arno Rehn wrote a tool called 'smokegen', which parses Qt and KDE C++ header files and generates language-independent 'Smoke' libraries that are used by several bindings projects for Ruby, C#, Perl, PHP and JavaScript. It is plugin-based, and the main part of the documentation extractor project would be to write a new plugin that parses both headers and sources and extracts doc comments and code snippets. For each target language, the plugin would translate the C++ docs into a format suitable for that language. Any embedded code snippets should also be translated as far as possible. It might be better to have separate plugins for each language. Part of the project might be to write some sort of documentation viewer tool if one doesn't already exist.
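smokegen's real plugin interface isn't described here, so the following is only a Python sketch of the per-language idea, not smokegen code. A hypothetical `DocPlugin` base class passes prose through untouched and hands each Doxygen-style `\code ... \endcode` snippet to a language-specific `translate_snippet`. The rewrite rules in the hypothetical `RubyPlugin` and `PythonPlugin` classes are a few illustrative regexes, nowhere near a real C++ translator.

```python
import re
from abc import ABC, abstractmethod

# Matches a Doxygen-style code snippet: \code ... \endcode
SNIPPET_RE = re.compile(r"\\code(.*?)\\endcode", re.DOTALL)


class DocPlugin(ABC):
    """Hypothetical interface: one plugin per target language."""

    language = ""

    @abstractmethod
    def translate_snippet(self, cpp_code: str) -> str:
        """Rewrite a C++ snippet into the target language (best effort)."""

    def translate_doc(self, cpp_doc: str) -> str:
        # Prose is passed through unchanged; only embedded snippets are rewritten.
        return SNIPPET_RE.sub(
            lambda m: "\\code" + self.translate_snippet(m.group(1)) + "\\endcode",
            cpp_doc,
        )


def _common_rewrites(code: str) -> str:
    # "QPushButton *button = ..." -> "button = ..."
    code = re.sub(r"\bQ\w+\s*\*\s*(\w+)\s*=", r"\1 =", code)
    # "obj->method()" -> "obj.method()"
    code = code.replace("->", ".")
    # Drop statement-terminating semicolons at line ends.
    return re.sub(r";[ \t]*$", "", code, flags=re.MULTILINE)


class RubyPlugin(DocPlugin):
    language = "ruby"

    def translate_snippet(self, cpp_code: str) -> str:
        code = _common_rewrites(cpp_code)
        # "new QPushButton(...)" -> "Qt::PushButton.new(...)" (QtRuby naming)
        return re.sub(r"\bnew\s+Q(\w+)\s*\(", r"Qt::\1.new(", code)


class PythonPlugin(DocPlugin):
    language = "python"

    def translate_snippet(self, cpp_code: str) -> str:
        code = _common_rewrites(cpp_code)
        # "new QPushButton(...)" -> "QPushButton(...)" (PyQt naming)
        return re.sub(r"\bnew\s+(Q\w+)\s*\(", r"\1(", code)


# Registry of available plugins, keyed by target language.
PLUGINS = {p.language: p for p in (RubyPlugin(), PythonPlugin())}
```

Feeding the Ruby plugin a doc containing `QPushButton *button = new QPushButton("Quit"); button->show();` yields `button = Qt::PushButton.new("Quit")` and `button.show()`. Keeping one class per language lets each set of rewrite rules grow independently, while the shared base class does the snippet-finding once.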

The PyQt and PyKDE bindings already have auto-generated docs, so a good place to start might be to see what is involved in creating those. I asked Arno about the project and how much work it would be, and he thought the main thing that would need to be added was parsing method definitions in the C++ source files, as the smokegen parser doesn't do that at the moment.
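To give a feel for that missing piece, here is a rough Python sketch (not smokegen code; a real implementation would use a proper C++ parser) that pairs each `/*! ... */` comment, the style Qt uses for docs in its .cpp files, with the method definition immediately following it. `DOC_DEF_RE` and `extract_source_docs` are hypothetical names, and the regex is only a stand-in: it ignores detached `\fn` blocks, operators and template members.

```python
import re

# A "/*! ... */" block (which must not contain "*/") followed directly by a
# definition such as "int QString::size() const {" or "QFoo::QFoo(...) : ... {".
DOC_DEF_RE = re.compile(
    r"/\*!(?P<doc>(?:(?!\*/).)*)\*/\s*"
    r"[^{};/#]*?(?P<name>\w+(?:::~?\w+)+)\s*\([^{};]*?\{",
    re.DOTALL,
)


def extract_source_docs(cpp_source: str) -> dict[str, str]:
    """Map qualified method names to the doc comment preceding their definition."""
    docs = {}
    for m in DOC_DEF_RE.finditer(cpp_source):
        # Strip the comment body's indentation, keeping line breaks.
        lines = [line.strip() for line in m.group("doc").splitlines()]
        docs[m.group("name")] = "\n".join(lines).strip()
    return docs
```

A `\class` block with no definition after it is skipped, because the pattern cannot cross into the next comment.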

Unfortunately, there isn't much time, as the deadline for submissions is 9th April, so you'll need to get started pretty quickly. Ask any questions on the #kde-bindings IRC channel or the [email protected] mailing list.


I'll quickly explain how the docs work in PyKDE right now. I invested a lot of time in the second half of last year working on my tooling for maintaining PyKDE. Most of it is concerned with fixing up two parsers, one for SIP files and another for C++ headers. (The tool is called twine2, and I was/am planning to make it public on Gitorious, but I've delayed that a bit to see what KDE decides w.r.t. git hosting.)

The doc generation part parses the SIP files, which closely mirror the C++ headers but contain extra info and binding stuff, and it also parses the C++ headers. Each class, method, etc. in the SIP file is formatted as HTML docs. The block of text/docs for each class and method is looked up in the parsed C++ header and massaged into HTML. Yes, there is a half-implemented Doxygen markup parser in there too. The SIP files are important to the process, since method signatures can differ from their C++ counterparts, and some C++ methods don't directly exist in the bindings. I haven't seen a need to parse C++ method code.
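twine2 isn't public yet, so this is not its code. It's a minimal Python sketch of the lookup-and-massage step, assuming the header docs have already been parsed into a dict keyed by `Class::method`. `SipMethod`, `doxygen_to_html` and `render_method` are hypothetical names, and the markup handling covers only `\a`/`\p` (argument names) and `\c` (code words).

```python
import html
import re
from dataclasses import dataclass


@dataclass
class SipMethod:
    """Hypothetical record for one method as declared in a .sip file."""
    cls: str
    name: str
    signature: str  # the bindings' signature, which may differ from C++


def doxygen_to_html(doc: str) -> str:
    """Convert a tiny subset of Doxygen markup to HTML (illustrative only)."""
    text = html.escape(doc)
    # "\a param" / "@p param" -> italic argument name
    text = re.sub(r"[\\@][ap]\s+(\w+)", r"<i>\1</i>", text)
    # "\c word" -> monospace word
    text = re.sub(r"[\\@]c\s+(\w+)", r"<tt>\1</tt>", text)
    # Blank lines separate paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "".join(f"<p>{p}</p>" for p in paragraphs)


def render_method(method: SipMethod, header_docs: dict[str, str]) -> str:
    # The signature shown always comes from the SIP file, not the C++ header.
    out = f"<h3>{html.escape(method.signature)}</h3>"
    doc = header_docs.get(f"{method.cls}::{method.name}")
    # Methods that exist only in the bindings have no C++ doc block to borrow.
    if doc is not None:
        out += doxygen_to_html(doc)
    return out
```

Keying on `Class::method` alone means overloads share one doc block; matching on the full signature would be needed to tell them apart.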

By simon edwards at Wed, 04/07/2010 - 06:16

Thanks for the info. If someone takes up the project, I think studying how the header doc comments are massaged into HTML in PyKDE would be a good way to start. I mentioned parsing the sources because you need to do that for the Qt docs, which have the comments in the sources rather than the headers.

By Richard Dale at Wed, 04/07/2010 - 13:04