Easier OpenDocument coding in Calligra and lovely junior jobs

Tuesday, 30 July 2013 | Oever

The office suite Calligra can save documents in many file formats, but its main format is the OpenDocument Format (ODF). With a proposed improvement, coding on Calligra can become easier than ever.

The OpenDocument files (odt, ods, odp) that Calligra writes are ZIP archives containing XML documents and images. The XML documents follow a schema: a set of rules that say what may go where in the files. Here is a small sample of ODF, indented for readability:

  <text:p text:style-name="my_italic">
    <text:span text:style-name="my_bold">
      Hello Calligra!
    </text:span>
  </text:p>

This sample is very simple and looks a lot like HTML. But because ODF describes whole office documents, there are many more XML tags available, including tags for spreadsheet formulas, databases, styling, scripting, presentations, semantic data, change tracking, business forms and much more. Each element can also have a number of attributes specific to it. That is a lot to know, which is what makes working on ODF so interesting.

Currently in Calligra, to write this example document fragment, we can write code like this:

  xmlWriter->startElement("text:p");
  xmlWriter->addAttribute("text:style-name", "my_italic");
  xmlWriter->startElement("text:span");
  xmlWriter->addAttribute("text:style-name", "my_bold");
  xmlWriter->addTextNode("Hello Calligra!");
  xmlWriter->endElement();
  xmlWriter->endElement();

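This is a simple, generic XML interface that knows nothing about ODF. Tag and attribute names are passed as plain strings, so a misspelled name or a forgotten endElement() call still compiles and only shows up, if at all, in the written file. That is neither the easiest nor the safest code to write.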
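
To make the weakness of a string-based interface concrete, here is a minimal sketch of a generic writer with the same call pattern. MiniXmlWriter and the two functions are hypothetical illustrations, not Calligra's KoXmlWriter, and text escaping is left out. The second function misspells the tag and never closes it, yet the compiler accepts it without complaint.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal generic XML writer; it mimics the call pattern above
// but is not Calligra's KoXmlWriter. Text escaping is omitted for brevity.
class MiniXmlWriter {
public:
    void startElement(const std::string& name) {
        closeStartTag();
        out_ += "<" + name;
        open_.push_back(name);
        startTagOpen_ = true;
    }
    void addAttribute(const std::string& name, const std::string& value) {
        // Any name is accepted; nothing checks it against the ODF schema.
        assert(startTagOpen_);
        out_ += " " + name + "=\"" + value + "\"";
    }
    void addTextNode(const std::string& text) {
        closeStartTag();
        out_ += text;
    }
    void endElement() {
        assert(!open_.empty());
        closeStartTag();
        out_ += "</" + open_.back() + ">";
        open_.pop_back();
    }
    const std::string& str() const { return out_; }

private:
    void closeStartTag() {
        if (startTagOpen_) {
            out_ += ">";
            startTagOpen_ = false;
        }
    }
    std::string out_;
    std::vector<std::string> open_;  // names of elements not yet closed
    bool startTagOpen_ = false;
};

// The sample fragment, written with string-based calls.
std::string writeHelloGeneric() {
    MiniXmlWriter w;
    w.startElement("text:p");
    w.addAttribute("text:style-name", "my_italic");
    w.startElement("text:span");
    w.addAttribute("text:style-name", "my_bold");
    w.addTextNode("Hello Calligra!");
    w.endElement();
    w.endElement();
    return w.str();
}

// This compiles just as well: "text:pp" is a typo and the element is never
// closed, so the result is not valid ODF.
std::string writeBrokenGeneric() {
    MiniXmlWriter w;
    w.startElement("text:pp");
    w.addTextNode("Oops");
    return w.str();
}
```

writeHelloGeneric() produces the fragment shown earlier (without the indentation), while writeBrokenGeneric() silently produces `<text:pp>Oops`.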

If the proposed improvement (which is on Review Board now) is accepted as is, the code for the document fragment will look like this:

  text_p p(xmlWriter);
  p.set_text_style_name("my_italic");
  text_span span(p.add_text_span());
  span.set_text_style_name("my_bold");
  span.addTextNode("Hello Calligra!");

Can you spot the differences? There are quite a few.

Instead of calling xmlWriter directly, the code uses a set of classes named after the XML tags: "text:p" becomes text_p, "text:span" becomes text_span, and so on. The C++ editor and compiler know about these classes, so they catch typos and offer auto-completion.

Because each written tag gets its own class instance, the attributes and child elements have a clear context, and it is easy to see how the elements are nested. Again the editor and compiler help out: they suggest, and check, which tags may be used inside which other tags.
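
The nesting check can be sketched with one class per element, where a child's constructor only accepts the parent types it is allowed in. The classes below are hypothetical: they borrow the text_p and text_span naming from the proposal but are not the code on Review Board, and they sit on a small stand-in writer rather than KoXmlWriter. Each element writes its start tag in the constructor and its end tag in the destructor.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal writer standing in for a generic XML writer.
class MiniXmlWriter {
public:
    void startElement(const std::string& name) {
        closeStartTag();
        out_ += "<" + name;
        open_.push_back(name);
        startTagOpen_ = true;
    }
    void addAttribute(const std::string& name, const std::string& value) {
        assert(startTagOpen_);
        out_ += " " + name + "=\"" + value + "\"";
    }
    void addTextNode(const std::string& text) { closeStartTag(); out_ += text; }
    void endElement() {
        assert(!open_.empty());
        closeStartTag();
        out_ += "</" + open_.back() + ">";
        open_.pop_back();
    }
    const std::string& str() const { return out_; }

private:
    void closeStartTag() {
        if (startTagOpen_) { out_ += ">"; startTagOpen_ = false; }
    }
    std::string out_;
    std::vector<std::string> open_;
    bool startTagOpen_ = false;
};

// Hypothetical typed element for <text:p>. Copying is disabled so that each
// tag is opened and closed exactly once.
class text_p {
public:
    explicit text_p(MiniXmlWriter& w) : w_(w) { w_.startElement("text:p"); }
    ~text_p() { w_.endElement(); }
    text_p(const text_p&) = delete;
    text_p& operator=(const text_p&) = delete;
    void set_text_style_name(const std::string& v) { w_.addAttribute("text:style-name", v); }

private:
    friend class text_span;  // lets an allowed child write inside the paragraph
    MiniXmlWriter& w_;
};

// Hypothetical typed element for <text:span>. Its only constructor takes a
// text_p parent, so the compiler checks where a span may appear.
class text_span {
public:
    explicit text_span(text_p& parent) : w_(parent.w_) { w_.startElement("text:span"); }
    ~text_span() { w_.endElement(); }
    text_span(const text_span&) = delete;
    text_span& operator=(const text_span&) = delete;
    void set_text_style_name(const std::string& v) { w_.addAttribute("text:style-name", v); }
    void addTextNode(const std::string& text) { w_.addTextNode(text); }

private:
    MiniXmlWriter& w_;
};

std::string writeHelloTyped() {
    MiniXmlWriter w;
    {
        text_p p(w);
        p.set_text_style_name("my_italic");
        text_span span(p);
        span.set_text_style_name("my_bold");
        span.addTextNode("Hello Calligra!");
        // text_p inner(span);  // does not compile: text_p has no such constructor
    }   // span is destroyed before p, so </text:span> comes before </text:p>
    return w.str();
}
```

In this sketch the child takes its parent in the constructor rather than coming from an add_text_span() member as in the proposal; that keeps the non-copyable element objects simple. For brevity the root paragraph is built straight from the writer.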

Also, the calls to endElement() are gone. Each tag is closed when its object goes out of scope: you cannot forget what you do not have to write.
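
Closing a tag when an object goes out of scope is the C++ RAII idiom. Here is a minimal sketch, assuming a hypothetical ScopedElement guard over a small stand-in writer (neither is Calligra's API): the constructor writes the start tag and the destructor writes the end tag, so every path out of the function, including an early return, leaves well-formed XML.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal writer standing in for a generic XML writer.
class MiniXmlWriter {
public:
    void startElement(const std::string& name) {
        closeStartTag();
        out_ += "<" + name;
        open_.push_back(name);
        startTagOpen_ = true;
    }
    void addTextNode(const std::string& text) { closeStartTag(); out_ += text; }
    void endElement() {
        assert(!open_.empty());
        closeStartTag();
        out_ += "</" + open_.back() + ">";
        open_.pop_back();
    }
    const std::string& str() const { return out_; }

private:
    void closeStartTag() {
        if (startTagOpen_) { out_ += ">"; startTagOpen_ = false; }
    }
    std::string out_;
    std::vector<std::string> open_;
    bool startTagOpen_ = false;
};

// Hypothetical RAII guard: opens a tag on construction and writes the
// matching end tag when the guard leaves scope.
class ScopedElement {
public:
    ScopedElement(MiniXmlWriter& w, const std::string& tag) : w_(w) { w_.startElement(tag); }
    ~ScopedElement() { w_.endElement(); }
    ScopedElement(const ScopedElement&) = delete;
    ScopedElement& operator=(const ScopedElement&) = delete;

private:
    MiniXmlWriter& w_;
};

// Writes one paragraph. There is no endElement() call on either path: the
// guard's destructor closes <text:p> on the early return and at the end.
void writeParagraph(MiniXmlWriter& w, const std::string& text) {
    ScopedElement p(w, "text:p");
    if (text.empty())
        return;
    w.addTextNode(text);
}

std::string writeTwoParagraphs() {
    MiniXmlWriter w;
    writeParagraph(w, "");
    writeParagraph(w, "Hello Calligra!");
    return w.str();
}
```

Calling writeTwoParagraphs() yields `<text:p></text:p><text:p>Hello Calligra!</text:p>`: both paragraphs are closed even though one function body returned early.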

As you can see, this new way of writing ODF is simpler and safer. The task ahead is to get the proposed improvement accepted and then to port every place where ODF is written to the new API.

And that is where everybody can join in. It is simple, fun and rewarding to convert a whole C++ file to the new API. As part of the initial patch, I have converted three files and the code looks way better now. I even found and fixed a few small errors that were uncovered by this stricter API.

I wrote most of this code at Akademy. It was a very inspiring week: I simply had to revive this code, which I had started two years ago, and I'm happy that it is finally close to landing.