
XML Data Binding for C++

Sunday, 6 August 2006  |  Oever

Strigi has reached the point where its configuration files need to be more advanced than a text file with one directory per line. Because I have had good experiences using XML Schema and JAXB to map between XML and Java, I have been looking for a good toolkit that does the same in C++. The requirements for such a tool are:

  • Simple
  • Small
  • Few dependencies
  • Use XML Schema

A tool I've found is called xsd. Behind this trivially simple name is a suite of free software tools that generate different kinds of C++ code from XML Schema. The latest version of this tool is 2.2.0 and can be downloaded here. There's an extensive manual that starts with a simple Hello World application.

To test the feasibility of this tool for Strigi, I designed a small XML Schema file and wrote a simple program to accompany it. For Strigi, I'd like to have a configuration file that describes which directories and other sources are indexed and how. Below are the different files I used for testing xsd. The workflow is pretty easy:

  1. write the XML Schema
  2. generate C++ from the schema
  3. use the generated data classes in C++ code

If the requirements for the configuration file change, I change the XML Schema and regenerate the C++ code. Any changes required in the C++ code that uses the data classes will then be caught by the compiler.

Nothing is free: using xsd introduces a dependency in your code. You will need to link your executable with xerces-c. Since this is a widespread library, this is not a big problem.

An example configuration file might look like this:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<s:daemonConfiguration xmlns:s="http://www.vandenoever.info/strigi">

  <repository repositoryLocation="/home/tux/.strigi/mainindex" repositoryType="CLucene">
    <fileSystemSource baseURI="file:/home/tux"/>
    <httpSource baseURI="http://www.kde.org/"/>
  </repository>

</s:daemonConfiguration>

For convenience, this file is kept simple. What you can see is that Strigi could have configurations for multiple repositories (although this file shows only one). Each repository has one index, which contains information extracted from various sources, such as files from the filesystem or web pages. These sources are described by the elements fileSystemSource and httpSource. The types of both elements extend the more general fileSourceType.
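To make that structure concrete, here is a hand-written C++ sketch of the same vocabulary. This is not what xsd generates. The struct and function names (FileSource, Repository, exampleConfiguration, countSources) are made up for illustration and only mirror the element and attribute names used above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the vocabulary; NOT xsd-generated code.
struct FileSource {
    std::string baseURI;  // required attribute on every source
};

struct FileSystemSource : FileSource {};  // files below a local directory
struct HttpSource : FileSource {};        // pages fetched over HTTP

struct Repository {
    std::string repositoryLocation;  // where the index is stored
    std::string repositoryType;      // e.g. "CLucene"
    std::vector<FileSystemSource> fileSystemSources;
    std::vector<HttpSource> httpSources;
};

struct DaemonConfiguration {
    std::vector<Repository> repositories;  // zero or more repositories
};

// Build the same configuration as the example file above.
DaemonConfiguration exampleConfiguration() {
    FileSystemSource fs;
    fs.baseURI = "file:/home/tux";
    HttpSource hs;
    hs.baseURI = "http://www.kde.org/";

    Repository repo;
    repo.repositoryLocation = "/home/tux/.strigi/mainindex";
    repo.repositoryType = "CLucene";
    repo.fileSystemSources.push_back(fs);
    repo.httpSources.push_back(hs);

    DaemonConfiguration config;
    config.repositories.push_back(repo);
    return config;
}

// Count all sources, of either kind, across all repositories.
std::size_t countSources(const DaemonConfiguration& config) {
    std::size_t n = 0;
    for (const Repository& r : config.repositories) {
        n += r.fileSystemSources.size() + r.httpSources.size();
    }
    return n;
}
```

The point of a data binding tool is that I do not have to write and maintain structs like these by hand.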

The vocabulary can be described with an XML Schema:

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
 targetNamespace="http://www.vandenoever.info/strigi"
 xmlns:tns="http://www.vandenoever.info/strigi">

<element name="daemonConfiguration" type="tns:daemonConfigurationType"/>

<complexType name="daemonConfigurationType">
  <sequence>
    <!-- a repository contains one index -->
    <element name="repository" type="tns:repositoryType"
      minOccurs="0" maxOccurs="unbounded">
    </element>
  </sequence>
</complexType>

<complexType name="repositoryType">
  <sequence>
    <element name="fileSystemSource" type="tns:fileSystemSourceType"
      minOccurs="0" maxOccurs="unbounded"/>
    <element name="httpSource" type="tns:httpSourceType"
      minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
  <attribute name="repositoryLocation" type="anyURI" use='required'/>
  <attribute name="repositoryType" type="tns:repositoryTypeType" use='required'/>
</complexType>

<simpleType name="repositoryTypeType">
  <restriction base='string'>
    <enumeration value='CLucene'/>
    <enumeration value='HyperEstraier'/>
    <enumeration value='Xapian'/>
    <enumeration value='Sqlite'/>
  </restriction>
</simpleType>

<complexType name="fileSourceType">
  <attribute name="baseURI" type="anyURI" use='required'/>
  <!-- time between updates for this directory, <= 0 means never -->
  <attribute name="autoUpdateFrequence" type="int"/>
</complexType>

<complexType name="fileSystemSourceType">
  <complexContent>
    <extension base="tns:fileSourceType">
      <sequence>
        <element name="fileEventListener" minOccurs="0" maxOccurs="1">
          <complexType>
          </complexType>
        </element>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<complexType name="httpSourceType">
  <complexContent>
    <extension base="tns:fileSourceType">
    </extension>
  </complexContent>
</complexType>

</schema>
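Two rules in this schema are worth spelling out: repositoryType must be one of four enumerated strings, and an autoUpdateFrequence of zero or less means the source is never updated automatically. The sketch below expresses both as plain C++ functions. The function names are hypothetical and this is not code that xsd produces; it only restates what the schema says.

```cpp
#include <string>

// Mirrors the enumeration in repositoryTypeType (hypothetical helper).
// XML Schema enumerations are case-sensitive, so "clucene" is rejected.
bool isKnownRepositoryType(const std::string& type) {
    static const char* const known[] = {
        "CLucene", "HyperEstraier", "Xapian", "Sqlite"
    };
    for (const char* k : known) {
        if (type == k) {
            return true;
        }
    }
    return false;
}

// Mirrors the comment on autoUpdateFrequence: values <= 0 mean "never".
bool updatesAutomatically(int autoUpdateFrequence) {
    return autoUpdateFrequence > 0;
}
```

With a validating parser, the first check comes for free from the schema; the second is a convention that the application code has to honour itself.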

This XML Schema file can be compiled into source code with the command xsd cxx-tree --generate-serialization strigidaemon.xsd. Here is a simple program that uses this code to read and write this type of configuration file:

#include "strigidaemon.hxx"
#include <iostream>
#include <memory>
#include <string>
using namespace std;

int main (int argc, char* argv[]) {
    auto_ptr<strigi::daemonConfigurationType> config;

    if (argc > 1) {
        // load an object from the file given on the command line
        try {
            config = strigi::daemonConfiguration(argv[1],
                xml_schema::flags::dont_validate);
        } catch (struct xsd::cxx::tree::exception<char>& e) {
            cerr << e << endl;
        }
    }
    if (!config.get()) {
        // nothing was loaded, so create an empty object
        config = auto_ptr<strigi::daemonConfigurationType>(
            new strigi::daemonConfigurationType());
    }

    // add elements to the object
    string repositoryLocation = "/home/tux/.strigi/mainindex";
    strigi::repositoryTypeType repositoryType("CLucene");
    strigi::repositoryType repo(repositoryLocation, repositoryType);

    strigi::fileSystemSourceType fs("file:/home/tux");
    repo.fileSystemSource().push_back(fs);

    strigi::httpSourceType hs("http://www.kde.org/");
    repo.httpSource().push_back(hs);

    config->repository().push_back(repo);

    // provide a mapping for the namespace we use
    xml_schema::namespace_infomap map;
    map["s"].name = "http://www.vandenoever.info/strigi";

    // output the object to the standard output iostream
    strigi::daemonConfiguration(cout, *config, map);

    return 0;
}
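The namespace map in the program ties the prefix "s" to the Strigi namespace, which is why the serialized root element comes out as s:daemonConfiguration. As a rough illustration of that idea, here is a small self-contained sketch that builds an opening root tag from a prefix-to-namespace map. The function openingRootTag is hypothetical and is not how xsd implements serialization; in particular, the fallback to a default namespace declaration is my own assumption.

```cpp
#include <map>
#include <string>

// Hypothetical helper: build the opening tag of a root element from a
// prefix-to-namespace map. Illustration only, not part of xsd.
std::string openingRootTag(const std::string& localName,
                           const std::string& namespaceURI,
                           const std::map<std::string, std::string>& prefixes) {
    for (const auto& entry : prefixes) {
        if (entry.second == namespaceURI) {
            // a prefix is registered for this namespace: use it
            const std::string& prefix = entry.first;
            return "<" + prefix + ":" + localName +
                   " xmlns:" + prefix + "=\"" + namespaceURI + "\">";
        }
    }
    // assumption: without a registered prefix, declare a default namespace
    return "<" + localName + " xmlns=\"" + namespaceURI + "\">";
}
```

Called with the name daemonConfiguration, the Strigi namespace, and a map from "s" to that namespace, this returns the same opening tag as in the example configuration file.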

To finish off, here's the Makefile I used to compile this small test program:

LDFLAGS=-lxerces-c
CXXFLAGS=-Wall -O2
main: main.cpp strigidaemon.cxx

strigidaemon.cxx: strigidaemon.xsd
	xsd cxx-tree --generate-serialization strigidaemon.xsd

clean:
	rm strigidaemon.cxx strigidaemon.hxx main