More metadata and a new year's resolution

Tuesday, 20 January 2009 | Trueg

Amazing how long it always takes me to write a blog entry. So many times in the last few months I told myself I had to write the next entry... well, new year's resolution (a little late, I know): more blogging about what I am up to (regarding KDE of course).

Well then, let's see: KDE 4.2 is around the corner and the Nepomuk features look pretty stable. Strigi is nicely integrated, it can be suspended and resumed (which it does automatically in battery mode or if the hard disk is full), and the folders to index can be configured, including subfolders. KRunner comes with a Nepomuk search plugin, which means you can simply run queries from there. The KIO slave, while not yet nicely integrated into the GUI, allows you to query stuff from Dolphin or the file open dialog (something I blogged about a long time ago). Almost everything is multi-threaded for your non-GUI-blocking pleasure and tags can be reused in Gwenview. The only thing still imperfect is the storage backend based on Java (too much memory usage), although even that will be solved soon thanks to the nice guys from OpenLink. But that is a story for another day (remember: new year's resolution).

Thus, finally we have a good foundation to build new stuff upon and that is what this blog entry is actually about. So let's have a look.

Again I am using Dolphin as the example. But why not, it is our file manager and we are used to the file manager also handling a bit of meta-data. Anyway, in KDE 4.2 Dolphin does display a little bit of meta-data for each file. This includes the size, the type, and some fields directly extracted via Strigi. The latter include ID3 tags and some five or six more fields. However, these are hardcoded in Dolphin. Thus, apart from ID3 tags not much is displayed. For example, no EXIF properties. Well, all this information is stored in Nepomuk, so why not use it? And that is what I have done. Take a look at the first screenshot, which shows Dolphin displaying meta-data from Nepomuk in a generic way. Meaning, nothing is hardcoded. The properties are read from the Nepomuk store, the labels are read from the Nepomuk/Xesam ontologies and everything is nicely extendable (as we will see later on).

As you can see, some properties are shown twice. That is because everything before "Source modified" comes from Dolphin's hardcoded properties while everything else comes from Nepomuk. Let us have a quick look at the code. After all this is a developer blog:

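    // The Nepomuk resource for the selected file
    Nepomuk::Resource res( item.url() );

    // All properties of the file: property URI -> value
    QHash<QUrl, Nepomuk::Variant> properties = res.properties();

    for ( QHash<QUrl, Nepomuk::Variant>::const_iterator it = properties.constBegin();
          it != properties.constEnd(); ++it ) {
        // The label comes from the ontology, the value is turned into a readable string
        Nepomuk::Types::Property prop( it.key() );
        m_metaTextLabel->add( prop.label(), Nepomuk::formatValue( res, prop ) );
    }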

Now this looks simple enough, I think (although I shortened it a bit, the original code does a bit of filtering). Basically we create a Nepomuk::Resource for the selected file and read all its properties. This returns a map of property URIs (remember: everything in Nepomuk is defined in ontologies, hence URIs) to property values. Then the map is iterated and for each entry a line is added to the meta-data display. Now what about the Nepomuk::formatValue call? Well, the values can be literals such as strings, integers or doubles, but they can also be other resources (other files, tags, persons or whatever). We do not want to display resource URIs to the user. The formatValue call triggers an experimental lib which uses formatting rules to convert resources into strings. An example: a person resource has firstName and lastName properties. The rule would then state that they are to be combined to build the label. Another simple example would be a file: the rule should state that the filename is to be used. Again we will see an example in action shortly (if you dare continue reading, that is ;).
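To make the idea of formatting rules a bit more concrete, here is a minimal sketch in plain C++ without any Nepomuk dependency. It is not the actual rule engine behind formatValue. The Resource and Value structs, the type names "Person" and "File" and the property keys "firstName", "lastName", "fileName" and "uri" are assumptions made up for illustration only.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a resource: a type name plus string properties.
struct Resource {
    std::string type;
    std::map<std::string, std::string> properties;
};

// Hypothetical stand-in for a property value: either a literal or a resource.
struct Value {
    bool isResource;
    std::string literal;
    Resource resource;
};

Value literalValue(const std::string& text) {
    Value v;
    v.isResource = false;
    v.literal = text;
    return v;
}

Value resourceValue(const Resource& res) {
    Value v;
    v.isResource = true;
    v.resource = res;
    return v;
}

// Returns the property value or an empty string if the property is not set.
std::string property(const Resource& res, const std::string& key) {
    std::map<std::string, std::string>::const_iterator it = res.properties.find(key);
    return it == res.properties.end() ? std::string() : it->second;
}

// Applies one hardcoded rule per resource type (a real rule set would be data, not code).
std::string formatResource(const Resource& res) {
    if (res.type == "Person") {
        // Rule: combine first and last name, skipping whichever is missing.
        const std::string first = property(res, "firstName");
        const std::string last = property(res, "lastName");
        if (first.empty()) return last;
        if (last.empty()) return first;
        return first + " " + last;
    }
    if (res.type == "File") {
        // Rule: use the filename as label.
        return property(res, "fileName");
    }
    // No rule for this type: fall back to the raw URI.
    return property(res, "uri");
}

// Literals are shown as they are, resources go through the rules.
std::string formatValue(const Value& value) {
    return value.isResource ? formatResource(value.resource) : value.literal;
}
```

With this sketch a person resource with firstName "Ada" and lastName "Lovelace" would be displayed as "Ada Lovelace", while a resource of an unknown type falls back to its URI.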

Ok then, now Dolphin displays our meta-data and I claim it does so generically. Then what about some new data: I want to remember the source of downloads, both web and IM downloads. The first one can actually be handled within KIO while the second one means patching Kopete. I did both. But first we need to know how to store this information. Neither the Nepomuk ontologies nor the Xesam ontology provides the necessary properties. Thus, the first step is to create our own ontology for downloads. I will only draft it here quickly, it is not big anyway. (Remember: in Nepomuk all data is stored as RDF, which means triples. If that is confusing, think of it as an object-oriented database where you can have classes and subclasses and class members, which are called properties here.) It all revolves around the Download class, which has subclasses like HTTPDownload or IMDownload. Then there are properties like sourceURL and one to relate local files to the download. (For everyone interested in the details: you can find the ontology in playground: NRDO)
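For those who prefer code over prose, the "triples as an object-oriented database" picture can be sketched in a few lines of plain C++. This is only an illustration, not the real ontology and not how Soprano stores data: the Triple struct, the helper functions and the shortened predicate names "subClassOf" and "domain" are assumptions, only the class and property names Download, HTTPDownload, IMDownload and sourceURL are taken from the draft above.

```cpp
#include <string>
#include <vector>

// One RDF statement: subject, predicate, object (all simplified to plain strings).
struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// A hypothetical, heavily shortened version of the download ontology.
std::vector<Triple> downloadOntologySketch() {
    std::vector<Triple> graph;
    graph.push_back(Triple{"HTTPDownload", "subClassOf", "Download"});
    graph.push_back(Triple{"IMDownload", "subClassOf", "Download"});
    graph.push_back(Triple{"sourceURL", "domain", "Download"});
    return graph;
}

// True if cls is ancestor itself or a direct or indirect subclass of it.
// Assumes the subClassOf statements contain no cycles.
bool isSubclassOf(const std::vector<Triple>& graph,
                  const std::string& cls,
                  const std::string& ancestor) {
    if (cls == ancestor)
        return true;
    for (const Triple& t : graph) {
        if (t.subject == cls && t.predicate == "subClassOf"
            && isSubclassOf(graph, t.object, ancestor))
            return true;
    }
    return false;
}

// All objects of statements matching the given subject and predicate.
std::vector<std::string> objectsOf(const std::vector<Triple>& graph,
                                   const std::string& subject,
                                   const std::string& predicate) {
    std::vector<std::string> result;
    for (const Triple& t : graph) {
        if (t.subject == subject && t.predicate == predicate)
            result.push_back(t.object);
    }
    return result;
}
```

Classes, subclasses and properties are all just statements in the same graph, and the data about an actual download would be added as further triples next to them.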

Ok then, let's integrate it into KIO somewhere in the file copy job:

    // The local file which has been downloaded
    Nepomuk::Resource fileRes( destinationUrl, Soprano::Vocabulary::Xesam::File() );

    // The download resource describing where the file came from
    Nepomuk::Resource downloadRes( QUrl(), Nepomuk::Vocabulary::NDO::HttpDownload() );
    downloadRes.setProperty( Nepomuk::Vocabulary::NDO::sourceUrl(), sourceUrl );
    downloadRes.setProperty( Nepomuk::Vocabulary::NDO::startTime(), Nepomuk::Variant(startTime) );
    downloadRes.setProperty( Nepomuk::Vocabulary::NDO::endTime(), Nepomuk::Variant(QDateTime::currentDateTime()) );

    // Relate the file to its download
    fileRes.setProperty( Nepomuk::Vocabulary::NDO::download(), downloadRes );

As we can see, I did not use the Nepomuk resource generator to generate C++ classes. Instead I went the other way and generated a vocabulary class using the onto2vocabularyclass tool provided by Soprano. Actually it is quite easy to integrate that into CMake.

Now what is happening here? We again create a Nepomuk resource for the local file which has been downloaded. Then we create the download resource, set some nice properties and then relate the file to the download. This combined with a little formatting rule for downloads gives us the following display in Dolphin:

Nice, isn't it? Well, this is the actual source URL. My plan is (and the ontology has a property for that) to also store the referrer web page which is more interesting in most cases. But I did not manage to make that work yet (tried to hand that information down through the KIO::Job metadata).

And the exact same thing can be done for Kopete. Only in this case we create an IMDownload and relate it to a person via their IM account instead of a source URL. The following code does work but also creates a new IMAccount resource for each download. The goal has to be to reuse the account resources that already exist (again a reason to push the Akonadi/Nepomuk integration):

First we create the IMAccount resource:

    Contact* contact = d->info.contact();

    // The IM account, identified here by the contact's nickname
    Nepomuk::Resource imAccount( contact->nickName(), Nepomuk::Vocabulary::NCO::IMAccount() );
    imAccount.setProperty( Nepomuk::Vocabulary::NCO::imNickname(), Nepomuk::Variant( contact->nickName() ) );

    // The person the account belongs to
    Nepomuk::Resource imContact( QUrl(), Nepomuk::Vocabulary::NCO::PersonContact() );
    imContact.setProperty( Nepomuk::Vocabulary::NCO::hasIMAccount(), imAccount );
    imContact.setProperty( Nepomuk::Vocabulary::NCO::fullname(), Nepomuk::Variant( contact->formattedName() ) );

After that we create the actual download resource which looks quite similar to the example from KIO:

    // The download resource, related to the sending contact instead of a source URL
    Nepomuk::Resource downloadRes( QUrl(), Nepomuk::Vocabulary::NDO::IMDownload() );
    downloadRes.setProperty( Nepomuk::Vocabulary::NDO::startTime(), Nepomuk::Variant(startTime) );
    downloadRes.setProperty( Nepomuk::Vocabulary::NDO::endTime(), Nepomuk::Variant(QDateTime::currentDateTime()) );
    downloadRes.setProperty( Nepomuk::Vocabulary::NDO::sendingContact(), imContact );

    // Relate the received file to its download
    Nepomuk::Resource fileRes( destinationUrl, Soprano::Vocabulary::Xesam::File() );
    fileRes.setProperty( Nepomuk::Vocabulary::NDO::download(), downloadRes );
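As for the goal of reusing existing account resources instead of creating a new one per download: the idea boils down to "look up first, create only if nothing is found". A rough sketch of that pattern in plain C++ could look as follows. The AccountRegistry class, its in-memory map and the "account:N" identifiers are purely hypothetical. A real solution would query the Nepomuk store (or Akonadi) rather than keep its own map.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical registry mapping IM nicknames to account resource identifiers.
class AccountRegistry {
public:
    // Returns the identifier of the account for the given nickname.
    // A new identifier is created only if the nickname is not known yet.
    std::string accountFor(const std::string& nickName) {
        std::map<std::string, std::string>::const_iterator it = m_accounts.find(nickName);
        if (it != m_accounts.end())
            return it->second; // reuse the existing account

        // Made-up identifier scheme, for illustration only.
        const std::string id = "account:" + std::to_string(m_accounts.size() + 1);
        m_accounts[nickName] = id;
        return id;
    }

    // Number of distinct accounts created so far.
    std::size_t count() const { return m_accounts.size(); }

private:
    std::map<std::string, std::string> m_accounts;
};
```

Two downloads from the same nickname would then end up pointing at one and the same account instead of two separate ones.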

And this is what the result looks like, again combined with a formatting rule:

Ok, that's it for today. I hope this will become stable soon so we can have some nice additional meta-data in 4.3. Also: I could use some help with this. Not only with integrating into KIO or Kopete or KTorrent but also with the ontology design and the formatting. Both are still rather experimental.

A little side note: I am still a bit disappointed that the blog system here changed. No more C++ code highlighting, no more fancy image handling with automatic thumbnails... or maybe it still works somehow but there is no documentation? I was not able to get an answer so far. So I am using HTML img tags to include my images, which is no fun.