Akonadi migration explained
As a follow-up to my blog post "Akonadi porting explained", I am now going to write about Akonadi migration.
It is basically the data-storage cousin of porting: porting is, as we learned, about adapting applications to a new way of handling data, while migration is about adapting data to new ways of being accessed.
For the last couple of months I unfortunately had too little time for development on either KDE or Akonadi, so I spent the available time thinking about mail migration.
You see, mail is a special case when it comes to migration, because it differs from other data types in a couple of key aspects:
- amount
- location
- state/properties
The difference in amount of data is the most obvious one. Most users will have more messages in mail folders than contacts in their address book, and messages will on average be a lot bigger than contacts, e.g. because they contain attachments. It is actually very likely that the amount of mail data is several orders of magnitude greater than that of contact data. For example, I have around 100 contacts in my address book and probably around 100 000 messages (if not more), consuming several gigabytes of disk space.
The difference in location refers to how likely it is that messages are stored on a server compared to other data types like contacts. Recent developments like Google Contacts have shifted that somewhat, but it is still more likely to encounter a setup with contacts stored locally and mail stored remotely. Because of this almost inevitable remote storage, any migration process will have to deal with local caching of some sort, e.g. in the case of KMail the maildir used to cache mails for KMail's "Disconnected IMAP" account type.
The difference in data state or properties is a lot less obvious than the other two. Even in the most basic usage scenario the message state changes from unread to read, and it is highly likely that messages have additional properties attached, such as "important" or "you have replied to this one". This alone wouldn't be a problem if that information were part of the message, or at least part of the file the message is stored in. However, not all on-disk formats support that, so applications like KMail had to find alternative ways of storing this additional information, e.g. "index" files.
Let's have a look at a couple of scenarios to see how these differences influence the migration process.
For comparison, let's take the migration of a local address book:
- get location of address book file (e.g. KDE's std.vcf)
- create Akonadi storage handler for a VCard file
- point it to the location of the file
- done
A similar approach will work for a local maildir directory:
- get location of the maildir root directory (e.g. $HOME/Mail, $KDEHOME/share/apps/kmail/mail)
- create Akonadi storage handler for MailDir
- point it to that directory
- done
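Both recipes above amount to "reattaching": the data stays where it is and only a storage handler is configured to point at it. A minimal Python sketch of that idea, assuming a hypothetical `HandlerConfig` record in place of a real Akonadi resource configuration (the kind names `vcard` and `maildir` are illustrative, not actual Akonadi resource identifiers):

```python
import os
from dataclasses import dataclass


@dataclass
class HandlerConfig:
    # Hypothetical stand-in for an Akonadi resource configuration.
    kind: str  # which kind of storage handler to create
    path: str  # file or directory the handler is pointed at


def reattach(path: str) -> HandlerConfig:
    """Migrate by reattaching: the data stays in place, only a handler is configured."""
    if os.path.isfile(path) and path.endswith(".vcf"):
        return HandlerConfig(kind="vcard", path=path)  # e.g. KDE's std.vcf
    if os.path.isdir(path):
        return HandlerConfig(kind="maildir", path=path)  # e.g. $HOME/Mail
    raise ValueError(f"no storage handler known for {path!r}")
```

Nothing is copied or converted here, which is exactly why these two cases are cheap.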
The messages stay right where they are, so the amount of data is irrelevant. MailDir can encode most of the state data into the file name, so that is not an immediate problem either. Storage is local, no server involved, no caching to deal with.
Now let's have a look at another form of local mail storage: mbox
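To illustrate how MailDir keeps state in the file name: the maildir convention appends `:2,` followed by one letter per flag (`S` for seen, `R` for replied, `F` for flagged, and so on). A small Python sketch that decodes those letters; the readable names in the mapping are illustrative labels, not part of the convention:

```python
import os

# Flag letters defined by the maildir convention; names are illustrative labels.
MAILDIR_FLAGS = {
    "D": "draft",
    "F": "flagged",  # e.g. marked as important
    "P": "passed",   # forwarded or bounced
    "R": "replied",
    "S": "seen",     # read
    "T": "trashed",
}


def state_from_filename(filename: str) -> set[str]:
    """Return the message state encoded in a maildir file name."""
    base = os.path.basename(filename)
    _, sep, info = base.rpartition(":2,")
    if not sep:
        return set()  # no info suffix, e.g. a message still in new/
    # Letters outside the standard set are ignored rather than treated as errors.
    return {MAILDIR_FLAGS[c] for c in info if c in MAILDIR_FLAGS}
```

For example, `state_from_filename("cur/1234.M5P6.host:2,RS")` yields `{"replied", "seen"}`. Anything beyond these few flags has no place in the file name, which is why MailDir covers most, but not all, of the state data.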
- get location of the mbox file (again rather trivial)
- create Akonadi storage handler for MBox
- point it to the location of the file
- done
Done? Not quite. Doing it that way we'll lose state data in a way that's probably not acceptable to our users.
So what could we do? One option would be to make the Akonadi storage handler for MBox understand, e.g., KMail's index files, but that is quite ugly: it involves maintaining old code (at least the reading part) and is either KMail specific or requires the Akonadi MBox resource to understand all kinds of such additional files.
I'll get back to this later, but let's have a look at another example first: an mbox file within a maildir tree
- get location of the mbox file (again rather trivial)
- create Akonadi storage handler for MBox
- point it to the location of the file
- done
Since this shares the problem of the stand-alone mbox case, I'll skip that part. However, there are some additional issues here, one being that the Akonadi MBox resource will create a top-level folder in Akonadi, thus "moving" the mailbox folder out of the tree while the file stays in it. One possible solution would be to have a resource that can handle mboxes inside maildir trees, but since mbox folders and maildir folders behave differently (e.g. mbox folders need to be "compacted" to really delete mails), we don't consider this a proper solution unless we run out of alternatives.
To not forget about the server location problem, let's finally also look at disconnected IMAP:
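Finding such in-tree mbox files means walking the local mail tree and telling the two folder kinds apart. A rough Python sketch, assuming KMail1's layout in which the subfolders of folder `NAME` live in a sibling directory `.NAME.directory` and other dot files (such as index files) can be skipped; `classify_tree` is a hypothetical helper, not part of any existing migration tool:

```python
import os


def is_maildir(path: str) -> bool:
    """A maildir folder is a directory with cur/, new/ and tmp/ subdirectories."""
    return all(os.path.isdir(os.path.join(path, sub)) for sub in ("cur", "new", "tmp"))


def classify_tree(root: str) -> dict[str, list[str]]:
    """Split a local mail tree into maildir folders and in-tree mbox files.

    Assumption: subfolders of folder NAME live in a sibling directory
    called .NAME.directory; other dot files (e.g. index files) are skipped.
    """
    result: dict[str, list[str]] = {"maildir": [], "mbox": []}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if entry.startswith("."):
            if entry.endswith(".directory") and os.path.isdir(path):
                sub = classify_tree(path)  # recurse into the subfolder container
                result["maildir"] += sub["maildir"]
                result["mbox"] += sub["mbox"]
            continue
        if os.path.isdir(path) and is_maildir(path):
            result["maildir"].append(path)
        elif os.path.isfile(path):
            result["mbox"].append(path)  # candidate mbox folder inside the tree
        # Directories that are not maildirs are ignored in this sketch.
    return result
```

Everything ending up in the `mbox` list is a folder that a plain MBox resource would pull out of the tree.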
- get server connection values and login credentials
- create Akonadi storage handler for IMAP
- tell it about server and user
- done
As an attentive reader you already know that we are not quite done yet :) So what is it this time? It can't be the message state, everything is on the server. It can't be the resulting folder locations either, IMAP servers have always been treated separately.
Obviously it has something to do with the "disconnected" part, so let's have a closer look at that. It means that KMail has a maildir tree somewhere that is more or less a copy of what's on the IMAP server. "More or less" because it works like synchronizing a remote and a local directory, i.e. changes on either side are not immediately visible on the other side; they are applied at resynchronization time.
This leads to two complications for the migration process:
- the users will be very angry if we make them download all those messages again
- some messages might have been added or deleted locally and the respective changes have not been synchronized yet
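Both complications boil down to sorting the cached messages into groups. A Python sketch of that, assuming a hypothetical, already-parsed model of the cache in which every cached message either carries its server UID or has none because it was added locally, and the not-yet-synchronized deletions are available as a set of UIDs (how KMail actually records this on disk is not modeled here):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedMessage:
    # Hypothetical model of one message in the disconnected IMAP cache.
    path: str            # file in the local maildir cache
    uid: Optional[int]   # server UID, None if added locally and never uploaded


@dataclass
class MigrationPlan:
    reuse: list[CachedMessage]   # already on the server: keep, do not re-download
    upload: list[CachedMessage]  # local additions not yet synchronized
    delete_on_server: list[int]  # local deletions not yet synchronized


def plan_migration(cache: list[CachedMessage], pending_deletes: set[int]) -> MigrationPlan:
    reuse = [m for m in cache if m.uid is not None and m.uid not in pending_deletes]
    upload = [m for m in cache if m.uid is None]
    return MigrationPlan(reuse, upload, sorted(pending_deletes))
```

Only the `upload` group ever needs to travel over the network; the `reuse` group is what avoids angry users.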
Again, a possible solution would be to make the Akonadi IMAP resource understand this local cache and its transaction logs, but again this is not a very clean solution.
I am sorry that this is already quite a long blog post, but maybe you are still interested in some of my thoughts on how to make this work nevertheless.
If we treat the process more like a form of importing instead of simply reattaching different storage handlers, we gain the possibility to change formats and locations in a way that allows us to inform users about these changes, and most likely also to offer an advanced mode for people with really specific needs. So instead of silently doing things in the background, KMail2 could bring up a GUI saying that it has detected a KMail1 setup and let you choose between ignoring it, importing it the way it sees fit, or switching to an import GUI for customization.
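As an example of what importing, rather than reattaching, could look like for a KMail1 mbox folder, here is a Python sketch that reads messages with the standard `mailbox` module and attaches state from a side file. It assumes the KMail index file has already been decoded into a mapping from message position to flags; that decoding, and the `ImportedItem` type standing in for an Akonadi message item, are hypothetical:

```python
import mailbox
from dataclasses import dataclass, field


@dataclass
class ImportedItem:
    # Hypothetical stand-in for an Akonadi message item.
    subject: str
    flags: set[str] = field(default_factory=set)


def import_mbox(mbox_path: str, index_state: dict[int, set[str]]) -> list[ImportedItem]:
    """Read messages from an mbox file and attach state kept outside the file.

    index_state is assumed to be decoded from KMail's index file already,
    keyed by the message's position in the mbox.
    """
    items = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for position, message in enumerate(box):
            items.append(ImportedItem(
                subject=message.get("Subject", ""),
                flags=set(index_state.get(position, set())),
            ))
    finally:
        box.close()
    return items
```

The point of the design is that the KMail-specific knowledge lives in the importer, so the storage handler that later owns the items never has to read index files.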
Our scenarios above can then be handled like this:
MailDir: no difference there, but potentially allowing a customized import routine to move the messages to a new base directory
MBox: top-level mbox files can be handled by the Akonadi MBox resource; the importer can be KMail specific and understand the index files, attaching the additional information to the resulting Akonadi message items. MBox files inside the maildir tree can be read by the importer and added as a new folder to the top-level folder managed by the Akonadi MailDir resource, potentially allowing a customized import routine to move the mbox file and treat it like a top-level folder instead.
Disconnected IMAP: similar to the "in-tree mbox" case, the importer can be made to understand KMail's form of caching and transaction state handling. However, unlike the "mbox -> maildir folder" conversion, we do not want the Akonadi IMAP resource to upload the messages to the server, since most of them will already be there.
Again, my main idea is to let the importer understand what it is actually importing, in this case how the message is addressed on the server, and attach this information to the message when adding it to a folder managed by the Akonadi IMAP resource. The IMAP resource will then only have to be extended to understand this additionally attached information, not how it used to be stored on disk.
I hope to have some time during the Christmas holidays to experiment with some of these ideas.
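A sketch of attaching server addresses instead of message data, in Python. In IMAP a message is addressed by its folder plus its UID, and UIDs are only valid as long as the folder's UIDVALIDITY value is unchanged, so the importer should check that before trusting cached UIDs. `ImportedImapItem` and its `remote_id` field are hypothetical stand-ins for whatever the IMAP resource would actually consume:

```python
from dataclasses import dataclass


@dataclass
class ImportedImapItem:
    # Hypothetical item handed to the IMAP resource: it carries the server
    # address of the message plus the locally cached copy.
    folder: str
    remote_id: str
    cached_path: str


def attach_server_addresses(folder: str, cached_uidvalidity: int,
                            server_uidvalidity: int,
                            cached: dict[int, str]) -> list[ImportedImapItem]:
    """Turn cached (UID -> local file) entries into items addressed on the server."""
    if cached_uidvalidity != server_uidvalidity:
        # UIDs from the cache no longer identify the same messages on the server.
        return []
    return [ImportedImapItem(folder, str(uid), path)
            for uid, path in sorted(cached.items())]
```

If UIDVALIDITY differs, falling back to a normal download is the safe choice. This sketch also assumes any unsynchronized local changes have been dealt with beforehand.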