Bup, the backup tool with a clever idea
By: oever, 2 Jun
What backup tool are you using? You are using one, right? I am using one these days, namely git. My entire home directory is a collection of git repositories. Using git for backups is great because it is easy to synchronize data. It is also easy to restore files without needing access to the backup server. I keep my .git directories on a separate partition and symlink them into place. Every few days I push all my git repositories to my backup server, which has a user called 'git' with 'git-shell' as its login shell. That way, sending backups to the server happens safely over ssh.
There are downsides to using git, though. The main problem becomes obvious when you work with large files. I do not have many large files, but I do have a few virtual machines. One way of backing those up is to keep them on storage that supports taking snapshots; ZFS or 'lvcreate --snapshot' are solutions for this. But what if you want to back up the different versions of large files to a different storage medium?
This is where bup comes in. Bup is a backup tool with a clever idea on top of git: it splits files by using a rolling checksum. Then it stores each file fragment in a git pack file.
To understand how this works, you need to understand a bit about how git works and how rolling checksums work.
In git, every version of every file in every directory that you ever committed is stored. When you make a commit, git stores the entire directory tree. Yet git does not use that much disk space. That is because each file is stored under a name that is derived from the file's content. A fancy name for this is 'content addressable storage'. If two files are the same, they get the same name and are stored only once. A directory is stored as a list of file names, each paired with the name of that file's content. If two directories have the same content, their lists are the same, so the name under which the directory is stored in the .git folder is also the same. This concept is explained clearly in the git community book.
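To make content addressable storage concrete, here is a small Python sketch of how git names its objects: the name is the SHA-1 of a short header (the object kind, its size and a NUL byte) followed by the content. For a blob this gives the same id that 'git hash-object' prints. The tree_id helper is a simplification for illustration only: it handles a flat directory of regular files with ASCII names, while real git trees also hold subdirectories, other file modes and a slightly different sort rule for directories.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Name of a git object: SHA-1 over a 'kind size\\0' header plus the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    """Name under which git stores a file's content."""
    return git_object_id("blob", content)


def tree_id(files: dict[str, bytes]) -> str:
    """Simplified tree: a flat directory of regular files (mode 100644).

    Each entry is 'mode name\\0' followed by the raw 20-byte blob id,
    sorted by file name.
    """
    body = b""
    for name in sorted(files):
        raw_sha = bytes.fromhex(blob_id(files[name]))
        body += b"100644 " + name.encode() + b"\0" + raw_sha
    return git_object_id("tree", body)
```

Two directories with the same files produce the same tree name no matter in which order the files were added, which is why storing the whole tree on every commit costs so little.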
Git was designed for many small files and does not handle large files efficiently. If one bit in a file changes, the whole new version of the file is stored again, so the data ends up stored twice. Git has a mechanism to store only the differences between files (delta compression in pack files), but computing these differences is expensive for large files.
Bup uses the idea of content addressable storage and solves the problem of large files with a rolling checksum. The concept is explained in an entertaining way in the bup DESIGN document. A rolling checksum is a checksum that you can slide along a data blob. You begin at the start of the blob and calculate the checksum over a window of bytes. Then you slide the window one byte along the blob: you subtract the byte at the low end of the window and add the byte at the new high end. This gives you the new checksum without recomputing it from scratch. With a rolling checksum it is therefore cheap to calculate a checksum at every position in a blob (apart from the first few bytes, before the window is full).
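A minimal Python sketch of a rolling checksum, assuming an rsync-style weak checksum (a plain sum s1 and a position-weighted sum s2, both modulo 2^16) and a 64-byte window. Bup's own rollsum is in the same spirit but differs in its details; the class name and window size here are illustrative choices. The helper checksum_from_scratch recomputes the same value directly, to show that sliding gives the same answer.

```python
WINDOW = 64  # assumed window size in bytes


class RollingChecksum:
    """rsync-style weak checksum over the last `window` bytes (sketch, not bup's code)."""

    def __init__(self, window: int = WINDOW) -> None:
        self.window = window
        self.buf = bytearray(window)  # circular buffer; starts as all zero bytes
        self.pos = 0                  # slot holding the oldest byte
        self.s1 = 0                   # plain sum of the bytes in the window
        self.s2 = 0                   # weighted sum: newest byte weight 1, oldest weight `window`

    def roll(self, new: int) -> None:
        """Slide the window one byte: drop the oldest byte and add `new`."""
        old = self.buf[self.pos]
        self.buf[self.pos] = new
        self.pos = (self.pos + 1) % self.window
        self.s1 = (self.s1 + new - old) & 0xFFFF
        # Every byte still in the window gains weight 1 (adding the new s1),
        # and the dropped byte loses its full weight `window`.
        self.s2 = (self.s2 + self.s1 - self.window * old) & 0xFFFF

    def digest(self) -> int:
        return (self.s1 << 16) | self.s2


def checksum_from_scratch(window_bytes: bytes) -> int:
    """Recompute the same checksum directly; costs O(window) per position."""
    n = len(window_bytes)
    s1 = sum(window_bytes) & 0xFFFF
    s2 = sum((n - i) * b for i, b in enumerate(window_bytes)) & 0xFFFF
    return (s1 << 16) | s2
```

Each call to roll does a constant amount of work, so checksumming every position of a large blob costs time proportional to the blob's size rather than its size times the window.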
At each position where the last 13 bits of the checksum are all 1, bup splits the blob. If those bits behave like random bits, that happens on average once every 2^13 = 8192 positions, which gives chunks with an average size of 8192 bytes. Each chunk is compressed and stored separately in a git packfile. If a bit in the blob changes, only two or three stored objects change: the chunk that contains the bit, the following chunk (if the change moves the split point between them), and the list of chunks, stored similarly to a directory, that is needed to restore the blob.
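The splitting rule can be sketched in Python as follows. This is an illustration under assumptions, not bup's code: it uses rsync-style rolling sums as a stand-in for bup's checksum, a 64-byte window, and takes the 13 low bits from the weighted sum s2. The demo inserts one byte into random data and counts how many chunks survive unchanged.

```python
import random

WINDOW = 64    # assumed window size in bytes
BLOBBITS = 13  # split where the low 13 bits are all 1: about one split per 2**13 bytes


def split_chunks(data: bytes, window: int = WINDOW, bits: int = BLOBBITS) -> list[bytes]:
    """Content-defined chunking with an rsync-style rolling checksum (sketch)."""
    mask = (1 << bits) - 1
    buf = bytearray(window)  # circular buffer of the bytes currently in the window
    pos = s1 = s2 = 0
    chunks, start = [], 0
    for i, new in enumerate(data):
        old = buf[pos]
        buf[pos] = new
        pos = (pos + 1) % window
        s1 = (s1 + new - old) & 0xFFFF
        s2 = (s2 + s1 - window * old) & 0xFFFF
        # The weighted sum s2 mixes better than the plain sum s1, so test its low bits.
        if (s2 & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last split point
    return chunks


if __name__ == "__main__":
    rng = random.Random(42)
    original = rng.randbytes(400_000)
    edited = original[:1000] + b"!" + original[1000:]  # insert one byte
    a, b = split_chunks(original), split_chunks(edited)
    print(len(a), "chunks, average", len(original) // len(a), "bytes")
    print(len(set(a) & set(b)), "chunks shared after a one-byte insertion")
```

Because a split point depends only on the 64 bytes before it, every split point more than one window past the insertion is found again, so all later chunks are identical and need not be stored twice.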
The current implementation of bup was written with speed of development in mind: git binaries are called from Python code. There is a read-only FUSE implementation on top of bup that makes it convenient to browse old versions of files. Going one step further, write support could be added. That would turn bup into a filesystem, similar to Fossil with Venti or ZFS with deduplication. Bup would be better at deduplication thanks to the rolling checksum: deduplication in those systems works on fixed-size blocks, so inserting a single byte shifts every block after it, whereas bup's split points move along with the content.
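For comparison, a Python sketch of fixed-size blocks, as a simple stand-in for block-level deduplication with an assumed 8192-byte block size. After inserting a single byte near the start, every block boundary shifts, so none of the blocks match any more.

```python
import random

BLOCK = 8192  # assumed fixed block size, matching bup's average chunk size


def fixed_blocks(data: bytes, size: int = BLOCK) -> list[bytes]:
    """Cut data at fixed offsets, as block-level deduplication does."""
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    rng = random.Random(42)
    original = rng.randbytes(400_000)
    edited = original[:1000] + b"!" + original[1000:]  # insert one byte near the start
    a, b = fixed_blocks(original), fixed_blocks(edited)
    # Every boundary after the insertion moves by one byte, so no later block matches.
    print(len(set(a) & set(b)), "of", len(a), "blocks shared")
```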
WebODF on Android devices
By: oever, 31 May
Today the WebODF project released an Android app. You can get it from the Android Market and soon from FDroid.org. This is just the start. Viewing and editing office documents and in particular ODF files should be possible on all mobile devices. In the WebODF project we want to make this possible.
The Android application is 95% generic WebODF JavaScript and 5% Android-specific Java code. For future ports to the iPhone, iPad, MediaWiki and many other environments, we will put most functionality in the shared JavaScript library and keep the application-specific code to a minimum.
If you have a use case for WebODF, join us on #webodf or webodf@lists.opendocsociety.org to discuss it.
File selector in QML and PySide
By: oever, 29 May
Today I wrote a file selector in QML. This was not trivial because QML has no standard element for drilling down into a tree model, so I wrote one. A bit of Python was needed to expose the file system to QML as a data model.
I've played with Bup a bit lately and wanted to write a GUI for it. Normal Qt widgets would do, but when the bup developers asked if it would run on MeeGo, I had a look at QML.

Update: check the comments for a new version.
The Python part of the code is simple and short:
WebODF on Android and beyond
By: oever, 6 Mar
ODF support on phones and tablets is not good right now. The Calligra project is working on improving this, but WebODF can provide a solution too. To prove this, I built a small wrapper application that lets Android read ODF files. This application is available in the WebODF repository, and I've also put the installable application online.
To reach more phones and tablets, such as the iPhone, iPad, BlackBerry and Symbian phones, we could use PhoneGap. Making a PhoneGap application from the WebODF code is a nice way to get started with cross-device development and to help the adoption of ODF. If you want to give this a try, check out the WebODF code, get PhoneGap, read the code of the Android example to see how to adapt the WebODF runtime, and get hacking. Good luck and have fun!
WebODF gains round-tripping support
By: oever, 1 Mar
In my previous blog post I talked about converting ODF files to PDF with WebODF. This functionality is generally useful, and it also lets OfficeShots compare WebODF's ODF rendering to that of other office suites.
Another useful feature is round-tripping of ODF. Round-tripping is the process of loading an ODF file in a program and then saving it again: an ODF-to-ODF conversion. OfficeShots uses round-tripping to check whether an office suite generates valid ODF. In WebODF, round-tripping barely modifies the original ODF file: only its XML content is parsed and serialized again. Any bugs in that parsing or serialization would be exposed by round-tripping.
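WebODF's round-tripping runs in JavaScript. To illustrate what "parse and serialize the XML" means for an ODF package, here is a Python sketch using only the standard library (zipfile and xml.dom.minidom); it is not WebODF's implementation. An ODF file is a ZIP package whose 'mimetype' entry is expected to come first and be stored uncompressed, so the sketch keeps the entry order and each entry's compression. Unlike the command below, it returns a new package instead of modifying the file in place.

```python
import io
import zipfile
from xml.dom import minidom


def roundtrip_odf(package: bytes) -> bytes:
    """Load an ODF package and save it again, re-serializing every XML part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as zin, zipfile.ZipFile(out, "w") as zout:
        for info in zin.infolist():  # keeps the original entry order
            data = zin.read(info.filename)
            if info.filename.endswith(".xml"):
                # Parse and serialize; a bug in either step would change the output.
                data = minidom.parseString(data).toxml(encoding="UTF-8")
            entry = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            # Keep each entry's compression, so 'mimetype' stays stored uncompressed.
            entry.compress_type = info.compress_type
            zout.writestr(entry, data)
    return out.getvalue()
```

To use it, read the bytes of a file such as myfile.odp, pass them to roundtrip_odf and write the result to a new file; comparing the XML parts of input and output then shows what the round-trip changed.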
After building, the round-tripping can be performed like this:
qtjsruntime lib/runtime.js roundtripodf.js myfile.odp
The file is round-tripped in place, so make a copy before trying this.
In the next blog entry I'll talk about WebODF on Android, editing ODF or unhosted.org. Please vote for a topic in the comments, come to the IRC channel #webodf, or post bugs or comments.
Converting ODF documents to PDF with WebODF
By: oever, 23 Feb
It is quite common to want to send ODF files to people who lack the software to display ODF. One workaround is to convert the ODF to PDF. Most office suites that support ODF can export to PDF. To compare how different office suites do this conversion, you can use the website OfficeShots. It performs this conversion in many office suites at once and lets you compare the results.
WebODF wants to play with the grown-ups. So I have extended WebODF with the ability to convert from ODF to PDF. Here is a small script that shows how to do this conversion for a file /home/user/file.odt:
# compile WebODF
git clone http://git.gitorious.org/odfkit/webodf.git
mkdir build
cd build
cmake ../webodf
make
cd ../webodf/webodf
# perform a conversion
FILE=/home/user/file.odt
cp "$FILE" .
FILE=$(basename "$FILE")
../../build/programs/qtjsruntime/qtjsruntime --export-pdf render.pdf "odf.html#$FILE"
ls render.pdf
WebODF at FOSDEM
By: oever, 17 Feb
The yearly FOSDEM was excellent as always. I could not attend all talks; mine was on Sunday afternoon and, as usual, I was still improving it at the conference itself. Nevertheless, I spoke with many people and saw some very good presentations. Now that the videos are online, I will mention some of them with a link to the video footage.
Why Political Liberty Depends on Software Freedom More Than Ever (video)
Eben Moglen rallied the FOSDEM troops with early-morning politics. He warned that we need not just Free Software, but also a Free Internet. If it is possible to turn off the internet, it is flawed.
Calligra Under the Hood (video).
Boudewijn Rempt gave a technical overview that shows that Calligra is a good starting point if you want to write your own custom office suite.
Building a free, massively scalable cloud computing platform (video)
Soren Hansen talked about OpenStack, the 'Apache of cloud solutions'.
Firefox 4: new features for users and developers (video)
Tristan Nitot talked about the improvements in Firefox 4. It's nice to see them summarized in one talk. I particularly like the improvements in speed.
Cloud 9 IDE (video)
A development environment for JavaScript in the browser: awesome!
KDevelop: Rapid C++ Programming (video)
For those not entirely in the cloud, KDevelop is a great IDE. Milian Wolff gave a nice overview of some of the coolest features.
and of course the amazing talk on
WebODF: an office suite built on browser technology (video)
where I showed how to add WebODF to a website and how to write an Android application using WebODF, and explained simply how it works. If this talk does not answer all your questions, come to #webodf on freenode or post a question on the forums.
WebODF at FOSDEM
By: oever, 5 Feb
Currently I am enjoying FOSDEM, the excellent Free Software conference in Brussels. Tomorrow I will give a presentation about WebODF, "WebODF: an office suite built on browser technology". If you want a preview, you can watch a screencast about it.
Office suites for the cloud are becoming more popular. All of them are closed source and, worse, run on servers outside the control of the user. A Free Software solution to this problem is urgently needed.
WebODF is a small JavaScript library for adding OpenDocument Format (ODF) support to applications, regardless of whether they run on the web or on the desktop. It can display ODF documents in browsers and HTML widgets, and simple editing support is currently being added.
WebODF is extremely innovative because it is the first FOSS implementation of an office suite based on HTML5. Using HTML5 means that the code will run on nearly all modern computing systems. On top of that, it uses CSS in such a way that the ODF document is used nearly unaltered as the run-time presentation. This simplification allows us to develop fast and with little code.
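A simplified Python sketch of how the ODF XML can be shown nearly unaltered: instead of converting the document to HTML, ODF style definitions are turned into CSS rules whose selectors match the ODF elements and their text:style-name attributes directly, using CSS namespaces. This is not WebODF's code; the handled style families and the FO_PROPS subset of fo: properties are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs used below.
NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
# Assumed subset of fo: properties whose names and values carry over to CSS as-is.
FO_PROPS = ("font-weight", "font-style", "color", "text-align")
# Style family -> ODF element that uses styles of that family.
FAMILY_ELEMENT = {"paragraph": "p", "text": "span"}


def qname(prefix: str, local: str) -> str:
    """ElementTree's '{namespace-uri}local' spelling of prefix:local."""
    return "{%s}%s" % (NS[prefix], local)


def styles_to_css(styles_xml: str) -> str:
    """Turn style:style definitions into CSS rules that target ODF elements."""
    root = ET.fromstring(styles_xml)
    css = ['@namespace text "%s";' % NS["text"]]
    for st in root.iter(qname("style", "style")):
        element = FAMILY_ELEMENT.get(st.get(qname("style", "family")))
        name = st.get(qname("style", "name"))
        if element is None or name is None:
            continue
        decls = []
        for props in st:  # e.g. style:text-properties, style:paragraph-properties
            for prop in FO_PROPS:
                value = props.get(qname("fo", prop))
                if value is not None:
                    decls.append(f"{prop}: {value};")
        if decls:
            css.append(f'text|{element}[text|style-name="{name}"] {{ {" ".join(decls)} }}')
    return "\n".join(css)
```

For a paragraph style P1 with bold red text, the output contains the rule text|p[text|style-name="P1"] { font-weight: bold; color: #ff0000; }, which a browser applies to every text:p element that uses that style, without touching the document itself.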
JavaScript: keep it working in different runtimes
By: oever, 3 Jan
The programming language JavaScript is seeing more and more use. Software written in it can run in many different environments. Not only do web browsers support it, there are quite a few programming environments that can integrate and run JavaScript code. Qt supports it with the QtScript module. GNOME has JavaScript bindings via gjs. Node.js is gaining popularity on the server, and Java has the Rhino runtime.
Support for the basic language features of JavaScript is good among these runtimes. You can have a look at the list of dialects of JavaScript/ECMAScript to see that "ECMA-262, edition 3" is the most commonly implemented specification. Nevertheless, each of these environments has different facilities for accessing the environment it is running in. Modularizing the code, accessing the file system, logging, starting a new execution thread and running unit tests are but a few of the use cases for which there is no common solution.
There are a few good practices that have helped me keep my JavaScript code working in multiple runtimes. Most of the code for WebODF, an ODF project written in JavaScript, runs in the popular browsers, in QtScript, in Rhino and in Node.js.
Abstraction
First of all, I have written a small abstraction layer that wraps module loading, logging, unit testing and a few other things. This abstraction layer is not very large; it is a single file. The code contains an abstract class with implementations for the different runtimes. Whenever I need a runtime-specific function, I use this class, extending it where needed.
JSLint and Closure Compiler
JavaScript is a dynamic language without a separate compilation step: nothing is required between writing the code and running it. So errors can easily slip into released code, which makes static checking of the code very important. Two good tools for this are JSLint and the Closure Compiler. JSLint is a JavaScript program that analyzes code for correctness and style. Some features of the JavaScript language do more harm than good, and JSLint brings occurrences of these to your attention so you can avoid them. The Closure Compiler can compile a collection of JavaScript files into one smaller file. But that is not why I use it. While 'compiling' the JavaScript, the Closure Compiler performs a number of checks on the code and catches certain problems before the code is actually run.
Unit testing
Running JavaScript on the command line, in a desktop program and on a website are very different things, so sharing unit tests across these environments takes some work initially. Having good unit tests is invaluable, though, so it is an investment you just have to make if you want to stay confident in your code. For WebODF, I have written a small script or web page for each environment in which I want to run the unit tests. So the unit tests are written only once but run in every environment where they are relevant.
An amazing tool for checking how much of your code is covered by unit tests is jscoverage. It 'instruments' your code: while the instrumented code runs, reports are created that show how often each line of JavaScript was executed. This makes it easy to find which parts of your code could benefit most from an additional unit test.
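jscoverage works by instrumenting JavaScript source. As a rough analogue of the report it produces, here is a Python sketch that counts how often each line of a function runs, using the standard sys.settrace hook instead of instrumentation. The names run_with_line_counts and classify are hypothetical, made up for this example. A line with a count of zero is a line that no test has exercised.

```python
import sys
from collections import Counter


def run_with_line_counts(func, *args):
    """Call func(*args) and count how often each of its lines ran.

    Keys are line offsets from the 'def' line of func.
    """
    code = func.__code__
    counts = Counter()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            counts[frame.f_lineno - code.co_firstlineno] += 1
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore any tracer that was active before
    return result, counts


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


if __name__ == "__main__":
    result, counts = run_with_line_counts(classify, 5)
    for offset in (1, 2, 3):
        # Offset 2 reports zero: no call with a negative number was made.
        print(offset, counts[offset])
```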
Conclusion
JavaScript is nearly everywhere. But to write JavaScript that can go nearly everywhere too, you need to take portability into account. The best way to do that is to develop for at least three runtimes in parallel.
OdfKit Hack Week day 3
By: oever, 25 Jun