oever's blog

    oever's picture

    Bup, the backup tool with a clever idea

    2 Jun 2011

    What backup tool are you using? You are using one, right? I am using one these days, namely git. My entire home directory is a collection of git repositories. Using git for backups is great because it is easy to synchronize data. It is also easy to restore files without needing access to the backup server. I keep my .git directories in a separate partition and symlink them into the right position. Every few days I push all my git repositories to my backup server, which has a user called 'git' with 'git-shell' as the shell setting. So sending backups to the server can happen safely over ssh.

    There are downsides to using git though. The main problem becomes obvious if you work with large files. I do not have a lot of large files, but I do have a few virtual machines. One way of backing those up is to put them on a disk that supports taking snapshots. ZFS or 'lvcreate --snapshot' are solutions for this. But what if you want to back up the different versions of large files to a different storage medium?

    This is where bup comes in. Bup is a backup tool with a clever idea on top of git: it splits files by using a rolling checksum. Then it stores each file fragment in a git pack file.

    To understand how this works, you need to understand a bit about how git works and how rolling checksums work.

    In git, you can store every file for every version in every directory that you ever committed. If you make a commit in git, you store the entire directory tree. Yet, the disk usage of git is not so large. That is because each file is stored under a name that is determined from the content of the file. A fancy name for this is 'content addressable storage'. If two files are the same, they will have the same name. A directory is a list of file names. If directories have the same content, the list is the same, and the name under which the directory is stored in the .git folder is also the same. This concept is explained clearly in the git community book.
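
    As a rough illustration of content addressable storage, here is a minimal Python sketch of how git names the content of a file. It assumes git's default SHA-1 object format, where the name of a file (a 'blob') is the SHA-1 hash of a short header followed by the content. The function name git_blob_id is made up for this example.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Return the name under which git stores a file with this content."""
    # A blob is hashed as: "blob <size in bytes>\0" followed by the content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()

first = git_blob_id(b"hello\n")
second = git_blob_id(b"hello\n")    # same content, so the same name
changed = git_blob_id(b"hello!\n")  # different content, so a different name

print(first)
print(first == second, first == changed)
```

    Two files with the same content get the same name and are stored only once. The result should match what 'git hash-object' reports for a file with the same content.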

    Git was designed for use with many small files. It cannot work efficiently with large files. If one bit changes, the entire file has to be stored twice. Git has a mechanism to store only the difference between files, but it is expensive to calculate this for large files.

    Bup uses the idea of content addressable storage and solves the problem of big files with a rolling checksum. The concept is explained in an entertaining way in the bup DESIGN document. A rolling checksum is a checksum which you can slide along a data blob. You start at the start of the blob and calculate the checksum. Then you slide one byte along the blob; you subtract the byte at the low end of the sliding window and add a byte at the high end of the sliding window. This gives you a new checksum. With a rolling checksum it is efficient to calculate a checksum for all positions in a blob (apart from the part where the sliding window started).
    At each position where the last 13 bits of the checksum are 1, bup splits the blob. This gives chunks with an average size of 8192 bytes. All of these chunks are compressed and stored separately in a git packfile. If a bit is changed in the blob, two to three things change: the chunk where the bit is located, the following chunk (if there is one) and the list of chunks that is stored, similar to a directory, to restore the blob.
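
    The splitting can be sketched in a few lines of Python. This is not bup's actual code: the checksum below is a simple rsync-style sum chosen for illustration, and the 64 byte window and the names split_points and chunks are assumptions. Only the split rule (the last 13 bits are all 1) is taken from the description above.

```python
import random

WINDOW = 64                       # sliding window size in bytes (assumed)
SPLIT_BITS = 13                   # split when the low 13 bits are all 1
SPLIT_MASK = (1 << SPLIT_BITS) - 1

def split_points(data):
    """Return the offsets at which the blob is cut, found in one pass."""
    a = 0   # sum of the bytes in the window
    b = 0   # position-weighted sum of the bytes in the window
    cuts = []
    for i, incoming in enumerate(data):
        # The window starts out filled with zero bytes.
        outgoing = data[i - WINDOW] if i >= WINDOW else 0
        # Slide one byte: drop the low end, add the high end.
        a = (a - outgoing + incoming) & 0xFFFF
        b = (b - WINDOW * outgoing + a) & 0xFFFF
        if (b & SPLIT_MASK) == SPLIT_MASK:
            cuts.append(i + 1)
    return cuts

def chunks(data):
    """Split the blob into chunks at the positions chosen by the checksum."""
    starts = [0] + split_points(data)
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends) if s < e]

# Demo: flip one byte in the middle of 512 KiB of pseudo-random data.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(512 * 1024))
changed = bytearray(data)
changed[len(data) // 2] ^= 0xFF
changed = bytes(changed)

old, new = chunks(data), chunks(changed)
new_only = set(new) - set(old)
print(len(old), "chunks before,", len(new_only), "chunks need to be stored again")
```

    Because the split positions depend only on the bytes inside the window, a change in the middle of the blob moves at most the nearby boundaries. All other chunks keep their content, and therefore their name, so they do not have to be stored again.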

    The current implementation of bup was written with speed of development in mind; git binaries are called from Python code. There is a read-only FUSE implementation on top of bup to make it convenient to browse old versions of files. Going one step further, write support could be added. That would make bup into a filesystem, similar to Fossil with Venti or ZFS with deduplication. Bup would be better at deduplication, due to the rolling checksum.

    WebODF on Android devices

    31 May 2011

    Today the WebODF project released an Android app. You can get it from the Android Market and soon from FDroid.org. This is just the start. Viewing and editing office documents and in particular ODF files should be possible on all mobile devices. In the WebODF project we want to make this possible.

    The Android application is 95% generic WebODF JavaScript and 5% Android specific Java code. For future ports to iPhone, iPad, MediaWiki and many other environments we will put most functionality in the shared JavaScript library and keep the application specific code to a minimum.

    If you have a use case for WebODF, join us on #webodf or webodf@lists.opendocsociety.org to discuss it.

    File selector in QML and PySide

    29 May 2011

    Today I wrote a file selector in QML. This was not trivial because QML has no standard element for drilling down in a tree model. So I wrote one. A bit of Python was needed to expose the file system to the QML as a data model.

    I've played with Bup a bit lately and wanted to write a GUI for it. Normal Qt widgets would do, but when the bup developers asked if it would run on MeeGo, I had a look at QML.

    QML File Selector

    Update: check the comments for a new version.

    The Python part of the code is simple and short:

    WebODF on Android and beyond

    6 Mar 2011

    ODF support on phones and tablets is not good right now. Work is being done to improve this by the Calligra project, but WebODF can provide a solution too. To prove this, I built a small wrapper application that gives Android the ability to read ODF files. This application is available in the WebODF repository and I've also put the installable application online.

    To reach more phones and tablets, such as the iPhone, iPad, Blackberry and Symbian phones, we could use PhoneGap. Making a PhoneGap application from WebODF code is a nice way to get started on cross-device development and to help out with the adoption of ODF. If you want to give this a try, check out the WebODF code, get PhoneGap, read the code for the Android example to see how to adapt the WebODF runtime and get hacking. Good luck and have fun!

    WebODF gains round-tripping support

    1 Mar 2011

    In my previous blog post I talked about converting ODF files to PDF files with WebODF. This functionality is generally useful, and it also lets OfficeShots compare WebODF's ODF rendering to that of other office suites.

    Another useful feature is round-tripping of ODF. Round-tripping is the process of loading an ODF file in a program and subsequently saving it again. It is an ODF to ODF conversion. OfficeShots uses round-tripping to see if an office suite generates valid ODF. In WebODF, the original ODF file is barely modified. The XML content of the ODF file is parsed and serialized in this step. Any bugs in this process would be exposed by round-tripping.

    After building, the round-tripping can be performed like this:
    qtjsruntime lib/runtime.js roundtripodf.js myfile.odp

    The file will be round-tripped in place, so make sure to make a copy before trying this.

    In the next blog entry I'll talk about WebODF on Android, editing ODF or unhosted.org. Please vote in the comments, come to the IRC channel #webodf or post bugs or comments.

    Converting ODF documents to PDF with WebODF

    23 Feb 2011

    It is quite common that one wants to send ODF files to people that lack the software to display ODF. One workaround is to convert the ODF to PDF. Most office suites that support ODF can export to PDF. To compare how different office suites do this conversion one can use the website OfficeShots. This website offers the ability to perform this conversion in many office suites at once and to compare the results.

    WebODF wants to play with the grown-ups. So I have extended WebODF with the ability to convert from ODF to PDF. Here is a small script that shows how to do this conversion for a file /home/user/file.odt:

    # compile WebODF
    git clone http://git.gitorious.org/odfkit/webodf.git
    mkdir build
    cd build
    cmake ../webodf
    make
    cd ../webodf/webodf
    # perform a conversion
    FILE=/home/user/file.odt
    cp "$FILE" .
    FILE=`basename "$FILE"`
    ../../build/programs/qtjsruntime/qtjsruntime --export-pdf render.pdf "odf.html#$FILE"
    ls render.pdf
    
    WebODF at FOSDEM

    17 Feb 2011

    The yearly FOSDEM was excellent as always. I could not attend all talks; mine was on Sunday afternoon and as usual I was still improving it at the conference itself. Nevertheless, I spoke with many people and saw some very good presentations. Now that the videos are online, I will mention some of them with a link to the video footage.

    Why Political Liberty Depends on Software Freedom More Than Ever (video).

    Eben Moglen rallied the FOSDEM troops with early morning politics. He warns that we need not just Free Software, but also a Free Internet. If it is possible to turn off the internet, it is flawed.

    Calligra Under the Hood (video).

    Boudewijn Rempt gave a technical overview that shows that Calligra is a good starting point if you want to write your own custom office suite.

    Building a free, massively scalable cloud computing platform (video)

    Soren Hansen talked about the 'Apache of cloud solutions' OpenStack.

    Firefox 4: new features for users and developers (video)

    Tristan Nitot talked about the improvements in Firefox 4. It's nice to see them summarized in one talk. I particularly like the improvements in speed.

    Cloud 9 IDE (video) A development environment for JavaScript in the browser: awesome!

    KDevelop: Rapid C++ Programming (video)
    For those not entirely in the cloud, KDevelop is a great IDE. Milian Wolff gave a nice overview of some of the coolest features.

    and of course the amazing talk on
    WebODF: an office suite built on browser technology (video)
    where I showed how to add WebODF to a website and how to write an Android application using WebODF, and simply explained how it works. If this talk does not answer all your questions, come to #webodf on freenode, or post a question on the forums.

    WebODF at FOSDEM

    5 Feb 2011

    Currently I am enjoying FOSDEM, the excellent Free Software conference in Brussels. Tomorrow I will give a presentation "WebODF: an office suite built on browser technology" about WebODF. If you want a preview, you can look at a screencast about it.

    I'm going to FOSDEM

    Office suites for the cloud are becoming more popular. All of them are closed source and, worse, running on a server that is outside of the control of the user. A Free Software solution for this problem is urgently needed.

    WebODF is a library for adding OpenDocument Format (ODF) support to applications, regardless of whether they are running on the web or on the desktop. WebODF is a small JavaScript library that can display ODF documents in browsers and HTML widgets. Currently, simple editing support is being added. WebODF can be used in web applications and desktop applications.

    WebODF is extremely innovative because it is the first FOSS implementation of an office suite based on HTML5. Using HTML5 means that the code will run on nearly all modern computing systems. On top of that, it uses CSS in such a way that the ODF document is used nearly unaltered as the run-time presentation. This simplification allows us to develop fast and with little code.

    JavaScript: keep it working in different runtimes

    3 Jan 2011

    The programming language JavaScript is seeing more and more use. Software written in it can run in many different environments. Not only do web browsers support it, there are quite a few programming environments that can integrate and run JavaScript code. Qt has support for it with the QtScript module. GNOME has JavaScript bindings via gjs. Node.JS is gaining popularity on the server and Java has the Rhino runtime.

    Support for the basic language features of JavaScript is good among these runtimes. You can have a look at the list of dialects of JavaScript/ECMAScript to see that "ECMA-262, edition 3" is the most common specification that is implemented. Nevertheless, each of these environments has different facilities for accessing parts of the environment they are running in. Modularizing the code, access to the file system, logging, starting a new execution thread, running unit tests, these are but a few of the use cases for which there is no common solution.

    There are a few good practices that have helped me to keep my JavaScript code working in multiple runtimes. Most of the code for WebODF, an ODF project written in JavaScript, runs in the popular browsers, in QtScript, in Rhino and in Node.JS.

    Abstraction

    First off, I have written a small abstraction layer that wraps loading of modules, logging, unit testing and a few other things. This abstraction layer is not very large; it is a single file. The code contains an abstract class with implementations for the different runtimes. Whenever I need to access a runtime-specific function, I resort to this class, extending it where needed.

    JSLint and Closure Compiler

    JavaScript is a dynamic language; there is no compiler. This means that there are no steps required between writing the code and running the code. Code errors can easily slip into released code. It is therefore very important to do static testing of the code. Two good tools for this are JSLint and the Closure Compiler. JSLint is a JavaScript program that analyzes code for correctness and style. Some features of the JavaScript language do more harm than good and JSLint brings occurrences of these to your attention so you can avoid them. The Closure Compiler can compile a collection of JavaScript files into one smaller file. But that is not why I use it. While 'compiling' the JavaScript, the Closure Compiler performs a number of checks on the code and catches certain problems before the code is actually run.

    Unit testing

    Running JavaScript on the command line, in a desktop program and on a website are very different things. So sharing unit tests across these environments is a bit of work initially. Having good unit tests is invaluable though, so it is an investment you just have to make if you want to stay confident of your code. For WebODF, I have written a small script or web page for each environment in which I want to run the unit tests. So unit tests are written only once but run in all environments where they are relevant.

    An amazing tool for checking how much of your code is covered by unit tests is jscoverage. It can 'instrument' your code. While running the instrumented code, reports are created that show how often each line of JavaScript was run. This makes it easy to find which parts of your code could benefit most from an additional unit test.

    Conclusion

    JavaScript is nearly everywhere. But to write JavaScript that can go nearly everywhere too, you need to take portability into account. The best way to do that is to develop for at least three runtimes in parallel.

    OdfKit Hack Week day 3

    25 Jun 2010
It's Friday and day three of the OdfKit Hack Week. So what did we do all day besides folding balloons, talking to men in wooden shoes and eating pancakes? We actually implemented the style inheritance I blogged about yesterday. Background images are now supported too. There was some philosophizing over APIs and we published some code (recommended if you are interested in (Qt)WebKit or ODF). Since the weekend is here we will not go into too much detail; after all, you can download the Qt client code or try the online demo in a WebKit or Firefox browser. But we will show some images. The first image shows that background images are working in the Qt client now, and the second screenshot shows the first part of the ODF 1.2 specification in odt format opened in OpenOffice, our WebKit based viewer and KOffice.