
KCabinet - mostly working

Friday, 4 January 2008  |  brad hards

I finally got the KCabinet class (in playground/libs/kcabinet) working.

To make any sense out of this, you need to know that the Microsoft Cabinet format is based on blocks of data (the CFDATA block), each of which holds at most 32768 bytes of uncompressed data. There are four ways that the data is packaged: uncompressed, MS-ZIP, Quantum and LZX. I'm mainly interested in MS-ZIP, which is basically the deflate algorithm with a little header (similar to how gzip works). With MS-ZIP, you get around 2K of compressed data in each block, which expands to 32768 bytes, except for the last block of course.
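As a rough illustration of the block layout, here is a minimal Python sketch that pulls one CFDATA block out of a buffer. It assumes the usual little-endian layout of a 4-byte checksum, a 2-byte compressed size and a 2-byte uncompressed size, followed by the payload. The function name parse_cfdata and the reserved parameter (the size of the optional per-block reserved area, taken to be zero here) are made up for this example and are not part of KCabinet.

```python
import struct

# csum (4 bytes), cbData (2 bytes), cbUncomp (2 bytes), all little-endian
CFDATA_HEADER = struct.Struct("<IHH")


def parse_cfdata(buffer, offset=0, reserved=0):
    """Return (checksum, uncompressed_size, payload, next_offset).

    `reserved` is the size of the optional per-block reserved area;
    it is assumed to be zero unless the caller knows otherwise.
    """
    checksum, compressed_size, uncompressed_size = CFDATA_HEADER.unpack_from(
        buffer, offset
    )
    start = offset + CFDATA_HEADER.size + reserved
    payload = bytes(buffer[start:start + compressed_size])
    if len(payload) != compressed_size:
        raise ValueError("truncated CFDATA block")
    return checksum, uncompressed_size, payload, start + compressed_size
```

For an MS-ZIP block the returned payload should start with the two bytes "CK", and the uncompressed size should be 32768 for every block except possibly the last.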

So what I was doing was using zlib to inflate() each block. The problem is that when a single file extends over more than one block, zlib wouldn't actually decode anything beyond the first block. It just returned Z_STREAM_END, which makes sense (since each block is complete), but it never wrote anything beyond the first block of uncompressed output (i.e. 32768 bytes of good output, followed by whatever I initialised the QByteArray to).

I tried lots of things. I switched to using zlib directly (instead of through KFilterDev), and I tried various options to the zlib functions. I pulled what is left of my hair out. I considered importing the libmspack library, or a modified copy (e.g. from Samba or ClamAV).

In the end, I asked the zlib authors. Mark Adler came back overnight with the right answer (despite never having tried it): use the previous 32K block as a dictionary. You can use inflateSetDictionary() to set the dictionary for the next block before decompressing.

Magic!

Essentially the sequence that works for me is:

inflateInit2( streamState, -MAX_WBITS );
for each block:
        parse the header (including the CK prefix on the data);
        read the compressed data, and add that to the streamState;
        inflate( streamState, Z_SYNC_FLUSH );
        copy the decompressed data to the output buffer;
        inflateReset( streamState );
        inflateSetDictionary( streamState, decompressedData, decompressedDataSize );
inflateEnd( streamState );
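
The sequence above can be tried out without any C++ at all, since Python's zlib module wraps the same library. The sketch below is not the KCabinet code. It creates a fresh raw-deflate decompressor per block and hands it the previous block's output as zdict, which should amount to the same thing as the inflateReset() and inflateSetDictionary() pair. The names mszip_decompress and make_test_blocks are invented for this example, and make_test_blocks only imitates MS-ZIP blocks for a round-trip check, so treat it as an approximation of what a real cabinet contains.

```python
import zlib

BLOCK_SIZE = 32768  # maximum uncompressed size of one CFDATA block


def mszip_decompress(blocks):
    """Inflate a sequence of MS-ZIP payloads, each starting with b"CK"."""
    output = bytearray()
    previous = b""  # uncompressed bytes of the previous block
    for payload in blocks:
        if payload[:2] != b"CK":
            raise ValueError("missing CK signature")
        if previous:
            # Raw deflate (negative wbits), primed with the previous block
            # as the dictionary.
            inflater = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=previous)
        else:
            inflater = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
        previous = inflater.decompress(payload[2:])
        output += previous
    return bytes(output)


def make_test_blocks(data):
    """Hypothetical helper: build MS-ZIP-like blocks for a round-trip check."""
    blocks = []
    previous = b""
    for start in range(0, len(data), BLOCK_SIZE):
        chunk = data[start:start + BLOCK_SIZE]
        options = {"zdict": previous} if previous else {}
        deflater = zlib.compressobj(wbits=-zlib.MAX_WBITS, **options)
        # Each block is a complete raw deflate stream with a CK prefix.
        blocks.append(b"CK" + deflater.compress(chunk) + deflater.flush())
        previous = chunk
    return blocks
```

Feeding the output of make_test_blocks() for a buffer larger than 32768 bytes into mszip_decompress() should give back the original bytes, which makes for a cheap unit test of the idea.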

I never would have got that without the assist.

We still might need mspack (if we ever need LZX or Quantum), but zlib is already a dependency for KDE, and it works for me :-)

Next step is to follow Aaron's advice, and refactor the code to make sure it is easy to understand and robust. I am beyond the experimentation stage now, and I've got the unit tests to make sure it doesn't go bad.