Sep 23, 2009

Forgotten File Formats

While explaining the story behind his great ppttoxml tool, Jos also mentioned:

Since about a year, Microsoft has, after significant political pressure, put documentation for their file formats on-line.

That's fine and it solved some issues. But the proprietary MS Access file formats (mdb, accdb) remain secret. There are no plans to replace them with XML formats (which would be overkill for databases). I guess there was no pressure to open these formats, which looks like an oversight in the EU and the USA (correct me if there's another reason, such as patents). If you google for it, it is hard to find even a single mention of file format specifications in the above sense, and even explanations from MS employees or backers show that they do not fully realize one thing: the MSA formats are not covered by the process of said "opening of the legacy formats".

MS Access formats are currently only openly accessible (i.e. via 100% Free Software code, not via the MS ADO DLLs used for instance by the oo.org plugin, which excludes non-Windows systems) through the mdbtools library project or its descendants, such as Kexi's MDB plugin, which contains some improvements to mdbtools. The mdbtools project was a huge effort of reverse-engineering the mdb formats (which are nasty direct binary dumps of some initialized and uninitialized chunks of memory owned by MSA or, more strictly, by the MS Jet db engine). The project now faces stagnation, but it can still be extended unless MSA moves to an entirely different storage format.
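To give a feel for what working from reverse-engineered knowledge looks like, here is a minimal Python sketch that identifies an Access file from its first bytes. The magic bytes, the format ID strings and the version byte offset are assumptions taken from the community's reverse-engineering notes (as documented by the mdbtools project), not from any official specification, and the function names are made up for this illustration.

```python
# All constants below are assumptions based on reverse-engineering notes,
# not on an official Microsoft specification.
JET_MAGIC = b"\x00\x01\x00\x00"          # assumed first four bytes of the file
FORMAT_IDS = (b"Standard Jet DB",        # assumed ID string for .mdb files
              b"Standard ACE DB")        # assumed ID string for .accdb files
VERSION_OFFSET = 0x14                    # assumed position of the version byte
VERSION_NAMES = {
    0: "Jet 3 (Access 97)",
    1: "Jet 4 (Access 2000-2003)",
    2: "ACE 12 (Access 2007)",
    3: "ACE 14 (Access 2010)",
}


def identify_access_header(header: bytes) -> str:
    """Guess the Access/Jet version from the first bytes of a file."""
    if len(header) <= VERSION_OFFSET:
        return "header too short"
    if header[:4] != JET_MAGIC:
        return "not an Access file (bad magic)"
    # The format ID string is 15 bytes long and follows the magic bytes.
    if header[4:19] not in FORMAT_IDS:
        return "not an Access file (unknown format ID)"
    version = header[VERSION_OFFSET]
    return VERSION_NAMES.get(version, f"unknown version byte {version}")


def identify_access_file(path: str) -> str:
    """Hypothetical convenience wrapper: read only the start of the file."""
    with open(path, "rb") as f:
        return identify_access_header(f.read(32))
```

Calling `identify_access_file("some.mdb")` on a real database would return one of the version names above if these assumptions hold. Everything past this header (page layout, table definitions, data rows) is where the real reverse-engineering effort went.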

So unfortunately the MSA file formats are not quite fair play in the small but important branch of desktop databases. While anti-competitive behaviour is a valuable corporate weapon, this neglected area contradicts the recent buzz about publishing various document format specifications. By still using the MSA file formats in 2009, you may run into trouble not only with not owning your software written with MSA (forms, reports, code...), but also with not owning your data. That is why I emphasize the importance of the issue, even if MSA has been in decline since we moved to the web era, and since MSA broke compatibility in the 2003 version and again recently in 2007 and 2010 (but sure, that makes for rather good consulting business, like any mess in IT).

To settle the issue once and for all, I'd like to hear any feedback from MS.

PS: On the Kexi side, due to our desire to release only stable software, we're on the way to KOffice 2.2 (i.e. 2010), so there will be no versions 2.0 and 2.1. A lot of things have been ported to KDE 4, including forms; more database and file formats will be handled, and there are new features, e.g. reports.

Comments

I've messed about (quite) a bit with the standardisation of ODF and OOXML in the past (and will do some more when time permits).

I'm not really sure if MSA is referred to in ISO/IEC IS 29500 (a.k.a. OOXML, Office Open XML, "that blasted MS Office XML format"), but if it is, the solution could be the same as it was for the binary MS Office document formats.

Namely, for a standard to be approved by ISO (and it is), its authors need not only to produce a full specification, but also to make available *all* the specs needed to implement the (proposed) standard - even if they are only referred to in the standard's text.

So if MSA is referred to in the ISO/IEC IS 29500 text, the safest way to get its specs would probably be to put pressure on ISO and/or IEC, asking why the MSA specs are not available; ISO/IEC should then put pressure on Microsoft to make them available. That is essentially the same method that caused the binary MS Office format specs to be made public (to some extent at least).

I hope this helps you out.


By hook at Wed, 09/23/2009 - 11:21

To my knowledge MSA isn't explicitly referred to in the OOXML documents, nor in the original MSOOXML documents. What inexperienced users may see is just that various MSO apps provide import/export/preview functionality related to MSA, and vice versa, which is some kind of integration. For example, "Export data to Excel" in MSA. It's an implementation thing, however, and even if it is mentioned in any specs, it is marked optional, so there's nothing except good taste to force the authors to provide specifications for the referred formats or protocols.

Having been involved quite a bit in analysing the bitterness of MSOOXML, I usually say that the authors have used a fair amount of optional features (e.g. OLE) to ensure the lock-in factor of their suite: you don't have to be compliant with the specs, but then OOXML compliance means nothing to anyone but marketoids. Users who care about MSO compliance expect MSOOXML compliance.


By Jarosław Staniek at Wed, 09/23/2009 - 13:52

If that's the case - and I believe you have better insight than I do on that matter - I don't see many options other than peer pressure or hell freezing over. And both are unlikely to be big enough to change MS's heart on this topic.

Already when MS (or rather Ecma) proposed "OOXML" to ISO/IEC, I was seeing vendor lock-in all over it. We both know that OOXML was made only to get the same chances in tenders as office suites that actually support open standards.

I also suspected that they would implement both ODF and their own OOXML in a way that would necessarily cause incompatibility with other suites and/or make them quite awkward to use.

Meh, it's quite sad (and annoying) that people still use such rubbish formats and that the quality of any other office suite is measured by how good its MSO import/export filters are.

If I can think of something regarding MSA, I'll let you know...


By hook at Wed, 09/23/2009 - 14:09

"PS: On the Kexi side, due to our desire to release only stable software, we're on the way to KOffice 2.2 (i.e. 2010), so there will be no versions 2.0 and 2.1."

Is there really no way you guys will change your minds on this? As an everyday KDE 4 user, and previously also a happy kexi user, I would gladly use a (potentially unstable) alpha release of kexi2 rather than have no kexi at all.
And none at all it is, with the way the distributions handle the current situation... In Ubuntu, for example, you can't have kexi and any koffice 2 app installed side by side, because of conflicting koffice-libs packages. In Arch Linux, they don't seem to ship kexi at all, only those apps that are part of the current (=2.0.2-2) koffice package.

So why not give us users the choice of using (or rather testing) kexi as part of koffice 2.1... You could always show a splash screen on kexi startup warning that it's alpha, unstable, incomplete, and whatever... that way, there will be no bad surprises and disappointments...

(It's not like any of the other KOffice apps were completely ready for production use at the point of the 2.0.0 release... Still, it was good that the release was made.)


By smls at Thu, 09/24/2009 - 10:15