How to invent a hashing algoritm.

Wednesday, 24 January 2007 | Zander

I estimate its some 15 years ago I wrote my first address book. On my Amiga, in C. It had a really nice GUI that was task oriented and I kept all my friends addresses and phone numbers in there. One feature I was pretty proud about was that I encrypted the data before writing it to disk. Some months later I lost the key and I wrote another application to brute force the encryption and retrieve my address book. Lesson learned was that I should not have an 8 bit cypher as all possible keys can be tested for correctness in quite a short time. But instead of creating a new encryption or hashing function, I learned that its better to use an off-the-shelf algorithm as apparently its really really hard to write one that others who have access to the actual code of the algorithm can't hack. You may be wondering where I'm going with this, the answer is that whenever someone invents his or her own hashing or encryption algorithm I cringe, thinking that its such a beginners mistake that I place them in the same category as programmers that write for-loops in games with a specific amount of iterations to delay the program exactly 100msec. Something game writers used to do on old computers. Really funny effect when you bought a new computer that was twice as fast :)

Last weekend I was reading though the Ecma 376 standard. Which is the newest name for the >6000 line Office OpenXML specification. I was pretty surprised to find that the specification actually defines a hashing algorithm. Its a 7 page section that shows flowcharts, matrixes etc. used for hashing a password that can be used to protect cells so the user can't change them. Its actually a bit silly to have this in the spec. A password that disallows an application to alter cells which are still editable using a simple text editor. Or actually just any application that chooses to ignore the password. But thats not the point. A hashing algorithm converts a password into a unique number that can never be used to get back to the actual password. That is its goal. There are thus two ways to defeat a hash. One is to use a different input string to create the same hash. So you can generate a string that allows you to use instead of the real password. The second one is a bit more scary; if a hashing algorithm is flawed you can use it to get back to the original password typed. This second attack is scariest because its a known human trait to reuse one password for a lot of different usages. Meaning that if you crack one password its likely you can get into other systems under the access credentials of the same user. Think email or worse.

In the Ecma 376 standard there are actually 2 separate hashing algorithms defined where one is for spreadsheets and the other is for wordprocessing. I found that a bit funny; it shows how this spec is a description of how MSOffice was build over the many years. Including silly things like two different groups not communicating their changes to a hashing algorithm.

Coincidentally; SHA-1 is a hashing algorithm that has been created quite some years ago in a public process where a lot of mathmaticians have tried to break it over the years. Its done us good, but it wasn't good enough. Last week it was broken. People are adviced to switch to SHA-2 now. There are also efforts underway to create a new one, a public process that is expected to take at least 3 years.

The Ecma 376 specification is set to be ISO certified. Its on the fast track, and having a hash algorithm that is not created in public and quite old is a very good reason to not see it actually succeed into making an ISO standard. Consider what it would mean that such an algorithm which is almost certain to be flawed (much like my experiment was) gets the international stamp of approval. Not only must all applications that want to read Ecma 376 (and thus MSOffice docs) implement this broken scheme, it would also mean that people looking for a good hashing method should not use the ISO certified one. Which kind of goes against the goals of ISO. In fact, its illegal in various countries to use another standard when there is an ISO version for government applications.

This is just one item in a really long list of points that make the Ecma 376 standard unfit for fast track, and IMO even as a standard at all. For more issues; read http://www.grokdoc.net/index.php/EOOXML_objections