
On benchmarks

Wednesday, 10 March 2010  |  lubos lunak

Do you know this one?

Phoronix tested md5sums of ISO images of distributions. The winner was openSUSE, scoring e29311f6f1bf1af907f9ef9f44b8328b, which gave it a noticeable lead over second-place Slackware (b026324c6904b2a9cb4b88d6d61c81d1), which in turn was closely followed by Fedora (9ffbf43126e33be52cd2bf7e01d627f9) and Debian (9ae0ea9e3c9c6e1b9b6252c8395efdc1). As you can see, the difference between these two distributions is only very small. Ubuntu completely flopped in this test, achieving only 1dcca23355272056f04fe8bf20edfce0, which is surprising, especially considering that its previous release scored a very nice c30f7472766d25af1dc80b3ffc9a58c7. (source).

Ok, that's just a joke, but the sad part is, as somebody pointed out, that it wouldn't really be that surprising if Phoronix actually did something like that. And, probably even sadder, there would be people who'd take it as if it meant something and start adding comments about how the latest openSUSE is pretty good, the latest Ubuntu is so disappointing, and Fedora and Debian are not really that different.
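(For the record, an md5sum is just a 128-bit hash of a file's contents, so ranking the hex digests as scores is complete nonsense. If you want to see for yourself, here is a minimal Python sketch; the ISO file names are of course made up.)

    import hashlib

    def md5_of(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            # hash the file in 1 MiB chunks to avoid loading it all at once
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # hypothetical ISO images; the output is a hex string, not a score
    for iso in ("openSUSE.iso", "ubuntu.iso"):
        print(iso, md5_of(iso))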

So take this from somebody who has already done a lot of performance work: benchmarks, on their own, mean almost nothing if you don't understand them. Especially if they are seriously flawed (I mean, testing filesystem performance by doing CPU-intensive tasks? Hello? Probably even FAT16 could produce the same results in those tests on an SSD.), but even if the results are useful numbers, it is still necessary to understand what the numbers actually say. I don't think I would even have a big problem forging a "benchmark" where KDE would get better (and correct) numbers than LXDE, simply by picking a scenario twisted enough.
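To illustrate the kind of flaw I mean, here is a hypothetical "filesystem benchmark" sketched in Python, where practically all of the time goes into the CPU rather than the disk (the file name and size are made up for illustration):

    import hashlib, os, time

    path = "testfile.bin"  # hypothetical test file
    with open(path, "wb") as f:
        f.write(os.urandom(64 * 1024 * 1024))  # 64 MiB of random data

    with open(path, "rb") as f:
        f.read()  # warm the page cache, so later reads never touch the disk

    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()               # served from RAM, essentially free
    hashlib.sha256(data).hexdigest()  # this is where the time actually goes
    elapsed = time.perf_counter() - start
    print(f"'filesystem' result: {elapsed:.2f}s")  # really a CPU number

Run that on ext4, btrfs, or indeed FAT16, and the number will barely move, because it is a hashing benchmark dressed up as a filesystem one.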

And even then, it is still necessary to keep in mind what is being compared. Comparing the default setups of Fedora and openSUSE also means comparing GNOME and KDE, which you may or may not want: if you compare the distributions this way, the result is affected by differences between the desktops, and if you compare the desktops, it is affected by differences between the distributions. And in either case, it may or may not apply to another distribution or another desktop.

Yet one more thing to understand is what is measured and how it affects performance as a whole. Ages ago there was a Dot article that, among other things, mentioned performance improvements Qt4 would bring in some specific areas, yet even now there are plenty of people seriously disappointed by KDE4's performance, only because they assumed that since Qt4 improved in some areas, KDE4 would get exactly the same improvement, regardless of how much those improvements matter for the whole of KDE4 or how much of KDE4 was rewritten when porting from KDE3. When I fixed Exmap to work again and did a little benchmark as part of it, there wasn't much more to it than showing that Exmap works again and that it is very easy to lose a lot of an advantage through a simple mistake, yet people had no problem drawing all kinds of strange conclusions from it. Since drawing the right conclusions is unexpectedly difficult for most people when it comes to benchmarks, I really need to remember not to just post numbers again without also saying what they mean.

And, finally, there is still the question of other costs and whether it is worth it. Various KDE components have their resource demands, but then they are also often worth it. I mean, we could all still use TWM, or, heck, Windows 95, but seriously, how many of us really would? This, ultimately, is always about what works best for you.