Why does Linux need defragmenting?

Saturday, 19 August 2006 | Lubos Lunak

This so often repeated myth is getting so old and so boring. And untrue. Linux doesn't need defragmenting, because its filesystem handling is not so stupid like when using several decades old FAT. Yadda yadda, blah blah. Now, the real question is: If Linux really doesn't need defragmenting, why does Windows boot faster and why does second startup of KDE need only roughly one quarter of time the first startup needs?

Ok, first of all, talking about defragmenting is actually wrong. Defragmenting is making sure no file is fragmented, i.e. that every file is just one contiguous area of the disk. But do you know any today's application that reads just one file? The thing that should be talked instead should be linearizing, i.e. making sure that related files (not one, files) are one contiguous area of the disk.

Just in case you don't know, let me tell you one thing about your the thing busily spinning in your computer: It's very likely it can without trouble read 50M or more in a second - if it's a block read of contiguous data. However, as soon as it actually has to seek in order to access data scattered in various areas of the disk, the reading performance can suddenly plummet seriously - only bloody fast drives today have average seek time smaller than 10ms, and your drive is very likely not one of them. Now do the maths, how many times do 10ms (or more) fit in one second? Right, at most 100 times. So your drive can on average read at most 100 files a second, and that's actually ignoring the fact that reading a file usually means more than just a single seek (on the other hand that's also ignoring the drive's built-in cache that can avoid some seeks). Some of the pictures explaining how Linux doesn't need defragmentation actually nicely demonstrate that with files scattered so much the disk simply has to seek.

Now, again, how many files does an average application open during startup? One? It's actually hundreds, usually, at least. And since Linux kernel (AFAIK) at the present time has next to none support for linear reading of several files, you can guess what happens. Indeed, kernel developers will undoubtedly tell you that it's the applications' fault and that they shouldn't be using so many files, but then kernel developers often have funny ideas about how userspace should work and seriously, why do we have filesystems if they're not to be used and applications should compress all their data into a single file? For people who don't know about this (and most don't, actually) it feels kind of natural to structure data into files.

Nothing is perfect and just blaming kernel developers for this wouldn't be quite fair, but then it sometimes can really upset me when I see people "fixing" problems by claiming they don't exist. I am a KDE developer, not a kernel developer, so it may very well be that some of what I've written above is wrong, but the single fact that the problem exist can be easily be proved even by you:

Boot your computer, log into KDE, wait for the login to finish. Log out. Log in again. Even if you use a recent distribution that may use some kind of a preload technique that reduces this problem, there should be still a visible difference. And the only difference is that the second time almost everything is read from kernel's disk caches instead of the disk itself. Which avoids reading of the data and which avoids seeking. And the difference is the seeking, not the reading of the data: KDE during startup should be very unlikely to read more than 100M of data and that's 2 seconds with 50M/s disks - is the difference really only 2 seconds for you? I don't think so.

So, who still believes this myth that everything in the land of Linux filesystems is nice and perfect? Fortunately, some kernel developers have started investigating this problem and possible solutions.