The wonderful new, I mean old, world of kdeinit, Exmap and nvidia libGL
As some might have noticed among all the praise, some of the features may not come cheap. One of the biggest memory hogs in KDE4 is (again) something that doesn't have much to do with KDE itself: the OpenGL library shipped with the nvidia driver. It is compiled without -fPIC to gain a couple of percent in performance (if that; I personally doubt it makes a noticeable difference, but that's just a guess, given it's closed-source). That means every single process that links against it wastes about 11MiB of RAM (on a 32-bit system), regardless of whether and how much it actually uses the library. Currently there are 5 such processes in just the plain KDE desktop, and the X server counts too. Do the math yourself. Or just have a look at the picture of Exmap showing it:
Look at the 'effective mapped' column; see the howto for more details.
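For readers without the howto at hand, here is a minimal sketch of what the two Exmap columns mentioned in this post mean, as far as I understand them: 'sole mapped' counts pages mapped by only one process, while 'effective mapped' charges each page to its users in equal shares. The page size, the process names and the `mapped_sizes` helper are assumptions made up for illustration, not Exmap's actual code.

```python
from collections import defaultdict

PAGE_KIB = 4  # assumed page size in KiB


def mapped_sizes(page_users):
    """Compute (sole, effective) mapped KiB per process.

    page_users maps a page id to the set of process names mapping that page.
    """
    sole = defaultdict(float)
    effective = defaultdict(float)
    for users in page_users.values():
        share = PAGE_KIB / len(users)  # each page is split among its users
        for proc in users:
            effective[proc] += share
            if len(users) == 1:
                sole[proc] += PAGE_KIB  # nobody else maps this page
    return {proc: (sole[proc], effective[proc]) for proc in effective}


# Toy data: each process has 3 private pages, and 2 pages are shared by both.
pages = {}
for proc in ("plasma", "krunner"):
    for i in range(3):
        pages[(proc, i)] = {proc}
for i in range(2):
    pages[("shared", i)] = {"plasma", "krunner"}

sizes = mapped_sizes(pages)
print(sizes["plasma"])
```

In this toy model the private pages show up in full in both columns, while the shared pages only add half their size to each process's 'effective mapped' value.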
Speaking of Exmap, it sadly appears to be no longer maintained, which I find quite bad, because it's still the best tool I know for measuring memory usage. It still works though, after a little patching, so I packaged it in the openSUSE buildservice (you want the xxx_Update repos, unless you've never been bothered enough to run online update). Unfortunately I don't know how to set up the package so that the buildservice would also build it for something other than openSUSE/SLE, but people who want to use it with other distros can either take the patches from it or find the right howto for multidistro packages somewhere on the openSUSE wiki. I should also note that Exmap seems to work only on 32-bit; on 64-bit machines it aborts due to some error that I haven't really looked at.
Back to the memory wasting issue. This is not the first time something outside of KDE has made its memory usage look bad, and the solution is again the same: the proven kdeinit hack. Kdeinit is still somewhat useful even on its own (0.7MiB of RAM saved per process on 32-bit, although that's with a debug build without limited symbol visibility, so I don't know how much it is in practice), but it can save much more when it comes to these workarounds. Kdeinit pulls in the library once, lets it mess with memory once as it wants, and the result is then shared by all applications launched using kdeinit.
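The idea behind the hack can be sketched as "do the expensive setup once, then fork". A library built without -fPIC has to be relocated when it is loaded, which dirties its pages and makes them private to the process. If a launcher loads it first and then forks, the children inherit the already relocated pages copy-on-write instead of each creating their own. The Python below only illustrates that pattern on a Unix-like system; `expensive_setup` and `launch_children` are hypothetical names, the bytearray is a stand-in for the library, and none of this is kdeinit's real code.

```python
import os


def expensive_setup():
    # Stand-in for loading and relocating a non-PIC library (assumption).
    return bytearray(1024 * 1024)


def launch_children(preloaded, count):
    """Fork `count` children that use `preloaded` without setting it up again."""
    reports = []
    for _ in range(count):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: the data is already there, inherited from the parent.
            try:
                os.close(read_fd)
                os.write(write_fd, str(len(preloaded)).encode())
            finally:
                os._exit(0)
        # Parent: collect what the child saw, then reap it.
        os.close(write_fd)
        reports.append(int(os.read(read_fd, 64).decode()))
        os.close(read_fd)
        os.waitpid(pid, 0)
    return reports


reports = launch_children(expensive_setup(), 3)
print(reports)
```

Each child reports the size of data it never had to build itself. How much actually stays shared depends on how much the children later write to those pages.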
Well, that is at least the theory. First, kdeinit itself needed fixing, as it has never really worked in KDE4, having been broken by about 4 independent changes (with yours truly being guilty there too). It's probably time to slowly start looking at performance again. By the way, in case somebody feels like measuring KDE4 memory usage and comparing the numbers with the previous benchmark: that is of course not the right way to do it. Measuring both KDE3 and KDE4 on the same system might be interesting though (and I am not the one going to do that; it is not difficult, but I don't have the time).
As for the result after making the theory of the workaround match reality, see the picture. Now there are more processes "using" the library, but all those launched from kdeinit share the 11MiB waste just once (and 'sole mapped' is zero for them, unlike before, when the waste was per process). There's not much to be done about X, and KWin is also excluded from kdeinit, since there is the other nvidia hack for compositing, __GL_YIELD=NOTHING, and that clashes with this one. Still, I hereby claim the achievement of reducing KDE4's memory usage for the plain desktop by about 33MiB (which was enough to fit the whole plain KDE3 desktop according to the old benchmark). Try to beat that!
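The 33MiB figure can be roughly reproduced from the numbers in this post. The sketch below assumes that KWin is one of the 5 desktop processes counted earlier, and that after the fix only X, KWin and one copy shared through kdeinit remain. Both are my reading of the text, not measured values.

```python
WASTE_MIB = 11  # per-process waste on 32-bit, as quoted above


def total_waste(private_copies):
    """Total waste in MiB for a given number of private copies of the library."""
    return private_copies * WASTE_MIB


before = total_waste(5 + 1)     # five desktop processes plus the X server
after = total_waste(1 + 1 + 1)  # X, KWin, and one copy shared via kdeinit
saved = before - after
print(before, after, saved)
```

Under those assumptions the model gives 66MiB before and 33MiB after, which agrees with the saving claimed above.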