Today, two Sun employees, Michal Pryc and Steven Xusheng Hou, published a comparison of four desktop indexers: Beagle, JIndex, Tracker and of course Strigi. The work is really extensive and is intended both for internal use at Sun and as feedback for the developers of the software.
The document is good news for Strigi. The study shows that it uses the smallest amount of RAM and that it is way faster than the rest. Tracker uses just as little RAM if you take the error margin into account, while the other two used at least 15 times as much. Please look up Table 5 in the document.
Here I reproduce its contents:
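|                              | Beagle         | JIndex         | Tracker        | Strigi         |
|------------------------------|----------------|----------------|----------------|----------------|
| Number/size of TXT files     | 10 000 / 168MB | 10 000 / 168MB | 10 000 / 168MB | 10 000 / 168MB |
| Size of the index database   | 62MB           | 93MB           | 140MB          | 119MB          |
| Time of indexing [hr:min:sec] | 02:18:05       | 03:02:55       | 03:03:14       | 00:04:26       |
| CPU time [hr:min:sec]        | 00:12:05       | 00:09:15       | 02:22:40       | 00:03:44       |
| Average CPU usage            | 8.79%          | 5%             | 77.73%         | 82.75%         |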
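To see how large the gaps are, the times from the table can be turned into ratios relative to Strigi. This is just a small sketch of the arithmetic; the helper names `to_seconds` and `ratios_vs_strigi` are my own and do not come from the study.

```python
def to_seconds(hms):
    """Convert an 'hr:min:sec' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from Table 5 of the comparison.
INDEXING_TIME = {
    "Beagle": "02:18:05",
    "JIndex": "03:02:55",
    "Tracker": "03:03:14",
    "Strigi": "00:04:26",
}
CPU_TIME = {
    "Beagle": "00:12:05",
    "JIndex": "00:09:15",
    "Tracker": "02:22:40",
    "Strigi": "00:03:44",
}


def ratios_vs_strigi(times):
    """Return how many times longer each other indexer took than Strigi."""
    base = to_seconds(times["Strigi"])
    return {
        name: to_seconds(value) / base
        for name, value in times.items()
        if name != "Strigi"
    }


if __name__ == "__main__":
    for label, table in (("elapsed", INDEXING_TIME), ("CPU", CPU_TIME)):
        for name, ratio in ratios_vs_strigi(table).items():
            print(f"{name}: {ratio:.1f}x Strigi's {label} time")
```

In elapsed time this gives roughly 31x for Beagle and 41x for both JIndex and Tracker. In CPU time, JIndex comes out at about 2.5x Strigi.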
Why is Strigi so fast? There are two reasons. First, it does not slow itself down artificially. Instead, it runs in the background and lets the Linux kernel decide when it can run. Because Strigi's indexer has the lowest possible CPU priority, the user does not notice it working. This is why it is about 30 times as fast as Beagle and 40 times as fast as Tracker. Its total CPU time is also about 2.5 times lower than that of the runner-up, JIndex.
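The idea of running at the lowest priority and leaving the scheduling to the kernel can be sketched in a few lines. This is not Strigi's actual code (Strigi is written in C++), only an illustration in Python for Unix-like systems; the function name `run_at_lowest_priority` is made up for this example.

```python
import os


def run_at_lowest_priority(work):
    """Lower this process to the lowest CPU priority, then run work()."""
    # On Linux, nice values range from -20 (highest priority) to 19 (lowest).
    current = os.getpriority(os.PRIO_PROCESS, 0)
    # os.nice() takes an increment, so add whatever is left to reach 19.
    os.nice(max(0, 19 - current))
    # No sleeps or throttling here: the kernel gives us the CPU when it is idle.
    return work()


if __name__ == "__main__":
    total = run_at_lowest_priority(lambda: sum(range(1_000_000)))
    print(total, os.getpriority(os.PRIO_PROCESS, 0))
```

The work itself runs at full speed whenever the machine has nothing better to do, which is the behaviour described above.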
Second, the way Strigi extracts data is simply very efficient. And the good news is that the code that does this is available as a library under LGPL. So the other search engines have no excuses for being so much slower. They too can be lightning fast and I encourage them to apply the grease called libstreamindexer.
This speed bodes very well for KFileMetaInfo, the KDE class that provides metadata about files, since I'm currently working on letting it use Strigi as the source for the metadata.
Many thanks to Michal and Steven for this great comparison!