ran out of steam

Wednesday, 21 April 2004 | Dkite

The cvs-digest rewrite seems to have hit a wall, or I have. I suppose a couple of days away will help.

It looks like I'll be hosting the thing on my own machine for the time being, until someone with a repository offers a bit of space and bandwidth. I'm short on hard drive space, but another drive, a 120GB Seagate, arrives tomorrow. That should do for a week or two. Then I'll set up a chroot jail for Apache and PHP and get the thing running. How come this project, which I started in order to learn the KDE API, ends up teaching me web server administration, PHP, Perl and a host of other interesting but unrelated stuff? The security side makes me nervous. If anyone is willing to help with a security audit of the PHP code, or has suggestions on securing the server, please let me know.

Does anyone know of a good dynamic DNS service?

A few people have asked about the tools I use to produce the digest. Until now they have been very specific to KDE, specifically KMail and the kde-cvs list, and I doubt they would be portable to any other project. The new version works with a native CVS repository. A Perl script builds an index of the atomic commits in the repository. The index is searched (with grep) by branch, username, file and version, or time. The script then gets the files and other information, runs cvs log against the repository for the commit comments, and builds the commit list. Some things are specific to KDE, such as the bug references, but most of it would work for any repository.

The challenge with KDE is the size of the repository, and keeping things reasonably quick. Caching will help, but it's too early for that. The native CVS repository is over 10GB, and growing. Building the index for about half the repository took 87 minutes, and the statistics generator took 27 minutes. That half is most of the modules, excluding kde-i18n.
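
CVS only records history per file, so the "atomic commits" in the index have to be reconstructed. The post doesn't say how the Perl script does it; a common heuristic (cvsps uses a similar one) is to treat revisions with the same author, branch and log message, made within a few minutes of each other, as one commit. The sketch below is in Python and assumes that heuristic; the `FileRev` record, the 300-second window and the pipe-separated index line are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class FileRev:
    """One per-file revision, as read from `cvs log` (hypothetical record)."""
    path: str
    revision: str
    author: str
    branch: str
    timestamp: int  # seconds since the epoch
    message: str

WINDOW = 300  # assumed: revisions up to 5 minutes apart belong together

def group_commits(revs):
    """Group per-file revisions into approximate atomic commits."""
    key = lambda r: (r.author, r.branch, r.message)
    ordered = sorted(revs, key=lambda r: (key(r), r.timestamp))
    commits = []
    for _, same in groupby(ordered, key=key):
        current = []
        for rev in same:
            # A long gap since the previous revision starts a new commit.
            if current and rev.timestamp - current[-1].timestamp > WINDOW:
                commits.append(current)
                current = []
            current.append(rev)
        commits.append(current)
    commits.sort(key=lambda c: c[0].timestamp)  # chronological order
    return commits

def index_line(commit):
    """One grep-friendly line per commit: time|author|branch|path:rev ..."""
    first = commit[0]
    files = " ".join(f"{r.path}:{r.revision}" for r in commit)
    return f"{first.timestamp}|{first.author}|{first.branch}|{files}"
```

Keeping one line per commit means a plain grep over the index file is enough to filter by author or branch.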

With this basic function, one can search the repository by developer, project or branch. The result is a list in the same format as the digest commit list, with bug references and links.
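
Searching by developer, project (module) or branch can be a simple filter over the index. This Python sketch assumes a hypothetical index with one line per commit of the form `time|author|branch|path:rev path:rev ...`, and treats the first path component as the module. It does what a grep would do, just with exact field matching.

```python
def search(lines, author=None, branch=None, module=None):
    """Yield index lines matching every filter given; None means 'any'."""
    for line in lines:
        line = line.rstrip("\n")
        _time, who, br, files = line.split("|", 3)
        if author is not None and who != author:
            continue
        if branch is not None and br != branch:
            continue
        if module is not None:
            # Match if any file in the commit lives under that top-level module.
            modules = {entry.split("/", 1)[0] for entry in files.split()}
            if module not in modules:
                continue
        yield line
```

An open index file can be passed directly as `lines`, since iterating a file yields lines with their trailing newline, which the function strips.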

To view the code changes, a diff is generated for each file within the atomic commit. You can view the graphical objects such as backgrounds and icons. The next enhancement is to add the ability to listen to the various binary sound objects in the repository.
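
Showing a commit file by file mostly comes down to finding the revision each file was derived from and asking CVS for the difference. This Python sketch (not the actual script) works out the previous revision from CVS numbering, where `1.5` follows `1.4` and the first branch revision `1.2.4.1` follows its branch point `1.2`, and routes images and sounds to a viewer instead of a diff. The extension lists and the `(kind, payload)` return values are assumptions; the `cvs rdiff -u -r OLD -r NEW` command is only built here, not run, and assumes `CVSROOT` is already set.

```python
import os

# Assumed extension lists; the repository may use others.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".xpm"}
SOUND_EXTS = {".wav", ".ogg", ".mp3"}

def previous_revision(rev):
    """Return the revision `rev` was derived from, or None if the file is new."""
    parts = rev.split(".")
    last = int(parts[-1])
    if last > 1:
        # 1.5 -> 1.4, 1.2.4.3 -> 1.2.4.2
        return ".".join(parts[:-1] + [str(last - 1)])
    if len(parts) > 2:
        # First revision on a branch: 1.2.4.1 -> branch point 1.2
        return ".".join(parts[:-2])
    return None  # e.g. 1.1: the file was added in this commit

def view_for(path, rev):
    """Decide how to present one file of a commit, as (kind, payload)."""
    ext = os.path.splitext(path)[1].lower()
    if ext in IMAGE_EXTS:
        return ("image", path)
    if ext in SOUND_EXTS:
        return ("sound", path)
    old = previous_revision(rev)
    if old is None:
        return ("new", path)
    # Unified diff straight from the repository; no working copy needed.
    return ("diff", ["cvs", "rdiff", "-u", "-r", old, "-r", rev, path])
```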

To create the digest, a file containing a list of "filename, version, type and category" entries is read and parsed, creating the commit list with everything nicely tucked under the various headers. The table of contents is created from the same data. It actually works. I have a script that creates the commit list from the kde-cvs mails, but it could be created by hand with much patience. I also wrote a PHP template class that can build data trees such as the digest commit list. Each issue requires the list, a statistics XML file, a commentary file and a summary file.
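
The post doesn't give the exact layout of the "filename, version, type and category" file, so this Python sketch assumes one comma-separated entry per line, with blank lines and `#` comments ignored. It groups entries by category and then by type, and builds the table of contents from the same tree, so the two always agree. The header markers (`==`, `--`) are placeholders, not what the PHP template actually emits.

```python
import csv
import io

def parse_list(text):
    """Parse 'filename, version, type, category' lines; skip blanks and # comments."""
    rows = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row or row[0].startswith("#"):
            continue
        filename, version, kind, category = (field.strip() for field in row)
        rows.append((filename, version, kind, category))
    return rows

def build_digest(rows):
    """Return (toc, body) built from the same category -> type -> files tree."""
    tree = {}  # dicts keep insertion order, so headers follow the input order
    for filename, version, kind, category in rows:
        tree.setdefault(category, {}).setdefault(kind, []).append((filename, version))
    toc = [f"{category}: {', '.join(kinds)}" for category, kinds in tree.items()]
    body = []
    for category, kinds in tree.items():
        body.append(f"== {category} ==")
        for kind, files in kinds.items():
            body.append(f"-- {kind} --")
            body.extend(f"  {name} {version}" for name, version in files)
    return toc, body
```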

The last thing I worked on is a statistics generator. The statistics you saw for the last few months were generated from the kde-cvs emails, which is a pretty ugly hack. The new Perl script parses the CVS repository, counts the commits by module and user, counts the lines, and spits out an XML file. A PHP script then reads the data and displays it. PHP XML handling sucks. I still need to write the routines that generate the abridged statistics; right now the page shows all the numbers for the specified time period. I haven't had a stable repository here, no room and all, so any numbers so far have been unreliable. This script could be used by other projects as well.
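
The statistics pass can be sketched as a tally over `cvs log` output. Each revision in `cvs log` has a line like `date: ...;  author: ...;  state: Exp;  lines: +5 -2`, and the `lines:` field is missing on a file's first revision. This Python sketch assumes the caller has already paired each such line with its module. It counts file revisions (not atomic commits) per module and per user, sums changed lines (added plus removed) per user, and writes an XML document; the element and attribute names are hypothetical, not the format the PHP page reads.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Per-revision line of `cvs log`, e.g.
#   date: 2004/04/20 18:41:47;  author: someone;  state: Exp;  lines: +5 -2
# The "lines:" part is optional because a file's first revision lacks it.
REV_LINE = re.compile(
    r"author:\s*(?P<author>[^;]+);(?:.*?lines:\s*\+(?P<add>\d+)\s+-(?P<rem>\d+))?"
)

def tally(records):
    """records: iterable of (module, revision line). Returns three Counters."""
    by_module, by_user, lines_by_user = Counter(), Counter(), Counter()
    for module, line in records:
        m = REV_LINE.search(line)
        if not m:
            continue  # not a revision line
        author = m.group("author").strip()
        by_module[module] += 1  # file revisions, not atomic commits
        by_user[author] += 1
        lines_by_user[author] += int(m.group("add") or 0) + int(m.group("rem") or 0)
    return by_module, by_user, lines_by_user

def to_xml(by_module, by_user, lines_by_user):
    """Serialize the counts as a small XML document (hypothetical schema)."""
    root = ET.Element("statistics")
    modules = ET.SubElement(root, "modules")
    for name, count in sorted(by_module.items()):
        ET.SubElement(modules, "module", name=name, revisions=str(count))
    users = ET.SubElement(root, "users")
    for name, count in sorted(by_user.items()):
        ET.SubElement(users, "user", name=name, revisions=str(count),
                      lines=str(lines_by_user[name]))
    return ET.tostring(root, encoding="unicode")
```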

So, things are progressing. There is still much to be done even before doing the regular digest with it, but the end is near. I'm sure once everyone is reading it weekly there will be many more suggestions for enhancements.

Give me a couple of days to post some links. I hesitate to do it right now since the server will be rebuilt this week.

Derek (who feels better already)