To byte code, or NOT to byte code? Is this really the question?

Saturday, 14 August 2004 | Manyoso

It seems Trolltech is continuing to wrestle with the problem of pesky developers clamoring for bytecode-based (read: higher level) languages. I talked with Matthias quite a bit about this at the recent LinuxWorld in New York, so I know it is on his mind. In his recent interview at aKademy, he even identified this as the "next big thing" for KDE. But is bytecode really the question? Read on.

Q: What do you think the "next big thing" in KDE will be?

MATTHIAS: There is one thing that will become increasingly important in the future, not just for KDE, but for all of Linux: a convincing answer to Microsoft's .Net. I'm not concerned about the server, I'm concerned about the client...

Still it would be nice to take advantage of JIT-compiled bytecode where it makes sense, and have the two worlds interoperate. Currently there are two technical options: integrating Mono and the CLR, or going for a Java Virtual Machine. Mono at present has several advantages: First, there is no free JIT-compiling JVM that is equally actively developed and it doesn't look like there will be one. Second, cooperating with Miguel and the Ximian group at Novell is probably a lot easier than cooperating with Sun. And third, it is easier to integrate native C++ code with the CLR than going through the JNI.

To me, the real thing TT should be concerned about isn't so much native code vs. bytecode... it is how to satisfy those pesky developers clamoring for higher-level language access to the Qt/KDE APIs. I think this is what TT had in mind when creating QSA, but that is only a scripting language. Ultimately, there are three niches of development I think TT needs to cater to:

1. A native systems programming language, e.g. C++. 'Nuff said.
2. A modern high level programming language, e.g. Java, C#.
3. Scripting support. TT does QSA and KDE does KJSEmbed.

Obviously, TT does a phenomenal job with the first one and the third seems adequately addressed. The big glaring hole is the mid-level option. I don't doubt that TT is feeling some pressure from customers interested in programming Qt against the .NET APIs. Probably more so from the Windows world, but also from Linux via Mono/Portable.NET. I know TT responded with AWT bindings for Java developers wishing to program to the Qtopia mobile phone platform.

The existing options for the mid-level niche all royally suck, IMO. Let's take a look: PyQt, QtJava, Qt#...

First, they are all bindings and -- let's face it -- bindings suck. They suck for performance reasons, maintenance reasons and compatibility reasons. All the bindings require every virtual method in every library to be reimplemented. What's worse, they require an object's entire inheritance tree of virtual methods to be overridden, not just the ones declared by that object. Some require every method, period, to have a proxy with C linkage. You have huge glue libraries, whether they be SMOKE, SWIG, QtC or libqtsharpglue.so. Mapping to and from the object hierarchies of the various languages can also be a pain: Q_PROPERTY vs. C#-style properties, signals/slots vs. delegates/events. And then, to add insult to injury, the various bindings always seem to be behind. Second-class citizens in a C/C++ dominated world.
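
To make the glue complaint concrete, here is a minimal sketch of what a binding has to generate for a single class. Everything in it is made up for illustration: `Widget` stands in for a toolkit class (it is not Qt), and `WidgetProxy`, `NameCallback`, `sample_name` and the `widget_*` functions are hypothetical names, not taken from SMOKE, SWIG, QtC or libqtsharpglue.so. It only shows the two pieces described above: a subclass that overrides the virtual so calls from C++ can be routed back to the other language, and flat C-linkage proxies the other language can actually call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Stand-in for a C++ toolkit class with a virtual method (hypothetical, not Qt).
class Widget {
public:
    virtual ~Widget() = default;
    virtual std::string name() const { return "Widget"; }
    // Non-virtual method that dispatches through the virtual one.
    std::string describe() const { return "I am a " + name(); }
};

extern "C" {
// Callback the foreign-language runtime would hand to the glue layer.
typedef const char* (*NameCallback)(void* user_data);
}

// Glue subclass: the virtual is overridden so that a call made from C++
// ends up in the foreign-language object. A real binding repeats this
// for every virtual in the class and in all of its base classes.
class WidgetProxy : public Widget {
public:
    WidgetProxy(NameCallback cb, void* user_data)
        : cb_(cb), user_data_(user_data) {}

    std::string name() const override {
        // Fall back to the C++ implementation when nothing is overridden.
        return cb_ ? std::string(cb_(user_data_)) : Widget::name();
    }

private:
    NameCallback cb_;
    void* user_data_;
};

extern "C" {

// Stands in for the foreign-language side overriding name().
const char* sample_name(void*) { return "ManagedButton"; }

// Flat C-linkage proxies: one of these per constructor, method and destructor.
void* widget_new(NameCallback cb, void* user_data) {
    Widget* w = new WidgetProxy(cb, user_data);
    return w;
}

// Copies describe() into buf (truncating if needed) and returns the full length.
int widget_describe(const void* handle, char* buf, int buf_size) {
    assert(handle != nullptr);
    const std::string text = static_cast<const Widget*>(handle)->describe();
    if (buf != nullptr && buf_size > 0) {
        std::snprintf(buf, static_cast<std::size_t>(buf_size), "%s", text.c_str());
    }
    return static_cast<int>(text.size());
}

void widget_delete(void* handle) { delete static_cast<Widget*>(handle); }

}  // extern "C"
```

That is one virtual and one method on one class. Multiply it across a whole library and you can see where the huge glue libraries come from.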

Now, I do the Qt# stuff and I have a great time doing it. It is a fun project and we're making good progress. But, I won't kid myself into thinking this is a truly adequate solution to the problem. It is what works now, though, and I'll continue with it unless and until TT provides some direction towards moving to a truly useful mid-level niche. What might that direction look like? A few options that I can see:

1. Build a C++ compiler targeting the CLR according to the proposed ECMA standard.
2. Create a new high level language with transparent link compatibility with C++.
3. Target the LLVM project and extend the frontends so they can link to and from each other via LLVM mid-level IR.
4. TT could support and maintain one of the bindings themselves, conditioned upon changes to Qt that would make the binding a first class citizen.

All of these have potentially significant problems and hurdles.

The first option has some potential legal/licensing questions and quite a number of technical hurdles. Managed C++ is quite different from the Standard C++ that we are all accustomed to. How would it work around moc and all its various goodies: Q_PROPERTY, signals/slots, etc.? Either way, this would require extensive modifications to the Qt library itself and probably couldn't even be considered until the Qt5 time frame.

The second option is one that I was pursuing myself for a brief time earlier this year. I called the attempt 'k++' :) The idea was to build a Java/C#-like compiler that would have the binding 'built-in', so to speak. I was using a modified g++ frontend as the template and taking cues from gcj's CNI and Apple's Objective-C++ projects. Alas, mucking about with the internals of the gcc codebase was too much for me to stand. The entire compiler suite is using antiquated compiler technology and it's very difficult to modify without substantial changes. Still, I suppose TT could manage the resources -- if it had the will -- to pull off such a feat. The big drawback is that this is not a bytecode based solution. I'll get to that in a minute.

The third option is one I just thought about after seeing the recent Slashdot article on LLVM. They seem to have a pretty interesting project. They have already extracted the g++ 3.4 frontend and retargeted it to produce their LLVM bytecode. They already have a JIT. Now, if a Java/C#-like frontend were to come along and target LLVM bytecode, then the remaining hurdle would be to provide a construct that the various frontends could use to parse the bytecode and provide syntax/linking information. Basically, recreating Microsoft's CLR idea by using the LLVM project as a jumping off point. I have no idea how feasible this is, but I just came across it and it seems interesting.

The fourth option is probably the easiest, but it is also the lousiest as far as technical solutions go. If TT itself were to back and maintain one of the existing bindings projects... and give it as much priority as the native C++ library, I suppose an adequate solution could be had. You'd still have large glue libraries and that'd make it less desirable for the embedded market. But, if it received the level of attention the regular C++ library gets...? Of course, unlike the other solutions this wouldn't necessarily extend naturally to the KDE APIs.

As for the bytecode question, I see only a few reasons for its existence. In order of significance:

1. A basis for a Common Language Runtime. It's either this or you can build the binding into the compiler.
2. Sandboxing is the big one. As geiseri would say, "This should be done in hardware!", but it isn't.
3. "Write once, blah blah." I can get all the cross-platform I need through careful coding and the gcc compiler suite.
4. Finally, some theoretical performance optimizations over native code that no one (so far) has been successful in delivering.
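
For anyone wondering why sandboxing keeps getting tied to bytecode, here is a toy sketch. It is a made-up five-instruction stack machine (`Op`, `Instr` and `run_sandboxed` are hypothetical names; this is not the CLR, the JVM or LLVM bytecode). The only point is that every memory access goes through the interpreter, so the runtime gets to refuse anything that reaches outside the memory it handed the program. Native code gives you no such choke point without help from the hardware or the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical five-instruction stack machine; not any real bytecode format.
enum class Op : std::uint8_t { Push, Add, Load, Store, Halt };

struct Instr {
    Op op;
    std::uint32_t arg;  // literal for Push, cell index for Load/Store, unused otherwise
};

// Runs `code` against a private memory of `mem_cells` zeroed cells.
// Returns true and writes the top of the stack to `result` when the program
// reaches Halt; returns false as soon as the program tries anything unsafe
// (stack underflow, out-of-range cell) or runs off the end without Halt.
bool run_sandboxed(const std::vector<Instr>& code, std::size_t mem_cells,
                   std::uint32_t& result) {
    std::vector<std::uint32_t> mem(mem_cells, 0);
    std::vector<std::uint32_t> stack;

    for (const Instr& in : code) {
        switch (in.op) {
        case Op::Push:
            stack.push_back(in.arg);
            break;
        case Op::Add: {
            if (stack.size() < 2) return false;
            const std::uint32_t rhs = stack.back();
            stack.pop_back();
            stack.back() += rhs;  // unsigned arithmetic wraps around
            break;
        }
        case Op::Load:
            if (in.arg >= mem.size()) return false;  // the sandbox check
            stack.push_back(mem[in.arg]);
            break;
        case Op::Store:
            if (stack.empty() || in.arg >= mem.size()) return false;
            assert(in.arg < mem.size());  // invariant established by the check above
            mem[in.arg] = stack.back();
            stack.pop_back();
            break;
        case Op::Halt:
            if (stack.empty()) return false;
            result = stack.back();
            return true;
        }
    }
    return false;  // fell off the end without Halt
}
```

A program that pushes 2 and 40, adds them, stores the sum in cell 0, loads it back and halts gets its answer. A program that tries to store into cell 99 of a four-cell memory is simply rejected.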

I can't think of any other reasons. The first two are legitimate if you are looking for that kind of functionality. The last two I don't have any concern for.

Bytecode or not, I do think Matthias is correct in identifying these various issues as particularly prominent for the future of KDE/Qt. I don't know how or when TT will respond with some solutions or guidance... I don't know if they think it is worth the hassle... but, I'm very excited and curious to find out. In the meantime, I'll continue with Qt# so clee can write dotNET in .NET ;)

Next up? An update and a history for those who want to know what is going on with C# language bindings for KDE/Qt.