
Thought Experiment: XML integrated into a C-like language

Tuesday, 16 December 2003  |  tjansen

Recently I have been conducting a thought experiment: how could XML processing be made easier by integrating XML support directly into a Java/C#-like programming language?

I wrote the code snippet below to try out what such a language could look like. Here is how this hypothetical language works:

  • The language adds two hybrid base types, Node and NodeList. Hybrid means that they are objects, like java.lang.String in Java, but have their own literals and operators.
  • Node is similar to a DOM node, but uses the XPath data model (no DTD/DOCTYPE, no entity references, no CDATA sections; everything is normalized).
  • Applying the index operator ([]) to a Node executes an XPath expression on it; the result is a NodeList.
  • A Node has the operators += (add as a child), + (create a NodeList from the two nodes), -= (remove a node from the children) and << (replace the node).
  • A NodeList is a list of references to nodes. It has operators such as +, += (append a node list) and << (replace all nodes).
  • A plain XML node literal is enclosed in [[ ]] brackets. To avoid unnecessary escaping, you can use more than two brackets, e.g. [[[[ <element/> ]]]].
  • A Perl-string-like XML node expression, which allows values of the base types to be inserted, is enclosed in single brackets [ ]. For example, this is a simple node with content: [ <text>Blabla ${somevariable} $anothervariable</text> ]. Variables can be Nodes, NodeLists, Strings, numbers and so on.
  • Any Node can be cast to NodeList. A NodeList can be cast to Node, but the cast throws an exception when the list has more than one member.
  • Nodes can be implicitly cast to Strings.
  • Strings can be implicitly cast to (text) nodes.
  • The keyword prefix defines an XML namespace prefix for use in XML node literals and XPath expressions. It can be declared anywhere a const variable can be declared, and follows the same scoping rules.
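For comparison, here is roughly how these operators and casts map onto the DOM and XPath APIs that already ship with Java (org.w3c.dom and javax.xml.xpath). This is only a sketch: the class XmlOps and all of its method names are hypothetical, chosen for illustration; only the DOM and JAXP calls inside them are real. It assumes a namespace-aware parser, and returning null when an empty list is cast to Node is an assumption, since the rules above only cover lists with more than one member.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helpers: each method stands in for one proposed operator or cast. */
public class XmlOps {

    /** Stand-in for a [[ ... ]] literal: parse a string into a namespace-aware DOM. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** node[xpath]: evaluate an XPath expression relative to node, yielding a NodeList. */
    public static NodeList select(Node node, String xpath) throws XPathExpressionException {
        return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, node, XPathConstants.NODESET);
    }

    /** parent += child: append as the last child. */
    public static void add(Node parent, Node child) {
        parent.appendChild(child);
    }

    /** parent -= child: remove from the parent's children. */
    public static void remove(Node parent, Node child) {
        parent.removeChild(child);
    }

    /** oldNode << newNode: put newNode in oldNode's place in the tree. */
    public static void replace(Node oldNode, Node newNode) {
        oldNode.getParentNode().replaceChild(newNode, oldNode);
    }

    /** (Node) list: throws when the list has more than one member. */
    public static Node toNode(NodeList list) {
        if (list.getLength() > 1) {
            throw new ClassCastException("NodeList has " + list.getLength() + " members");
        }
        // Assumption: an empty list casts to null (the post does not say).
        return list.getLength() == 0 ? null : list.item(0);
    }

    /** Implicit Node -> String cast; for elements, DOM text content equals the XPath string-value. */
    public static String str(Node node) {
        return node.getTextContent();
    }

    /** Implicit String -> text node cast. */
    public static Node text(Document doc, String s) {
        return doc.createTextNode(s);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<list><a>1</a><b>2</b></list>");
        Node list = doc.getDocumentElement();
        add(list, text(doc, "x"));                                           // list += "x"
        remove(list, toNode(select(list, "b")));                             // list -= list[b]
        replace(toNode(select(list, "a")), doc.createElementNS(null, "c"));  // list[a] << <c/>
        System.out.println(str(list));                                       // prints "x"
    }
}
```

Each line in main needs one or two nested helper calls where the proposed syntax would need a single operator, which is the kind of noise the language integration is meant to remove.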

The example below assumes that you are familiar with XPath. Don't expect the code to be really useful; it's just meant to give a feel for the syntax. I think I could get used to something like this...

class Test {
        // namespace prefix used by the XML literal and the XPath expressions below
        prefix ageext "urn:mascot-age-extension";

        static void main() {
                Node mascots = [[
                        <mascotList>
                                <mascot>
                                        <name>Tux</name>
                                        <species>Penguin</species>
                                        <project>Linux</project>
                                        <ageext:age>8</ageext:age>
                                </mascot>
                                <mascot>
                                        <name>Konqi</name>
                                        <species>Dragon</species>
                                        <project>KDE</project>
                                        <ageext:age>3</ageext:age>
                                </mascot>
                        </mascotList>
                ]];

                workWithMascots(mascots, 4);
        }

        static void workWithMascots(Node mascots, int minimumAge) {
                // replace every mascot younger than minimumAge with the value of minimumAge
                mascots[/mascotList/mascot[ageext:age < $minimumAge]] << minimumAge;

                // add a <summary> child to each remaining mascot
                NodeList n = mascots[/mascotList/mascot];
                foreach Node i in n {
                        Node summary =
                                [<summary>${i[name]} is a ${i[species]} and the mascot of ${i[project]}</summary>];
                        i += summary;
                }

                // print all mascots
                int num = 0;
                foreach Node i in n {
                        num++;
                        Console.println([Mascot Number $num: ${i[summary]}]);
                }
        }
};
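For comparison, here is a sketch of the same program written against the standard Java DOM and XPath APIs. It is an illustrative translation, not part of the proposal, and the class name MascotsDom and its members (SAMPLE, newXPath, nodes) are hypothetical. Two things the proposed syntax handles implicitly have to be spelled out: the ageext prefix must be declared inside the XML text and bound separately for XPath through a NamespaceContext, and $minimumAge is supplied through an XPathVariableResolver. Like the original, the first statement replaces each mascot younger than minimumAge with a text node containing that number, so with a minimum age of 4 only Tux is left to summarize.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MascotsDom {
    static final String AGEEXT_URI = "urn:mascot-age-extension";
    // Unlike the [[ ]] literal, plain XML text must declare the ageext prefix itself.
    public static final String SAMPLE =
        "<mascotList xmlns:ageext='" + AGEEXT_URI + "'>"
        + "<mascot><name>Tux</name><species>Penguin</species>"
        + "<project>Linux</project><ageext:age>8</ageext:age></mascot>"
        + "<mascot><name>Konqi</name><species>Dragon</species>"
        + "<project>KDE</project><ageext:age>3</ageext:age></mascot>"
        + "</mascotList>";

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Counterpart of `prefix ageext ...` for XPath, plus a binding for $minimumAge.
    static XPath newXPath(int minimumAge) {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ageext".equals(prefix) ? AGEEXT_URI : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return AGEEXT_URI.equals(uri) ? "ageext" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return AGEEXT_URI.equals(uri) ? Collections.singletonList("ageext").iterator()
                                              : Collections.<String>emptyIterator();
            }
        });
        xp.setXPathVariableResolver(name ->
                "minimumAge".equals(name.getLocalPart()) ? Double.valueOf(minimumAge) : null);
        return xp;
    }

    // Copy an XPath result into a List so the tree can be modified while looping.
    static List<Node> nodes(XPath xp, String expr, Object context) throws Exception {
        NodeList list = (NodeList) xp.evaluate(expr, context, XPathConstants.NODESET);
        List<Node> result = new ArrayList<>();
        for (int k = 0; k < list.getLength(); k++) {
            result.add(list.item(k));
        }
        return result;
    }

    public static List<String> workWithMascots(Document doc, int minimumAge) throws Exception {
        XPath xp = newXPath(minimumAge);
        // mascots[/mascotList/mascot[ageext:age < $minimumAge]] << minimumAge;
        for (Node young : nodes(xp, "/mascotList/mascot[ageext:age < $minimumAge]", doc)) {
            young.getParentNode().replaceChild(doc.createTextNode(String.valueOf(minimumAge)), young);
        }
        // i += [<summary>...</summary>] for every remaining mascot
        List<Node> n = nodes(xp, "/mascotList/mascot", doc);
        for (Node i : n) {
            Element summary = doc.createElementNS(null, "summary");
            summary.setTextContent(xp.evaluate("name", i) + " is a " + xp.evaluate("species", i)
                    + " and the mascot of " + xp.evaluate("project", i));
            i.appendChild(summary);
        }
        // collect the lines the original prints with Console.println
        List<String> lines = new ArrayList<>();
        int num = 0;
        for (Node i : n) {
            num++;
            lines.add("Mascot Number " + num + ": " + xp.evaluate("summary", i));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : workWithMascots(parse(SAMPLE), 4)) {
            System.out.println(line); // prints "Mascot Number 1: Tux is a Penguin and the mascot of Linux"
        }
    }
}
```

Returning the lines instead of printing them inside workWithMascots is a small deviation from the original, made only so the result is easy to check; the roughly doubled length compared to the hypothetical version is the point of the comparison.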