Thought Experiment: XML integrated into a C-like language

Tuesday, 16 December 2003 | Tjansen

Over the past few days I have been running a thought experiment: how could XML processing be made easier by integrating XML support into a Java/C#-like programming language?

I wrote the code snippet below to try out what such a language could look like. This hypothetical language would work as follows:

It adds two hybrid base types, Node and NodeList, to the language. Hybrid means that they are objects, like java.lang.String in Java, but have their own literals and operators.
Node is similar to a DOM node, but uses the XPath data model (no DTD/doctype, no entities, no CDATA sections, everything normalized).
Applying the index operator ([]) to a Node executes an XPath expression on it; the result is a NodeList.
A Node has the operators += (add as a child), + (create a NodeList from the two nodes), -= (remove a node from the children) and << (replace the node).
A NodeList is a list of references to nodes. It has operators like +, += (append a node list) and << (replace all nodes).
A normal XML node literal is enclosed in [[ ]] brackets. To avoid unnecessary escaping, you can use more than two brackets, e.g. [[[[ <element/> ]]]].
A Perl-string-like XML node expression that allows the insertion of base types is enclosed in single brackets [ ]. This would be a simple node with content: [ <text>Blabla ${somevariable} $anothervariable</text> ]. Variables can be Nodes, NodeLists, Strings, numbers, and so on.
You can cast any Node to a NodeList. A NodeList can be cast to a Node, but the cast throws an exception when the list has more than one member.
Nodes can be implicitly cast to Strings.
Strings can be implicitly cast to (text) nodes.
The keyword prefix defines an XML namespace prefix for use in XML node literals and XPath expressions. It can be used wherever you can declare a const variable, and it follows the same scoping rules.
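
The operator semantics listed above can be approximated in an existing language through operator overloading. The following is only a rough sketch in Python, not the proposed language: the Node and NodeList classes are hypothetical wrappers written for this illustration, they sit on top of the standard library's xml.etree.ElementTree (which supports only a subset of XPath), and path expressions have to be passed as quoted strings. String-to-text-node casts and the literal syntax are left out.

```python
import copy
import xml.etree.ElementTree as ET


class NodeList(list):
    """Hypothetical list of Node references, mirroring the NodeList type."""

    def __add__(self, other):
        # list + list returns a plain list, so re-wrap the result
        return NodeList(list.__add__(self, other))

    def __lshift__(self, replacement):
        # node_list << x : replace every node in the list
        for node in self:
            node << replacement
        return self


class Node:
    """Hypothetical wrapper that gives an element the operators listed above."""

    def __init__(self, element, parent=None):
        self.element = element
        self.parent = parent  # parent Element, or None for a root

    @classmethod
    def parse(cls, text):
        return cls(ET.fromstring(text))

    def __getitem__(self, path):
        # node[path] : evaluate a path expression, always yielding a NodeList.
        # ElementTree elements do not know their parent, so build a lookup.
        parents = {child: p for p in self.element.iter() for child in p}
        return NodeList(Node(e, parents.get(e)) for e in self.element.findall(path))

    def __add__(self, other):
        # a + b : create a node list of the two nodes
        return NodeList([self, other])

    def __iadd__(self, child):
        # node += child : add as a child
        self.element.append(child.element)
        child.parent = self.element
        return self

    def __isub__(self, child):
        # node -= child : remove node from children
        self.element.remove(child.element)
        child.parent = None
        return self

    def __lshift__(self, replacement):
        # node << x : replace this node in its parent with a copy of x
        if self.parent is None:
            raise ValueError("a root node has no parent to be replaced in")
        new = copy.deepcopy(replacement.element)
        new.tail = self.element.tail
        index = list(self.parent).index(self.element)
        self.parent[index] = new
        self.element = new
        return self

    def __str__(self):
        # implicit cast to String: the concatenated text content
        return "".join(self.element.itertext())


root = Node.parse(
    "<mascotList><mascot><name>Tux</name></mascot>"
    "<mascot><name>Konqi</name></mascot></mascotList>"
)
tux = root["mascot"][0]
tux += Node.parse("<species>Penguin</species>")
root["mascot/name"] << Node.parse("<name>hidden</name>")
print([str(n) for n in root["mascot/name"]])
```

Even with overloading, the XML still has to live inside string literals and be parsed at run time, which is exactly the part that built-in node literals would remove.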

The example assumes that you are familiar with XPath. Don't expect the code to be really useful; it's just to get a feel for the syntax. I think I could get used to something like this...

class Test {
        prefix ageext "urn:mascot-age-extension"; 

        static void main() {
                Node mascots = [[
        <mascotList>
                <mascot>
                        <name>Tux</name>
                        <species>Penguin</species>
                        <project>Linux</project>
                        <ageext:age>8</ageext:age>
                </mascot>
                <mascot>
                        <name>Konqi</name>
                        <species>Dragon</species>
                        <project>KDE</project>
                        <ageext:age>3</ageext:age>
                </mascot>
        </mascotList> 
]];

                workWithMascots(mascots, 4);
        }

        static void workWithMascots(Node mascots, int minimumAge) {
                // << replaces every node selected by the XPath expression
                mascots[/mascotList/mascot[ageext:age < $minimumAge]] << minimumAge;

                NodeList n = mascots[/mascotList/mascot];
                foreach Node i in n {
                        Node summary = 
[
<summary>${i[name]} is a ${i[species]} and the mascot of ${i[project]}</summary>
];
                        i += summary;
                }
        
                // print all mascots
                int num = 0;
                foreach Node i in n {
                        num++; 
                        Console.println([Mascot Number $num: ${i[summary]}]);
                }
        }
};
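
For comparison, here is a sketch of the selection and summary steps written against a conventional library API, using Python's standard xml.etree.ElementTree. It is an approximation under several assumptions. The namespace has to be declared inside the document and again in a prefix map (the NS dictionary plays the role of the prefix declaration). ElementTree cannot evaluate a predicate like ageext:age < $minimumAge, so the comparison happens in Python. The << replacement step is left out. The helper names younger_than and add_summaries are made up for this example.

```python
import xml.etree.ElementTree as ET

# Plays the role of the `prefix` declaration in the sample.
NS = {"ageext": "urn:mascot-age-extension"}

MASCOTS = """
<mascotList xmlns:ageext="urn:mascot-age-extension">
  <mascot>
    <name>Tux</name><species>Penguin</species><project>Linux</project>
    <ageext:age>8</ageext:age>
  </mascot>
  <mascot>
    <name>Konqi</name><species>Dragon</species><project>KDE</project>
    <ageext:age>3</ageext:age>
  </mascot>
</mascotList>
"""


def younger_than(root, minimum_age):
    """Return the names of mascots whose ageext:age is below minimum_age."""
    # The numeric comparison is done in Python, not in the path expression.
    return [
        mascot.findtext("name")
        for mascot in root.findall("mascot")
        if int(mascot.findtext("ageext:age", namespaces=NS)) < minimum_age
    ]


def add_summaries(root):
    """Append a <summary> child to every mascot and return the output lines."""
    lines = []
    for num, mascot in enumerate(root.findall("mascot"), start=1):
        summary = ET.SubElement(mascot, "summary")
        summary.text = "{} is a {} and the mascot of {}".format(
            mascot.findtext("name"),
            mascot.findtext("species"),
            mascot.findtext("project"),
        )
        lines.append("Mascot Number {}: {}".format(num, summary.text))
    return lines


root = ET.fromstring(MASCOTS)
for line in add_summaries(root):
    print(line)
```

The logic is the same, but the XML, the namespace binding and the queries are all plain strings that nothing checks until the program runs.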