
10 Things I Hate About XML

Saturday, 27 December 2003  |  tjansen

  1. DTDs and everything else in the <!DOCTYPE> declaration are horrible. The syntax is cryptic, the allowed types are odd and the degree of complexity is very high (parameter entity references!). RelaxNG and even XML Schema are much better solutions, and without DTDs the XML specification could be reduced by at least 75%.
  2. Entity references are not needed in a Unicode world (exceptions: the predefined entities and character references).
  3. Processing instructions are an odd and unstructured mechanism for meta-data about the XML and should not be needed anymore, because namespace'd elements and attributes could achieve the same.
  4. CDATA sections may be somewhat useful when writing code by hand, but that does not compensate for the complexity that they add to document trees - without them there would be only one type of text.
  5. Different character sets. There's no real need to allow different charsets in XML; it just hurts interoperability. They should at least be restricted to the three UTF encodings, maybe even to only one of them. Allowing charsets like 'latin1' is useless if processors are not required to support them.
  6. The lack of rules for whitespace handling. Actually there would be a very simple and sane rule for whitespace handling (always return whitespace unless an element contains only elements and does not have xml:space="preserve" set), but the spec requires the XML processor to return even the useless whitespace.
  7. The XML specification should set up rules that specify how to handle namespace'd elements and attributes that are not supported by the application. Right now the schema defines how to handle them and the application does not get any support from the XML processor. Ideally the application should tell the XML parser which namespaces it supports, and the XML specification should define what the XML parser does with the rest.
  8. xml:lang is pretty useless without more rules for the XML processor. It would make sense if the XML parser could somehow only deliver text in the desired language to the application, but without any useful function it just bloats the specification.
  9. XML Namespaces are probably the greatest invention in XML history, but they should be in the core specification. Otherwise the APIs are split into namespace-aware functions and those that ignore namespaces. The main problem is that the ':' character has no special meaning in the core specification, so you can have well-formed XML with undefined prefixes, several colons in a single name and so on...
  10. XML Schema should be deprecated in favour of RelaxNG. I haven't seen a single person who would claim that XML Schema is better. People just use it because of the W3C label.
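
Points 4 and 9 are easy to see in practice. The following is only a minimal sketch using Python's standard library (xml.dom.minidom and xml.parsers.expat); the helper names child_node_types and parses are made up for this illustration, and other parsers may behave differently in the details.

```python
import xml.dom.minidom
import xml.parsers.expat


def child_node_types(xml_text):
    """Return a label for each child node of the root element.

    Point 4: a CDATA section shows up as its own node type,
    separate from ordinary text nodes.
    """
    labels = {
        xml.dom.Node.TEXT_NODE: "text",
        xml.dom.Node.CDATA_SECTION_NODE: "cdata",
    }
    root = xml.dom.minidom.parseString(xml_text).documentElement
    return [labels.get(child.nodeType, "other") for child in root.childNodes]


def parses(xml_text, namespace_aware):
    """Report whether expat accepts xml_text.

    Point 9: without namespace processing the ':' is just another
    name character; with it, an undeclared prefix is an error.
    """
    if namespace_aware:
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


if __name__ == "__main__":
    # Three children for what a reader would call one piece of text.
    print(child_node_types("<a>x<![CDATA[y]]>z</a>"))

    # Undefined prefix: accepted by plain XML 1.0 parsing,
    # rejected once namespace processing is switched on.
    print(parses("<p:root/>", namespace_aware=False))
    print(parses("<p:root/>", namespace_aware=True))

    # Several colons in one name: still well-formed for the core spec.
    print(parses("<a:b:c/>", namespace_aware=False))
```

The same document is either fine or broken depending on which of the two parser modes the application happens to pick, which is exactly the API split described in point 9.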