xmldocument

Annoyances in .NET XML libraries

At work I’m building a simple tool to populate a FogBugz wiki page with build information. One of the things this tool needs to do is pull the XHTML contents of a wiki page, parse it (as XML), and take action on the resulting document tree. Initially I expected this to be stupid-easy, as XHTML is just XML, right?

Au contraire!

Problem 1: XHTML is NOT just XML

The first problem is that XHTML documents often contain entity references like &nbsp; and whatnot. These aren't XML entities (XML itself predefines only &lt;, &gt;, &amp;, &apos;, and &quot;); they're XHTML entities, so you must load the XHTML DTD in order to resolve them. Trouble is, this means there must be a proper XHTML DOCTYPE declaration in your XHTML, which there isn't in my case since I'm working with fragments.
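To make the failure concrete, here is a minimal sketch of what happens when a fragment is handed to XmlDocument with no DTD in sight. The EntityDemo class and its ParsesAsXml method are names made up for this illustration; only XmlDocument.LoadXml and XmlException come from the framework.

```csharp
using System;
using System.Xml;

// Hypothetical demo class, not part of the tool described in this post.
public static class EntityDemo
{
    // Returns true if the fragment loads as plain XML, false if the parser rejects it.
    public static bool ParsesAsXml(string fragment)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(fragment);
            return true;
        }
        catch (XmlException)
        {
            // Thrown for any well-formedness error, including a reference
            // to an entity that no DTD has declared (such as &nbsp;).
            return false;
        }
    }
}
```

Calling EntityDemo.ParsesAsXml("<p>a &amp; b</p>") should succeed, since &amp; is one of the entities XML predefines, while EntityDemo.ParsesAsXml("<p>a&nbsp;b</p>") should fail because nothing has declared nbsp.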

Once a valid DOCTYPE declaration is added to the XHTML, .NET downloads the full DTD from the W3C just to parse a little XHTML fragment. Not acceptable.
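One possible way around the download, sketched here as an assumption rather than as the approach this tool ended up using: wrap the fragment in a root element, declare the needed entities in an inline DOCTYPE, and turn off external resolution. The XhtmlFragment class, its Load method, and the "root" wrapper element are hypothetical names; the sketch also assumes .NET 4.0 or later, where XmlReaderSettings.DtdProcessing is available, and declares only &nbsp; for brevity.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, shown only to illustrate the idea.
public static class XhtmlFragment
{
    public static XmlDocument Load(string fragment)
    {
        // Inline DTD subset: declares just the entities we expect to see.
        // Only &nbsp; is listed here; a real tool would need the others it uses.
        string wrapped =
            "<!DOCTYPE root [<!ENTITY nbsp \"&#160;\">]>" +
            "<root>" + fragment + "</root>";

        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // allow the inline DOCTYPE
            XmlResolver = null                   // do not resolve external resources
        };

        // Also clear the document's own resolver so it cannot fetch anything.
        var doc = new XmlDocument { XmlResolver = null };
        using (var reader = XmlReader.Create(new StringReader(wrapped), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

With this in place, XhtmlFragment.Load("<p>a&nbsp;b</p>") should load without touching the network, because the DOCTYPE has no external identifier and the entity is declared inline. The trade-off is that every entity the wiki might emit has to be declared by hand.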
