Annoyances in .NET XML libraries
At work I’m building a simple tool to populate a FogBugz wiki page with build information. One of the things this tool needs to do is pull the XHTML contents of a wiki page, parse it (as XML), and take action on the resulting document tree. Initially I expected this to be stupid-easy, as XHTML is just XML, right?
Au contraire!
Problem 1: XHTML is NOT just XML
The first problem is that XHTML documents are likely to contain entity references like &nbsp; and whatnot. These entity references aren’t XML entities, they’re XHTML entities, so you must load the XHTML DTD in order to resolve them. Trouble is, this means there must be a proper XHTML DOCTYPE directive in your XHTML (which there isn’t in my case, since I’m using fragments).
Once a valid DOCTYPE directive is added to the XHTML, .NET will download the full DTD from W3 just to parse a little XHTML fragment. Not acceptable. So, I had to download the XHTML strict DTD, plus the three dependent DTDs containing entity definitions, and paste them together to create one big DTD with all the XHTML entities inline. I then added that file to my project as an embedded resource, and ripped off this code to write an HtmlResolver subclass of XmlUrlResolver that intercepts requests for the XHTML DTDs and instead pulls my jumbo DTD out of a resource stream and returns that.
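For the curious, here is roughly what such a resolver might look like. This is a minimal sketch, not the code I actually borrowed: the resource name and the IsW3cDtdRequest helper are made up for illustration, and it assumes every DTD request to www.w3.org can be answered with the one flattened DTD.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Xml;

// Hypothetical sketch: serves one embedded, flattened DTD instead of hitting w3.org.
class HtmlResolver : XmlUrlResolver
{
    // Assumed resource name; the real one depends on the project's default
    // namespace and on what the flattened DTD file is called.
    private const string DtdResourceName = "MyTool.Resources.xhtml1-flat.dtd";

    // True when the parser is asking for something hosted by the W3C.
    public static bool IsW3cDtdRequest(Uri absoluteUri)
    {
        return absoluteUri != null
            && absoluteUri.IsAbsoluteUri
            && absoluteUri.Host.Equals("www.w3.org", StringComparison.OrdinalIgnoreCase);
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (IsW3cDtdRequest(absoluteUri))
        {
            Stream dtd = Assembly.GetExecutingAssembly()
                                 .GetManifestResourceStream(DtdResourceName);
            if (dtd != null)
                return dtd; // the parser reads the DTD from the embedded resource
        }

        // Anything else (or a missing resource) falls back to the default behavior.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

An instance of this would then be assigned to the XmlResolver property of whatever is doing the parsing (XmlDocument and XmlReaderSettings both have one).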
Problem 2: Automatic XHTML namespace
The second problem cropped up when I tried to issue an XPath query for all the h1 elements in my markup. For some reason, the call to SelectNodes was always returning zero matches, even though I knew the XHTML contained a multitude of h1 elements. The cause became clear when I looked at the InnerXml property of my XmlDocument object. Something was adding an xmlns namespace attribute to the html element I was generating for the root of my document tree, even though the string from which the XmlDocument was generated never explicitly specified a namespace. Since I was issuing an XPath query for h1 elements in the global namespace, and the parser put all the elements in the XHTML namespace, it was silently skipping all my h1 elements!
The fix was to replace html as my root node with something not in the XHTML DTD, like wikipage. Still, that should not happen!
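As far as I can tell, the culprit is the XHTML DTD itself, which declares xmlns as a fixed attribute on the html element, so a validating parser adds it for you. An alternative to renaming the root (not what I did, just a sketch) is to leave html alone and make the XPath query namespace-aware with an XmlNamespaceManager. The "x" prefix and the class and method names below are arbitrary choices for illustration.

```csharp
using System;
using System.Xml;

static class XhtmlQueryExample
{
    const string XhtmlNamespace = "http://www.w3.org/1999/xhtml";

    // Unqualified query: only matches h1 elements that are in no namespace.
    public static int CountH1Unqualified(string xhtml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xhtml);
        return doc.SelectNodes("//h1").Count;
    }

    // Qualified query: binds a prefix to the XHTML namespace and uses it in the XPath.
    public static int CountH1Qualified(string xhtml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xhtml);

        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", XhtmlNamespace);

        return doc.SelectNodes("//x:h1", ns).Count;
    }
}
```

The prefix in the query does not have to match anything in the document; only the namespace URI matters.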
Idea for .NET 4.0: XHTML support, built in!
.NET framework source code? Really!?
I just read Scott Guthrie’s announcement that, starting with Visual Studio 2008, the full, commented source code to the major assemblies in the .NET framework will be available for debugging and analysis, both as a source tarball and on-demand via the public symbol server.
I can’t tell you how many times I’ve been using a System assembly which is misbehaving, and been forced to use Lutz Roeder’s Reflector tool to reverse-engineer it to uncommented C# code in the hope of figuring out what is wrong. It was equivalent to the days of old when the C Runtime Library source code wasn’t shipped with the compiler.
All I can say is, “what took so long!?”
UPDATE I’ve read some bitching about the nature of the license under which this code is to be released; specifically, some Microsoft license which is no-modify and no-distribute. Those who find this unacceptable are, imho, missing the point. 80% of the value of an unrestricted GPL-type .NET Framework source release would be easier diagnosis and debugging anyway. Even if you could fork the MS framework, that would be like modifying the MFC code that shipped with VC 4 and shipping the modified DLLs with your product; you could do it, but you’d have to be a fucking moron.
Goddammit DevExpress!
Earlier I lamented my fate as a GUI developer with DevExpress GUI widgets. After a few hours of repeated head/wall interfacing, I found the problem, and it’s so maddening I must now vent.
Most of my business objects are displayed in the GUI using WinForms and DevExpress data binding. If I have a list of objects to display in a grid or list or, in this case, a calendar, I use lists derived from System.ComponentModel.BindingList<T>, which implements all the plumbing, like IBindingList, plus general-purpose typed container stuff.
Using one of these lists as the DataSource for WinForms or DevExpress controls Just Works. However, as mentioned before, the scheduler control doesn’t switch to the UI thread when handling ListChanged events, so when I load my list in a background thread it crashes the app. Rather than post to the support forums and wait until the next dot release, I just whipped up BraindeadBindableListWrapper, which implements IBindingList and delegates all but the ListChanged event to the IBindingList instance it wraps. For the ListChanged event, it intercepts the event from the wrapped list, switches to the UI thread, then passes the event on to whoever’s listening (typically the Scheduler control).
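The interesting part of that wrapper is the event hop. Here is a stripped-down sketch of just that piece, assuming the UI thread's SynchronizationContext was captured up front (for example via SynchronizationContext.Current on the UI thread). The class name is hypothetical, and the real BraindeadBindableListWrapper also implements the rest of IBindingList by delegation, which is omitted here.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

// Hypothetical helper: re-raises an inner list's ListChanged on a captured context.
class UiThreadListChangedRelay
{
    private readonly SynchronizationContext _uiContext;

    public event ListChangedEventHandler ListChanged;

    public UiThreadListChangedRelay(IBindingList inner, SynchronizationContext uiContext)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        if (uiContext == null) throw new ArgumentNullException("uiContext");

        _uiContext = uiContext;
        inner.ListChanged += OnInnerListChanged;
    }

    private void OnInnerListChanged(object sender, ListChangedEventArgs e)
    {
        // Send runs the callback through the captured context and waits for it
        // to finish, so listeners see changes in the order they happened.
        _uiContext.Send(delegate(object state)
        {
            ListChangedEventHandler handler = ListChanged;
            if (handler != null)
                handler(this, e);
        }, null);
    }
}
```

Send is used rather than Post so the background thread can't race ahead of the control; whether that is the right trade-off depends on how chatty the list is.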
However, when I use this wrapper, none of my events show up. It wasn’t until a few hours into the problem that I realized the wrapper was the cause; DevExpress doesn’t throw an exception, or issue an error; it just silently ignores the properties on my event objects, and creates one subjectless, 1 Jan 01 event for each event I have.
Out of desperation, after trying a multitude of other stupid-but-doesn’t-hurt-to-try hacks, I added an ITypedList implementation to BraindeadBindableListWrapper, even though my BindingList<T>-derived lists that work fine don’t implement that interface. Shockingly, it worked fine.
I then began spelunking around the DevExpress source code (the source license for DevExpress is really the only thing between me and a psychotic episode), and found the spaghetti-like logic that teases out properties for list items.
The problem, ultimately, is that the DevExpress team aren’t aware of/don’t give a shit about the Principle of Least Surprise. Under this principle, framework programmers make design decisions with the intention of limiting the surprise experienced by the programmers using the framework. For example, a method that saves a file and conforms to the Principle of Least Surprise would simply save the file, while a non-conformant method might validate or flush buffers or do some other side-effect-having bit as well.
In DevExpress’s case, they want their data binding code to tease out the properties of the items in a list under any possible circumstances. This leads to a fairly arcane bit of heuristics, which include:
- Checking for an ITypedList impl, and using that if found
- Checking for one or more items in the list, and using the first item if found
- Checking for an indexer property (this[] in C#), and using the public properties of the return type of the indexer if found
That feeling of dull nausea in your stomach is a normal response to this stimulus.
This has a lovely side-effect: any generic List<T> or BindingList<T> list will Just Work, since the generic containers happen to provide a typed indexer. Similarly, any ITypedList impl will work. What’s more, an IBindingList that has at least one item in it will also work, since DevEx will deduce the properties from the first item it finds.
However, in a fit of maximal surprise, an empty IBindingList that is populated after binding will not work, since the heuristic doesn’t have any information with which to determine the properties of the items in the list at binding time.
So, here we have two shitty things:
- Binding logic that uses a non-obvious, non-documented heuristic that at first blush appears to work/not-work with two equivalent implementations
- Binding logic that isn’t smart enough to evaluate properties on an item-by-item basis
So, if my BraindeadBindableListWrapper either implements ITypedList or has a typed accessor, it’ll Just Work. A violation of the principle of Least Surprise if ever there was one. DevEx, I want my Saturday night back!
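For reference, the ITypedList part is tiny. Below is a minimal sketch on a BindingList<T> subclass rather than on my wrapper, assuming a flat list (the listAccessors argument is ignored). TypedBindingList and SampleAppointment are names invented for the example.

```csharp
using System;
using System.ComponentModel;

// Hypothetical list that can describe its items' properties even while empty.
class TypedBindingList<T> : BindingList<T>, ITypedList
{
    public PropertyDescriptorCollection GetItemProperties(PropertyDescriptor[] listAccessors)
    {
        // Assumption: flat list, so nested accessors are not handled here.
        // The properties come from the item type, not from any item instance.
        return TypeDescriptor.GetProperties(typeof(T));
    }

    public string GetListName(PropertyDescriptor[] listAccessors)
    {
        return typeof(T).Name;
    }
}

// Stand-in business object with the sort of properties a scheduler binds to.
class SampleAppointment
{
    private DateTime _start;
    private DateTime _end;
    private string _subject;

    public DateTime Start { get { return _start; } set { _start = value; } }
    public DateTime End { get { return _end; } set { _end = value; } }
    public string Subject { get { return _subject; } set { _subject = value; } }
}
```

Because the answer comes from typeof(T), an empty list can still tell a binder that its items have Start, End, and Subject.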
Using Developer Express UI widgets
Lately I’ve been stuck doing GUI development at work. If you’re not a systems programmer, it’s probably hard for you to understand how degrading UI work is for me, but it’s roughly equivalent to a lawyer cleaning toilets, or a doctor doing TPS reports.
To make this process suck minimally, my company licensed the Developer Express DXperience toolkit. The toolkit includes WinForms, ASP.NET, and ActiveX components, but we’re only using WinForms.
In my tenure at BearingPoint I’d used a heavily customized earlier version of DevExpress’s XtraGrid control, but they’ve come a long way since then, and I’m using more controls now.
In general, my impression of GUI development is that it sucks as much as any other GUI development task I’ve undertaken. The toolkits are always more complex than you’d like, and at the same time less flexible than they should be. Threading issues are the rule rather than the exception, and nothing works quite as it should.
Thus far I’ve posted nearly a dozen support issues to the DevExpress support system. One is a confirmed bug which remains outstanding, a few were basically RTFM responses pointing me to the solution I was too stupid/lazy/non-psychic to find myself, and most were along these lines:
Thank you for your question. Though what you’re asking is a basic bit of functionality which one could reasonably expect from any commercial toolkit worth its salt, there’s no easy way to do it with the current version. We’ll add your request to the ‘No Fucking Way’ feature queue. In the mean time, here’s a nasty hack that uses reflection and write-only VB code to achieve sort of what you want, but it will only work in the attached test harness, and not in the real world. Thanks for using DevExpress. Sucker.
I’m not ragging on DevExpress specifically. There’s just something about modern UI programming that is intrinsically hard, not in the ‘NP hard’ sense, but in the ‘factor this 100-bit binary polynomial in your head while I slap you repeatedly with this dead fish’ sense; that is to say, complex, hard to hold in your head, and rife with concentration-breaking distractions.
Case in point: right now I’m trying to get some temporal data from our product into a calendar-type view, using DevExpress’s XtraScheduler control (yes, they all start with ‘Xtra’; it’s called ‘branding’). Ostensibly, this is as easy as building an IBindingList-derived list of objects with certain properties like Start and End. Indeed, I whipped up a working prototype using dummy objects in mere seconds.
And yet, as with so many things, once I moved it to the real app, it stopped working. I add objects to the list, the list’s ListChanged event fires (filtered through a wrapper to fire the event on the UI thread, which I had to write because XtraScheduler is too stupid/lazy to check the thread a change notice comes in on, but I digress), and the control’s count of schedule items increments, but it never touches the Start/End properties, and just builds up a list of events starting 1 January AD 1 (incidentally, wasn’t that the first Christmas day?), without displaying fuck all.
I could post something in the DevExpress forum, but since my test app works fine, I’d have to come up with another test app to reproduce the problem, and if I knew the problem well enough to reproduce it, I’d probably be able to solve it on my own. If I try to describe the problem w/o a sample project, they’ll not understand me and ask for a sample project.
So, here I am, sifting through the XtraScheduler innards on a Saturday night, when all the cool programmers are debugging kernel modules or attaching TCP/IP stacks to robotic litter boxes. sigh
.NET File APIs Don't Like '\\?\' Prefix!?
I’ve recently run into a situation in which I need to use .NET file I/O APIs like File.Exists and File.OpenRead using a filename of the form:
\\?\Volume{guid}\foo.txt
Where guid is a GUID in the standard registry format. Assuming the path refers to a currently available volume (as reported by, for example, mountvol), this is a fully valid path. Try this experiment:
Go to a command prompt, and type mountvol:
The output you’ll get, after usage information, will be the list of current mount points for each mounted volume on your system (in other words, drive letters):
\\?\Volume{88370110-213a-11db-9fa6-806e6f6e6963}\
    C:\
\\?\Volume{9ab4d54b-2171-11db-a24c-005056c00008}\
    G:\
\\?\Volume{6f7fdfc2-213b-11db-9f17-806e6f6e6963}\
    D:\
\\?\Volume{2d206468-216c-11db-9f1d-505054503030}\
    E:\
\\?\Volume{9ab4d552-2171-11db-a24c-005056c00008}\
    F:\
This gives you the volume name corresponding to each local disk drive letter on your system. Now, say you have a file on C:\, C:\Test.txt. Run this command:
notepad \\?\Volume{88370110-213a-11db-9fa6-806e6f6e6963}\test.txt
Where the GUID junk there is the GUID assigned to your C: drive. If you did it right, notepad will open the file, just as though it was on C:. Note that Explorer doesn’t work with these paths, and neither do the File Open common dialogs. But, if you pass a path like this to CreateFile, as notepad does when you pass a filename on the command line, it will work just like any other file.
However, pull that stunt with a .NET file API, and you’ll get a nice little exception to the tune of Illegal characters in path. What illegal character? Well, for one, the ‘?’ is frowned upon; I don’t think .NET supports the parsing-disabled form of paths at all. I was lucky in that all I needed to do was test for the existence of a specific file with a volume path, so I just did a P/Invoke call to GetFileAttributes. If I’d wanted to read or enumerate such files, I’d be screwed to the wall.
Yes, yes, C/C++ pugilists, rub it in. Bask in this rare instance in which your Win32 API elitism and garbage collector condescension are in fact vindicated. Take a few minutes and savor it; in that time I’ll construct a scalable three-tier web app with authentication, membership, and skinning using a few lines of ASP.NET code, and then you can return to the back of the bus.
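The P/Invoke workaround is short enough to show. This is a sketch of the idea rather than my exact code: VolumePathHelper and FileExists are names made up here, and the two constants are the standard Win32 values for INVALID_FILE_ATTRIBUTES and FILE_ATTRIBUTE_DIRECTORY. It is Windows-only, obviously.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: existence check that hands the path straight to Win32.
static class VolumePathHelper
{
    private const uint INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF;
    private const uint FILE_ATTRIBUTE_DIRECTORY = 0x10;

    // CharSet.Unicode makes the runtime bind to GetFileAttributesW.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern uint GetFileAttributes(string lpFileName);

    // True if the path names an existing file (not a directory).
    public static bool FileExists(string path)
    {
        uint attributes = GetFileAttributes(path);
        if (attributes == INVALID_FILE_ATTRIBUTES)
            return false; // not found, or not accessible

        return (attributes & FILE_ATTRIBUTE_DIRECTORY) == 0;
    }
}
```

Usage would be something like VolumePathHelper.FileExists(@"\\?\Volume{guid}\foo.txt"), with a real volume GUID in place of guid. Because the string never passes through the managed path-checking code, the prefix doesn't trip anything.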
WTF is wrong with WPF focus!?
There’s something badly broken about Windows Presentation Foundation focus. I’ve read that WPF implements its own concept of focus, and at the high level a WPF element tree is just a single HWND, and the WPF elements are rendered therein.
Here’s an experiment for you to try on your own: Create a simple XAML Window. Populate it with user controls. Now, try to get the first user control in the window to receive focus (keyboard and logical) when your window starts up. The FocusSample does it with a Button, which seems to work OK, but with a UserControl you’re out of luck. Even after setting Focusable to true, it doesn’t work. Yet, you can Tab through the focusable controls without problem.
First, you’ll try calling Keyboard.Focus() from within the Loaded event, but if you put trace output in the OnIsKeyboardFocusWithinChanged method of the user control, you’ll see that the call works, but then it is reversed again. Yes, it seems something about the window creation process after the Loaded event resets the keyboard focus, yet for some reason Button controls are impervious. TextBoxes may be as well; I’ve not tried.
In the end, I had to come up with a shameful hack. Behold, FocusHelper:
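    using System;
    using System.Threading;
    using System.Windows;
    using System.Windows.Input;
    using System.Windows.Threading;

    static class FocusHelper
    {
        private delegate void MethodInvoker();

        public static void Focus(UIElement element)
        {
            // Queue the work on a thread-pool thread, then marshal back to the
            // element's dispatcher, so the focus calls run only after the UI
            // thread has finished whatever it is doing right now (including
            // the post-Loaded initialization that resets keyboard focus).
            ThreadPool.QueueUserWorkItem(delegate(Object foo)
            {
                UIElement elem = (UIElement)foo;
                elem.Dispatcher.Invoke(DispatcherPriority.Normal,
                    (MethodInvoker)delegate()
                    {
                        elem.Focus();          // logical focus
                        Keyboard.Focus(elem);  // keyboard focus
                    });
            }, element);
        }
    }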
Don’t get all holier-than-thou. War makes a man do things he’d never dream of doing otherwise. So does WPF programming.
Windows Presentation Foundation: I must be missing something
I’ve spent most of the day playing around with the .NET Framework 3.0 RC1 bits. I’m primarily interested in the Windows Presentation Foundation (WPF) stuff, which deals with UI construction for both desktop and web apps under Vista/IE7 and beyond (though it works on XP and 2k3 Server too).
(As an aside, the new Windows x Foundation marketing idiom is among the more stupid in recent memory. That they’re bundled together into the .NET Framework 3.0, which has no obvious textual resemblance to the Windows something-or-other Foundation template, makes it all the more awkward. I liked WinFX better.)
Anyway, I grabbed the RC1 Windows Vista SDK, the .NET Framework 3.0 RC1 runtime, and the CTP “Orcas” preview stuff for Visual Studio 2005; this latter bit provides minimal (and I do mean minimal) designer support and templates for WPF stuff.
I’ll start by saying the designer is for shit. The readme that ships with it basically says “it’s shit; it’ll probably get better before ‘Orcas’ ships”. The readme is a master of understatement. It took me about five minutes of XAML hacking to introduce markup that worked, did what I wanted, but broke the designer (the AllowTransparency property on a Window, in case you’re wondering). The designer is slow, redraws constantly, and craps out on the slightest error.
That said, the new split-screen interface, with the WYSIWYG designer on top and a XAML markup window underneath is a welcome innovation. Once it’s faster, sturdier, and more featureful, it’ll be a major upgrade.
Speaking of the XAML editor, it’s much more mature than the designer. The Intellisense surprised me with the number of places it was aware of context and could display the list of available properties. I’ve no complaints there.
Which brings me to the compiler. XAML is less a UI description language and more a general-purpose object persistence language; its primary use in WPF happens to be describing the object graph used to construct UIs. Thus, much like ASP.NET pages and their codebehind counterparts, XAML and its codebehind are compiled together to form the resulting binary. As such, the compiler catches errors in the XAML at build time, just like it would in CS files.
What sucks is that it seems to miss some errors that I would’ve thought it could detect at compile time (like missing resources, non-numeric values for numeric properties, etc). What sucks more is how this manifests itself: you run the app, and it eats shit and dies on an unhandled exception, either at startup or later depending upon where the error is. If you happen to trap this exception in the debugger, rather than seeing what the error is, you’ll see the Meaningless Exception of Doom, ‘the target threw an exception’. If you peel away about five layers of InnerException indirection, you’ll finally arrive at a meaningful error message, which in all fairness is usually pretty obvious.
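Peeling those layers by hand gets old fast. Here is a small sketch of a helper that walks the InnerException chain for you; ExceptionUnwrapper and its method names are invented for the example, and the framework's own Exception.GetBaseException() gets you the innermost exception if that is all you need.

```csharp
using System;
using System.Text;

// Hypothetical helper for digging the real error out of nested exceptions.
static class ExceptionUnwrapper
{
    // Returns the exception at the bottom of the InnerException chain.
    public static Exception Innermost(Exception ex)
    {
        if (ex == null) throw new ArgumentNullException("ex");

        Exception current = ex;
        while (current.InnerException != null)
            current = current.InnerException;
        return current;
    }

    // Builds one line per layer, indented by depth, outermost first.
    public static string DescribeChain(Exception ex)
    {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            sb.Append(' ', depth * 2);
            sb.Append(current.GetType().Name).Append(": ").AppendLine(current.Message);
            depth++;
        }
        return sb.ToString();
    }
}
```

Dumping DescribeChain from an unhandled-exception handler would at least put the meaningful message on screen without the debugger archaeology.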
This has got to change before the release of Orcas, since it’s really easy to get XAML wrong as you’re learning and playing around with it. I’d like to see an error page-type display à la ASP.NET, with an excerpt of the code with the problem, the actual exception, etc.
And last, I find the Style and Template features of WPF to be woefully underpowered. When I first started reading the SDK docs about styles, I was heartened, as the feature seems well thought out and solid. In fact, it put me in mind of CSS, so much so that I assumed I could do CSS-like things with it. Mistake.
In CSS, you specify styles through a combination of a selector, which determines what elements you’re styling, and the style attributes themselves. The selector can be simple, like ‘all A elements’ or ‘the element having ID foo’, but it can also be complex, like ‘all DIV elements inside a SPAN element which is inside a DIV element named foo’. CSS needs this flexibility because people construct complex documents and as such they require complex styles.
WPF, on the other hand, goes to great pains to make it easy to turn a button green when the mouse is over it, but fails miserably at providing a rich, flexible style function. Say I have a user control, and when it has input focus I want it to change its font to bold, and increase the opacity level on one of its child controls. Seems reasonable, right? You’d think, and you’d be wrong.
Despite WPF being the subject of more sample code and breathless MSDN Magazine articles than any Microsoft buzzword technology before it, I can’t find any discussion or demonstration of the use of a style that defines properties for an element tree, or a way to gather multiple styles together and apply them en masse. To my mind, this is a serious limitation, and makes it feel like I’m back to coding HTML with CSS 1.0, which is rather ungrateful of me since it’s not like I had any of these abilities back in WinForms 2.0, but my sense of entitlement blinds me to the hardships of the past.
I’m hopeful that I’m just missing something, and that as I play with XAML/WPF a bit more I’ll get these issues ironed out, but experience suggests that’s a lot to hope for…
Trouble installing .NET Framework 3.0 RC 1
The RC1 of the .NET Framework 3.0 recently became available for download, along with the RC1 version of the Platform SDK–erm, I mean–Windows SDK for Vista. Apart from the usual incessant Microsoft rebranding, there are in fact some major technical differences since 2.0 as well, hence my interest.
However, I cannot get the lightweight, downloads-what-it-needs-on-the-fly version of the installer to work. It fails consistently, early in the setup process. The dd_dotnetfx3error.txt file in Local Settings\Temp has this to say:
[09/09/06,23:16:01] RGB Rast: [2] dlmgr: -2147023651, CDownloadJob::AddFile() : Failed to add http://go.microsoft.com/fwlink/?LinkId=56151&clcid=0x409 -> C:\DOCUME~1\god\LOCALS~1\Temp\dotnetfx304324.17\1033\wcu\rgbrast\x86\RGB9RAST_x86.msi to the download job.
Context: 0 Error code: -2147023651 Description:
[09/09/06,23:16:01] RGB Rast: [2] Failed to fetch setup file in CBaseComponent::PreInstall()
[09/09/06,23:16:01] setup.exe: [2] ISetupComponent::Pre/Post/Install() failed in ISetupManager::InternalInstallManager() with HRESULT -2147467260.
[09/09/06,23:16:01] setup.exe: [2] CSetupManager::RunInstallPhase() - Call to Pre/Install/Post for InstallComponents failed
[09/09/06,23:16:01] setup.exe: [2] CSetupManager::RunInstallPhaseAndCheckResults() - RunInstallPhase() returned a NULL piActionResults
[09/09/06,23:16:01] setup.exe: [2] CSetupManager::RunInstallFromList() - RunInstallPhaseAndCheckResults failed [2]
[09/09/06,23:16:01] setup.exe: [2] ISetupManager::RunInstallLists(IP_PREINSTALL failed in ISetupManager::RunInstallFromThread()
[09/09/06,23:16:01] setup.exe: [2] ISetupManager::RunInstallFromThread() failed in ISetupManager::RunInstall()
[09/09/06,23:16:01] setup.exe: [2] CSetupManager::Run() - Call to RunInstall() failed
[09/09/06,23:16:01] WapUI: [2] DepCheck indicates RGB Rast is not installed.
[09/09/06,23:16:01] WapUI: [2] DepCheck indicates WIC Installer was not attempted to be installed.
[09/09/06,23:16:01] WapUI: [2] DepCheck indicates XPSEPSC x86 Installer was not attempted to be installed.
[09/09/06,23:16:01] WapUI: [2] DepCheck indicates Windows Communication Foundation was not attempted to be installed.
[09/09/06,23:16:01] WapUI: [2] DepCheck indicates Windows Presentation Foundation was not attempted to be installed.
[09/09/06,23:16:01] WapUI: [2] DepCheck indicates Windows Workflow Foundation was not attempted to be installed.
[09/09/06,23:16:01] WapUI: [2] DepCheck indicates Microsoft .NET Framework 3.0 was not attempted to be installed.
[09/09/06,23:27:31] RGB Rast: [2] Error: Installation failed for component RGB Rast. MSI returned error code 1618
[09/09/06,23:27:35] WapUI: [2] DepCheck indicates RGB Rast is not installed.
[09/09/06,23:27:35] WapUI: [2] DepCheck indicates Microsoft .NET Framework 3.0 was not attempted to be installed.
Clear as mud, I know. The .netfx3 readme says this might happen, though it characterizes it as an ‘intermittent failure’, which to my mind means ‘happens sometimes’, not ‘eats shit and dies every single time’. At any rate, its advice is to eschew the fancy you’re-too-stupid-to-download-on-your-own installer and just, well, download on your own.
To that end, I’ve instead opted for the full RC1 bits, which weigh in at a measly 48MB, leading me to wonder why MS needed a lightweight installer in the first place.
That install worked fine. The post-install screen did exhort me to install KB912817 to enable some WCF stuff like atomic web services and MSDTC integration, neither of which I care about.
Why is profiling in .NET so hard?
I’m working on an entry for TopCoder’s third Intel MultiThreading challenge. Unlike all the winners of past contests, I’m using C#, despite the considerable performance disadvantage it suffers relative to C++, particularly on the Cygwin-based C# compiler used by TopCoder.
Nonetheless, the objective isn’t to win the paltry prize money; I’d like to see how close to the winning solution a C# implementation can get, and at the same time keep myself on my intellectual toes while I languish here in Iraq.
At any rate, apart from correctness, the challenge is to solve a computationally-intensive string processing problem in as little time as possible. To start with, I coded a naive implementation and submitted it for a correctness test, but it failed the more complex tests because the code failed to complete in the allotted 60 seconds.
The first response to any such problem is to whip out a profiler, find the hotspots in the code, and make them suck less. Unfortunately, the profiling situation in .NET is poor.
MS don’t provide any meaningful profiling support on their own; the somewhat misnamed CLR Profiler actually profiles the CLR heap, not CLR execution, making it utterly useless for the task at hand.
An open source project, nprof, used to provide decent bare-bones profiling, but it’s a dead project and has, shall we say, limited .NET 2.0 support. I was unable to make it work with .NET 2.0 despite considerable exertion to that end.
Apart from nprof, one’s options are commercial. In the .NET 1.1/VS2k3 days, Compuware offered a free ‘Community Edition’ of DevPartner Profiler, which sucked in the proud tradition of other Compuware products, but often as not could point you in the direction of the performance problem before eating shit and dying, taking VS2k3 with it.
Anyway, once Compuware realized it was possible to use the CE product for something useful, they of course pulled the plug, so the new version of DevPartner Studio, 8.0, which works w/ VS2k5/.NET 2.0, doesn’t have a Community Edition.
So, that leads me to JetBrains dotTrace, a commercial .NET profiler. I’m downloading an eval now, and if I really like it I may actually pay the $250 for the commercial version. If I only sort of like it I’ll find a crack.
DataFormatString property on ASP.NET BoundField Ignored for Dates
Today one of my Iraqi devs, E, was trying to display some search results from a MySQL database in an ASP.NET GridView control. He simply set the DataSource property to the DataReader attached to the results, and called DataBind() on the grid control.
All was well, except the dates; they displayed in the short date/time format, even though the times are not used in this application (and thus were all 12AM). I showed him the beauty of the DataFormatString on the BoundField which displayed the date, but to my surprise, setting it to {0:d} didn’t change the output at all.
I scratched my head for a while, then came across MSDN Labs Bug ID FDBK35199, which describes the logic behind this intentional behavior. The reasoning? To prevent script injection attacks, the DataFormatString property is applied AFTER the value is HTML encoded, so it’s no longer a DateTime by the time it is formatted.
That’s fine, except the underlying value is a DateTime! No string representation of DateTime contains script elements, let alone malicious JavaScript! So why, then, is this functionality so braindead!? Whatever happened to the principle of least surprise?!
Anyway, the fix is to disable HTML encoding by setting HtmlEncode to false on the BoundField element that displays date values. Lame!
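If the bug report's description is accurate, the effect is easy to reproduce without ASP.NET at all: a format specifier like {0:d} only means something to a value that knows how to format itself, and a string doesn't. The sketch below just contrasts the two cases; DateFormatDemo and its method names are invented, and the invariant culture is used only so the output is predictable.

```csharp
using System;
using System.Globalization;

static class DateFormatDemo
{
    // The value is still a DateTime, so the "d" specifier is honored.
    public static string FormatTyped(DateTime value)
    {
        return String.Format(CultureInfo.InvariantCulture, "{0:d}", value);
    }

    // The value has already been turned into a string (as HTML encoding does),
    // so the "d" specifier is silently ignored and the text comes back unchanged.
    public static string FormatPreEncoded(string alreadyText)
    {
        return String.Format(CultureInfo.InvariantCulture, "{0:d}", alreadyText);
    }
}
```

FormatTyped(new DateTime(2006, 9, 9)) gives "09/09/2006", while FormatPreEncoded("9/9/2006 12:00:00 AM") hands back the same string, time and all, which is exactly what the grid was showing.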