Projects
As an autodidact, I spend much of my free time on various self-edification projects, few of which I actually complete, but all of which I learn from. Some of the more interesting ones are listed here.
Personal research agenda
Some questions come up again and again in my self-edification projects. These questions are elevated to my Personal Research Agenda.
STUN Implementation
STUN (Simple Traversal of UDP Through NAT) is a simple UDP protocol defined in RFC 3489. It detects and classifies NAT devices on the network route between a client and the STUN server. It also helps the client determine which public Internet address(es), if any, it can advertise for inbound communication from other Internet nodes, which may themselves be behind a NAT.
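To make the wire format concrete, here is a minimal sketch of RFC 3489 message encoding. It is not my actual implementation, and the function names are hypothetical. Every message starts with a 20-byte header made up of a 16-bit type, a 16-bit attribute length, and a 128-bit transaction ID. A Binding Request (0x0001) can carry a CHANGE-REQUEST attribute (0x0003) asking the server to reply from a different IP and/or port. The Binding Response (0x0101) reports the client's public endpoint in MAPPED-ADDRESS (0x0001) and the server's alternate endpoint in CHANGED-ADDRESS (0x0005). The sketch handles IPv4 only.

```python
import os
import socket
import struct

BINDING_REQUEST = 0x0001
BINDING_RESPONSE = 0x0101
MAPPED_ADDRESS = 0x0001
CHANGE_REQUEST = 0x0003
CHANGED_ADDRESS = 0x0005


def build_binding_request(change_ip=False, change_port=False, txid=None):
    """Return (message bytes, transaction ID) for an RFC 3489 Binding Request."""
    txid = txid if txid is not None else os.urandom(16)  # 128-bit transaction ID
    attrs = b""
    if change_ip or change_port:
        # CHANGE-REQUEST is a 32-bit value: 0x04 = "change IP", 0x02 = "change port".
        flags = (0x04 if change_ip else 0) | (0x02 if change_port else 0)
        attrs = struct.pack("!HHI", CHANGE_REQUEST, 4, flags)
    header = struct.pack("!HH", BINDING_REQUEST, len(attrs)) + txid
    return header + attrs, txid


def parse_binding_response(data, txid):
    """Extract IPv4 MAPPED-ADDRESS / CHANGED-ADDRESS attributes as (ip, port)."""
    if len(data) < 20:
        raise ValueError("message shorter than the 20-byte STUN header")
    msg_type, length = struct.unpack("!HH", data[:4])
    if msg_type != BINDING_RESPONSE or data[4:20] != txid:
        raise ValueError("not a Binding Response to our request")
    found = {}
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        a_type, a_len = struct.unpack("!HH", data[offset:offset + 4])
        value = data[offset + 4:offset + 4 + a_len]
        if a_type in (MAPPED_ADDRESS, CHANGED_ADDRESS) and len(value) == 8:
            # Value layout: 1 ignored byte, 1 family byte (0x01 = IPv4), port, address.
            _, family, port = struct.unpack("!BBH", value[:4])
            if family == 0x01:
                found[a_type] = (socket.inet_ntoa(value[4:8]), port)
        offset += 4 + a_len
    return found
```

In use, the request bytes would be sent with `sock.sendto(msg, (server, 3478))` on a UDP socket (3478 is the RFC 3489 default port). The reply would then be passed to `parse_binding_response` along with the same transaction ID. Retransmission and timeouts are deliberately left out.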
This is particularly relevant for peer-to-peer applications, and for push applications that target mobile data services, since wireless data carriers often place users behind NAT firewalls without their knowledge.
My particular STUN implementation is in Python, both because I wanted to give Python a try and because the most obvious application of this algorithm is BitTorrent (written in Python). BitTorrent currently cannot accept incoming peer connections for users behind NAT, and thus excludes a huge proportion of the world's bandwidth.
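The classification step that any RFC 3489 client has to perform follows the discovery flow in section 10.1 of the RFC. Below is a minimal sketch of that decision tree. The names are hypothetical and this is not my implementation. Each test is passed in as a callable that sends one Binding Request and returns the MAPPED-ADDRESS it got back, or `None` on timeout. Passing callables keeps the logic testable without a network and runs only the tests the flow actually needs.

```python
from enum import Enum
from typing import Callable, Optional, Tuple

Addr = Tuple[str, int]
# A probe sends one Binding Request and returns the MAPPED-ADDRESS from the
# response, or None if no response arrived before the timeout.
Probe = Callable[[], Optional[Addr]]


class NatType(Enum):
    UDP_BLOCKED = "UDP blocked"
    OPEN_INTERNET = "Open Internet"
    SYMMETRIC_FIREWALL = "Symmetric UDP firewall"
    FULL_CONE = "Full cone NAT"
    SYMMETRIC_NAT = "Symmetric NAT"
    RESTRICTED_CONE = "Restricted cone NAT"
    PORT_RESTRICTED_CONE = "Port restricted cone NAT"


def classify_nat(local_addr: Addr, test1: Probe, test2: Probe,
                 test1_alt: Probe, test3: Probe) -> NatType:
    """Walk the RFC 3489 discovery flow, running each probe only when needed."""
    # Test I: plain Binding Request to the server's primary address.
    mapped = test1()
    if mapped is None:
        return NatType.UDP_BLOCKED
    if mapped == local_addr:
        # No translation. Test II (change IP and port) separates an open host
        # from one behind a firewall that drops unsolicited UDP.
        return NatType.OPEN_INTERNET if test2() is not None else NatType.SYMMETRIC_FIREWALL
    # Behind a NAT: if a reply from a different IP and port gets through,
    # the mapping accepts traffic from any external host.
    if test2() is not None:
        return NatType.FULL_CONE
    # Test I again, sent to the CHANGED-ADDRESS from the first response.
    mapped_alt = test1_alt()
    if mapped_alt is None:
        # The RFC flow does not cover this case; treat it as an error here.
        raise RuntimeError("no response from the server's alternate address")
    if mapped_alt != mapped:
        return NatType.SYMMETRIC_NAT  # a new mapping per destination
    # Test III: Binding Request with only the "change port" flag set.
    return NatType.RESTRICTED_CONE if test3() is not None else NatType.PORT_RESTRICTED_CONE
```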
This implementation is also a platform for experimenting with my novel technique for traversing NAT with incoming TCP connections, a task that until now has been fairly difficult without port forwarding.
Approximate String Matching Using q-grams in a Relational Database
Matching strings that are approximately the same is useful in a number of applications, and theoretical computer science and information retrieval have given us a number of complex, efficient algorithms for doing so. Unfortunately, almost none of them work with off-the-shelf relational database engines; instead, they require custom data structures such as suffix trees. This makes them useless for developers seeking to add approximate matching to, for example, a web site or a database app.
Fortunately, I ran across a couple of research papers that extend the edit distance algorithm to SQL databases, with excellent scalability characteristics. I've created a prototype SQL Server implementation of the concepts in this article, which works remarkably well with my test data. Once I refine it and get it working reliably with substrings and not just prefixes, I'll type up an article and throw the results on CodeProject (since we know DDJ would just reject it again).
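The general shape of the q-gram approach fits into a few lines of plain SQL. This is a sketch using Python's built-in `sqlite3` in place of SQL Server. It is not my prototype, the table and function names are hypothetical, and it may differ in detail from the papers. It matches whole strings only, not substrings or prefixes.

Each string is padded with q-1 sentinel characters on both ends, so it yields |s| + q - 1 q-grams. A single edit destroys at most q of them. Two strings within edit distance k must therefore share at least max(|s1|, |s2|) - 1 - (k - 1)q q-grams. An equality join plus `GROUP BY ... HAVING` can apply that bound, and an exact edit distance check then removes any false positives.

```python
import sqlite3

Q = 3  # q-gram length; an assumption, tune for the data


def qgrams(s, q=Q):
    """q-grams of s, padded with q-1 sentinels on each side."""
    # Assumes '#' and '$' never appear in the indexed strings.
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def edit_distance(a, b):
    """Classic Levenshtein distance, keeping one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def build_index(conn, names):
    conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE TABLE name_qgrams (id INTEGER NOT NULL, qgram TEXT NOT NULL)")
    conn.execute("CREATE INDEX name_qgrams_qgram ON name_qgrams (qgram)")
    for name in names:
        cur = conn.execute("INSERT INTO names (name) VALUES (?)", (name,))
        conn.executemany("INSERT INTO name_qgrams (id, qgram) VALUES (?, ?)",
                         [(cur.lastrowid, g) for g in qgrams(name)])
    # Expose the exact check to SQL, much like a user-defined function elsewhere.
    conn.create_function("edit_distance", 2, edit_distance)


SEARCH_SQL = """
SELECT name FROM (
    SELECT n.id, n.name
    FROM name_qgrams g
    JOIN query_qgrams qg ON qg.qgram = g.qgram
    JOIN names n ON n.id = g.id
    WHERE ABS(LENGTH(n.name) - :qlen) <= :k              -- length filter
    GROUP BY n.id, n.name
    HAVING COUNT(*) >= MAX(LENGTH(n.name), :qlen) - 1 - (:k - 1) * :q  -- count filter
)
WHERE edit_distance(name, :query) <= :k                  -- exact verification
ORDER BY name
"""


def search(conn, query, k):
    """Return indexed names within edit distance k of query."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS query_qgrams (qgram TEXT)")
    conn.execute("DELETE FROM query_qgrams")
    conn.executemany("INSERT INTO query_qgrams (qgram) VALUES (?)",
                     [(g,) for g in qgrams(query)])
    params = {"query": query, "qlen": len(query), "k": k, "q": Q}
    return [row[0] for row in conn.execute(SEARCH_SQL, params)]
```

Duplicate q-grams can only inflate `COUNT(*)`, so the filter never discards a true match. It just passes a few extra candidates to the exact check. The expensive edit distance computation runs only on rows that survive the cheap, index-friendly join.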
Adapting the new WS-Eventing spec to UDP and TCP, including NAT Traversal
Ever since SOAP was first described by Don Box years ago, I've wished it had some formalized support for subscribing to and publishing events asynchronously to a Web Service call. Finally, that capability is beginning to materialize, in the form of the WS-Eventing proposal by Microsoft et al. Of course, immediately afterwards, Sun et al. proposed a competing spec, but that's another story.
Anyway, all this web services stuff is great, but I know from experience that you can forget about making an inbound TCP connection to clients on most WANs and many broadband links, thanks to the NAT(s) in the way. Thus, I'm interested in marshalling SOAP over UDP (supposedly SOAP-RP and DIME are addressing this, but it seems no one supports them). From there, I would use my NAT traversal work to get WS-Eventing notifications past the NAT(s) and directly to the interested clients.
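The basic idea of marshalling a notification into a single UDP datagram can be sketched as follows. This is a plain SOAP 1.2 envelope built with Python's standard `xml.etree.ElementTree`. It does not implement SOAP-RP, DIME, or the actual WS-Eventing message schema. The `urn:example:events` namespace and the `Notification` and `Sequence` elements are hypothetical. The 1472-byte limit is an assumption: a 1500-byte Ethernet MTU minus the 20-byte IPv4 and 8-byte UDP headers, chosen to avoid fragmentation.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
EVENT_NS = "urn:example:events"  # hypothetical namespace for the event payload
MAX_DATAGRAM = 1472  # assumption: 1500-byte MTU - 20 (IPv4) - 8 (UDP)

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("ev", EVENT_NS)


def build_notification(seq: int, topic: str, message: str) -> bytes:
    """Serialize one event as a SOAP envelope small enough for one datagram."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_ENV}}}Header")
    # UDP may drop or reorder datagrams; a sequence number lets the receiver notice.
    ET.SubElement(header, f"{{{EVENT_NS}}}Sequence").text = str(seq)
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    note = ET.SubElement(body, f"{{{EVENT_NS}}}Notification", {"topic": topic})
    note.text = message
    data = ET.tostring(env, encoding="utf-8")
    if len(data) > MAX_DATAGRAM:
        raise ValueError(f"envelope is {len(data)} bytes; too large for one datagram")
    return data


def parse_notification(data: bytes):
    """Return (sequence number, topic, message) from a received datagram."""
    env = ET.fromstring(data)
    seq = env.find(f"{{{SOAP_ENV}}}Header/{{{EVENT_NS}}}Sequence")
    note = env.find(f"{{{SOAP_ENV}}}Body/{{{EVENT_NS}}}Notification")
    if seq is None or note is None:
        raise ValueError("not an event notification envelope")
    return int(seq.text), note.get("topic"), note.text
```

The resulting bytes would be handed to `socket.sendto` on a UDP socket aimed at whatever public endpoint NAT traversal has discovered for the subscriber. Anything larger than one datagram would need fragmentation or a fallback transport.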
Maybe DDJ would be interested in that, since every other cover is about Web Services, with articles not much more substantive than “Control Your Toaster With SOAP.”
Making better use of Drupal
I use Drupal to run my site and blog. Pages like Published Works and this one should be generated dynamically from posts assigned to particular vocabulary terms. It's simple PHP; I just have to get around to it.
UPDATE: I use a free-tagging vocabulary to implement this now. See the tags page for projects.