apocryph.org Notes to my future self

23 Jun 06

Project Idea: srcML tagger using TextMate Language Grammars

One of my recurring project ideas is a fast, lightweight, simple srcML tagger that can mark up source files in an extensible set of languages using a subset of the srcML tags. Atop this tagger, an intelligent diff tool could provide more meaningful diffs between versions of a code file and visualize the history of individual tokens within a source file, à la svn blame but at a finer granularity than whole lines.

This idea has always been held back by the difficulty of describing different languages in enough detail to do meaningful srcML tagging, but without so much detail that one ends up writing a complete parser for each language. In the past I looked to syntax highlighting tools for inspiration, but they all struck me as ad hoc and not particularly elegant.

No more. Having read about the language grammars in TextMate, I think they may be the simple generalization I’ve been looking for.

It doesn’t seem too difficult to implement a parser for the straightforward language grammar format, and from there not too difficult to mark up a source file according to that grammar. With translations between the language grammar elements and corresponding srcML tags, a tagged srcML document could be produced from the input file relatively easily.
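As a rough sketch of the idea above, and nothing more: the Python below applies a toy grammar of scope-name/regex pairs to a source string and wraps each match in a tag. Everything in it is an assumption made for illustration. The TOY_GRAMMAR rules cover only simple `match` patterns (real TextMate grammars also have begin/end rules, captures, and includes, and use Oniguruma regexes rather than Python's `re`), and the SCOPE_TO_TAG table and its tag names are hypothetical stand-ins, not the actual srcML vocabulary.

```python
import re
from xml.sax.saxutils import escape

# Toy grammar in the spirit of a TextMate language grammar: each rule pairs a
# scope name with a `match` regex. This is an illustrative subset, not a real
# grammar file.
TOY_GRAMMAR = [
    {"name": "comment.line.double-slash", "match": r"//[^\n]*"},
    {"name": "string.quoted.double", "match": r'"(?:\\.|[^"\\\n])*"'},
    {"name": "constant.numeric", "match": r"\b\d+\b"},
    {"name": "keyword.control", "match": r"\b(?:if|else|while|return)\b"},
]

# Hypothetical mapping from scope-name prefixes to output tags. The tag names
# are placeholders; a real tool would use the srcML element set.
SCOPE_TO_TAG = {
    "comment": "comment",
    "string": "literal",
    "constant.numeric": "literal",
    "keyword": "keyword",
}


def tag_for_scope(scope):
    """Return the tag for the longest mapped prefix of a scope name, or None."""
    parts = scope.split(".")
    for end in range(len(parts), 0, -1):
        tag = SCOPE_TO_TAG.get(".".join(parts[:end]))
        if tag:
            return tag
    return None


def tag_source(source, grammar=TOY_GRAMMAR):
    """Wrap every grammar match in `source` in its mapped tag."""
    compiled = [(rule["name"], re.compile(rule["match"])) for rule in grammar]
    out = []
    pos = 0
    while pos < len(source):
        # Pick the rule whose match starts earliest; on a tie, the rule
        # listed first in the grammar wins.
        best = None
        for name, pattern in compiled:
            m = pattern.search(source, pos)
            if m and m.end() > m.start():
                if best is None or m.start() < best[1].start():
                    best = (name, m)
        if best is None:
            # Nothing left to tag: emit the remainder as plain text.
            out.append(escape(source[pos:]))
            break
        name, m = best
        out.append(escape(source[pos:m.start()]))  # untagged text before the match
        tag = tag_for_scope(name)
        text = escape(m.group(0))
        out.append(f"<{tag}>{text}</{tag}>" if tag else text)
        pos = m.end()
    return "".join(out)


if __name__ == "__main__":
    print(tag_source('if (x) return 42; // done'))
```

Picking the earliest match at each position is what keeps the `//` inside a string literal from being tagged as a comment. Nested constructs would need the begin/end rules this sketch leaves out.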

The end result would be a lightweight, simple toolkit that could produce meaningful code history reports à la CodeHistorian.

Someday…
