After my last experiment with Markov chains for passphrase generation, I thought I’d try something different. Rather than go crazy with statistical trickery, why not take a list of words organized by part of speech, and use templates with placeholders for different parts of speech (effectively, simple mad-libs) to generate memorable passphrases?
The POS list from the word lists collection contains a nice selection of English words and the part or parts of speech to which they belong.
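To make the mad-libs idea concrete, here is a minimal sketch in Python. The word lists, the placeholder names (ADJ, NOUN, VERB) and the templates are all made up for illustration; a real run would load words from the POS list instead. It uses `secrets.choice` rather than `random.choice` because passphrases want a cryptographically strong source of randomness.

```python
import secrets

# Hypothetical word lists keyed by part of speech. These are stand-ins;
# a real generator would load them from a POS-tagged word list.
WORDS = {
    "ADJ": ["purple", "sleepy", "brave", "rusty"],
    "NOUN": ["otter", "teapot", "glacier", "banjo"],
    "VERB": ["juggles", "paints", "follows", "hides"],
}

# Hypothetical templates: upper-case tokens are placeholders,
# everything else is copied through unchanged.
TEMPLATES = [
    "the ADJ NOUN VERB the ADJ NOUN",
    "ADJ NOUN VERB NOUN",
]


def passphrase(templates=TEMPLATES, words=WORDS):
    """Pick a template and fill each placeholder with a random word."""
    template = secrets.choice(templates)
    return " ".join(
        secrets.choice(words[token]) if token in words else token
        for token in template.split()
    )


print(passphrase())
```

With lists this small the result is not remotely secure; the strength of a passphrase built this way depends on how many words each placeholder can draw from.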
This past weekend I dusted off my prototype Ruby implementation of Markov chains for generating sentences that bear a striking similarity to a corpus of sample text but are in fact random nonsense. My first exposure to this idea was the implementation in Kernighan and Pike’s The Practice of Programming, but I’ve run across it a number of times since.
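For anyone who hasn't seen the technique, a rough sketch of word-level Markov text generation follows. It is written in Python and is not the Ruby prototype mentioned above; the function names and the default prefix length of two words are my own assumptions.

```python
import random
from collections import defaultdict


def build_model(text, order=2):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        model[prefix].append(words[i + order])
    return model


def generate(model, length=20, rng=random):
    """Start from a random prefix and walk the chain for up to `length` words."""
    prefix = rng.choice(list(model))
    out = list(prefix)
    for _ in range(length - len(prefix)):
        followers = model.get(prefix)
        if not followers:
            break  # dead end: this prefix only occurred at the end of the corpus
        out.append(rng.choice(followers))
        prefix = tuple(out[-len(prefix):])
    return " ".join(out)


corpus = "the cat sat on the mat and the cat ran off the mat"
print(generate(build_model(corpus), length=12))
```

Because repeated followers are stored as repeated list entries, picking uniformly from the list reproduces the frequencies seen in the corpus.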
Most Markov text generation schemes I’ve run across are just for fun, like mixing the text characteristics of the Bible and Dr. Seuss or whatever.
Previously I’ve noted how neat I think it would be to use Markov text generation to generate random text from one corpus while adjusting some of the Markov model parameters based on another corpus, in an attempt to yield, for example, a Seussian user manual or Edgar Allan Poe in Biblical English.
I’ve found a Perl toolkit, SVMTool, which provides the POS tagger necessary to mark up English words with their part of speech. It should be easy enough to fetch an arbitrary RSS feed, break it down into paragraphs, sentences, and words, build a Markov model from that, and somehow build a hybrid Markov model from the input feed and some exemplar text. From this hybrid Markov model, amusing texts could be generated that ‘feel’ like a cross between the source feed and the exemplar corpus.
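One possible reading of "hybrid" is a weighted blend of the two models' transition probabilities. The sketch below, in Python, assumes that reading; it skips the RSS fetching and the POS tagging entirely, and the function names and the `weight` parameter are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_counts(text, order=1):
    """Count how often each word follows each run of `order` words."""
    words = text.split()
    counts = defaultdict(Counter)
    for i in range(len(words) - order):
        counts[tuple(words[i:i + order])][words[i + order]] += 1
    return counts


def blend(feed, exemplar, weight=0.5):
    """Mix two models; `weight` is the exemplar's share (keep 0 < weight < 1)."""
    hybrid = {}
    for prefix in set(feed) | set(exemplar):
        mixed = defaultdict(float)
        for counts, share in ((feed, 1.0 - weight), (exemplar, weight)):
            followers = counts.get(prefix, {})
            total = sum(followers.values())
            for word, n in followers.items():
                # Normalise each model to probabilities before mixing, so a
                # large corpus does not drown out a small one.
                mixed[word] += share * n / total
        hybrid[prefix] = dict(mixed)
    return hybrid


def next_word(hybrid, prefix, rng=random):
    """Sample a follower of `prefix` in proportion to its blended weight."""
    followers = hybrid[prefix]
    return rng.choices(list(followers), weights=list(followers.values()))[0]


feed = build_counts("the server sat in the rack")
exemplar = build_counts("the cat sat in the hat")
hybrid = blend(feed, exemplar, weight=0.25)
print(next_word(hybrid, ("the",)))
```

A prefix that appears in only one corpus keeps that corpus's followers, so the blend can wander from one source into the other wherever the two share a word.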