Generating memorable passphrases from mad-libs
After my last experiment with Markov chains for passphrase generation, I thought I’d try something different. Rather than go crazy with statistical trickery, why not take a list of words organized by part of speech and use templates with placeholders for different parts of speech (effectively, simple mad-libs) to generate memorable passphrases?
The POS list from the word lists collection contains a nice selection of English words and the part or parts of speech to which they belong. The README doesn’t have a breakdown of the word counts per POS, so I used a quick Ruby script to produce the following:
Noun has 128674 words
Noun Phrase has 71806 words
Adjective has 57900 words
Verb (usu participle) has 18443 words
Verb (transitive) has 16265 words
Adverb has 14782 words
p has 8554 words
Verb (intransitive) has 5141 words
Interjection has 430 words
Preposition has 162 words
Pronoun has 117 words
Definite Article has 103 words
Conjunction has 93 words
e has 1 words
The tags p and e are not listed in the README as parts of speech, so I assume they’re some sort of secondary tag.
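The Ruby script behind those counts isn’t shown, so here is a minimal sketch of that kind of tally rather than the original. It assumes each line of the POS list looks like word×tags, with a × separator followed by one letter per part of speech; both the separator and the partial letter-to-name table are assumptions to check against the list’s README. Unknown letters are printed as-is, which is how tags like p and e would show up.

```ruby
# Partial, hypothetical map from single-letter tags to the names in the README;
# unknown tags (such as p and e) are reported under their raw letter.
POS_NAMES = {
  "N" => "Noun", "h" => "Noun Phrase", "A" => "Adjective",
  "V" => "Verb (usu participle)", "t" => "Verb (transitive)",
  "i" => "Verb (intransitive)", "v" => "Adverb", "!" => "Interjection",
  "P" => "Preposition", "r" => "Pronoun", "D" => "Definite Article",
  "C" => "Conjunction"
}.freeze

# Assumed separator between a word and its tag letters.
SEPARATOR = "\u00D7"

# Build { tag => [words] } from lines shaped like "word<SEPARATOR>tags".
# A word with several tags is counted once under each of them; lines
# without a separator are skipped.
def words_by_pos(lines)
  index = Hash.new { |hash, tag| hash[tag] = [] }
  lines.each do |line|
    word, tags = line.chomp.split(SEPARATOR, 2)
    next if tags.nil? || tags.empty?

    tags.each_char.uniq.each { |tag| index[tag] << word }
  end
  index
end

# One "X has N words" line per tag, largest category first.
def report(index)
  index.sort_by { |_tag, words| -words.size }.map do |tag, words|
    "#{POS_NAMES.fetch(tag, tag)} has #{words.size} words"
  end
end

sample = ["apple\u00D7N", "quickly\u00D7v", "run\u00D7Vti"]
puts report(words_by_pos(sample))
```

Pointing it at the real list would mean swapping sample for something like File.foreach("pos_list.txt", encoding: "ISO-8859-1:UTF-8"); the filename is a placeholder, and reading the file as Latin-1 is a guess about how its separator byte is encoded.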
Anyway, that’s a goodly portion of nouns and verbs to work with, enough to populate a simple template like:
<definite article> <adjective> <noun> <verb> <adverb>.
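Filling in the template is plain string substitution. The sketch below is one way to do it in Ruby, not the post’s implementation: the word pools are hypothetical stand-ins built from a few words in the sample output further down, and words are drawn with SecureRandom so that every word in a pool is equally likely, which is the assumption the entropy figures rely on.

```ruby
require "securerandom"

# Hypothetical, tiny word pools keyed by the placeholder names in the
# template; a real generator would load every word tagged with each POS.
POOLS = {
  "definite article" => %w[those another two],
  "adjective"        => %w[humourless vaulting unfirm],
  "noun"             => %w[ugliness pediculus aquamanile],
  "verb"             => %w[outsmart straddled funneled],
  "adverb"           => %w[sweepingly treacherously moveably]
}.freeze

TEMPLATE = "<definite article> <adjective> <noun> <verb> <adverb>."

# Replace each <pos> placeholder with a word chosen uniformly at random from
# the matching pool. An unknown placeholder raises KeyError instead of being
# silently left in the output.
def fill_template(template, pools)
  template.gsub(/<([^>]+)>/) do
    pool = pools.fetch(Regexp.last_match(1))
    pool[SecureRandom.random_number(pool.size)]
  end
end

3.times { puts fill_template(TEMPLATE, POOLS) }
```

SecureRandom is used instead of Kernel#rand because rand is backed by a Mersenne Twister, which is fine for simulations but not meant for generating secrets.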
Using Excel and Shannon’s definition of entropy, H(X) = -Σ p(x) log2 p(x), I computed the entropy contributed by the noun, verb, adverb, and adjective slots of such a template applied to the above word list, assuming each word in a given POS has an equal probability of selection:
POS                    Words |X|   p(x) = 1/|X|   p(x) log2 p(x)   Σ p(x) log2 p(x)   Entropy (bits)
Noun                   128674      7.77158E-06    -0.00013191      -16.97336105       16.97336105
Verb (usu participle)  18443       5.42211E-05    -0.000768356     -14.17078573       14.17078573
Adverb                 14782       6.76498E-05    -0.000937055     -13.85155386       13.85155386
Adjective              57900       1.72712E-05    -0.000273252     -15.82127573       15.82127573
Total                                                                                 60.81697637
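Because every word in a slot is assumed equally likely, the n identical p(x) log2 p(x) terms for a slot sum to -log2 n, so each slot’s entropy is just log2 of its word count and the spreadsheet can be replaced by a few lines. This sketch uses the counts from the tally above (14782 adverbs) and also includes the definite-article slot that the table leaves out; log2(103) is about 6.7 bits, which brings the full five-word template to roughly 67.5 bits.

```ruby
# Words per template slot, from the tally above. Every word in a slot is
# assumed equally likely to be chosen.
SLOT_SIZES = {
  "definite article" => 103,
  "adjective"        => 57_900,
  "noun"             => 128_674,
  "verb"             => 18_443,
  "adverb"           => 14_782
}.freeze

# Shannon entropy the long way: H = -sum of p(x) * log2 p(x) over all n words.
# With p(x) = 1/n for every x, the n identical terms add up to log2(n).
def entropy_long(n)
  prob = 1.0 / n
  -(n * prob * Math.log2(prob))
end

# The shortcut for a uniform choice among n words.
def entropy(n)
  Math.log2(n)
end

SLOT_SIZES.each do |slot, n|
  puts format("%-16s %7d words  %8.4f bits", slot, n, entropy(n))
end

open_class = SLOT_SIZES.reject { |slot, _| slot == "definite article" }
puts format("Without the article slot: %.2f bits", open_class.values.sum { |n| entropy(n) })
puts format("All five slots:           %.2f bits", SLOT_SIZES.values.sum { |n| entropy(n) })
```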
So, using the POS word list and the above template, I can generate 60-bit passphrases. That’s a far cry from 128-bit AES keys, but it’s still not bad for a solution that is easy to implement and easy to understand. You could also boost the entropy by randomly applying transformations such as upper-casing random letters or replacing letters with digits (L becomes 1, O becomes 0, etc.).
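The letter substitutions mentioned above could be sketched like this. The l→1 and o→0 table is a hypothetical minimal reading of the "etc.", and instead of independent coin flips for case and digits, each letter picks uniformly among its possible forms, so the extra entropy is the sum of log2(number of forms) per letter. Unlike the template’s entropy, that figure varies with the length of the generated phrase, and it is an upper bound that assumes the attacker knows the scheme.

```ruby
require "securerandom"

# Hypothetical digit substitutions; the post names L -> 1 and O -> 0 and
# leaves the rest as "etc."
DIGIT_FOR = { "l" => "1", "o" => "0" }.freeze

# All forms a character may take: lower case, upper case, and a digit if
# one is defined. Non-letters (spaces, hyphens, the final period) stay put.
def variants(char)
  return [char] unless char.match?(/[A-Za-z]/)

  lower = char.downcase
  forms = [lower, char.upcase]
  forms << DIGIT_FOR[lower] if DIGIT_FOR.key?(lower)
  forms
end

# Choose one form per character uniformly at random. Returns the new phrase
# and the most entropy those choices can add (sum of log2 of the number of
# forms), assuming an attacker knows both the word list and this scheme.
def mangle(phrase)
  bits = 0.0
  chars = phrase.each_char.map do |c|
    forms = variants(c)
    bits += Math.log2(forms.size)
    forms[SecureRandom.random_number(forms.size)]
  end
  [chars.join, bits]
end

mangled, extra_bits = mangle("two humourless half-jack outsmart sweepingly.")
puts mangled
puts format("adds up to %.1f bits", extra_bits)
```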
So, what sort of passphrases would this produce? Here’s a sample:
fifty interplacental ugliness sleaving lunatically.
such uretic apertometer straddled unimpulsively.
eighteen overground inflationism funneled treacherously.
twelve quasi-military aquamanile rummage curvilinearly.
nineteen annunciative Malanje light presumptively.
hundred amaranthaceous hindi reinspired nonbodingly.
those unfirm lah do work moveably.
another chockful miracle-worship stencilling certes.
two humourless half-jack outsmart sweepingly.
eighty vaulting pediculus counterclaim nonpredicatively.
Clearly these would make more sense if the POS tags distinguished things like plural vs. singular nouns, but doing so would shrink the pool of words for each template position and thus lower the entropy. A more grammatically correct template could perhaps be made longer while remaining equally memorable, but I’m not aware of any experimental evidence on how much grammatical correctness improves recall, and I suspect the improvement wouldn’t be big enough to offset the need for more words in the template.
Tags: mad libs, markov, Migrated from Drupal, passphrases, tech diary