After my flash of insight, I’ve decided to build a tool to help me migrate apocryph.org away from Drupal.
Requirements are:
- Work with my Drupal configuration
- Not so tightly coupled to my Drupal configuration that no one else can use it
- Output posts in a neutral format that users can post-process and import into other tools
- Preserve all the important elements of each post, including:
  - Formatting. Most posts are in Markdown, and a few are in SmartyPants. Those must be converted to XHTML using the same rules that generate the markup in Drupal
  - Files. A few of my posts have files attached, usually images but sometimes other stuff. Those files must be preserved themselves, and any references to the files from within a post (like IMG or A elements) must be preserved as well
  - Metadata. The tags, author, timestamp, and published/unpublished flags must be preserved
  - Links. I often link between posts, and the migrated output must preserve those links. They can’t go away or turn into 404s
  - URLs (ideal but not required). Notwithstanding the above requirement to preserve intrasite links, it would be vastly preferable if the actual URL of each migrated post could be preserved, so that any existing links to pages on Apocryph keep working.
  - Comments. Existing comments attached to posts must be preserved in their entirety.
Right now I’m looking at the WordPress eXtended RSS (WXR) format. It builds on the standard RSS format with additional WP-specific tags, which I think will accommodate my requirements. As an added bonus, WordPress has built-in import support for WXR files, so I can easily suck the resulting file into WordPress, which I’ve selected as the successor to Drupal.
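
To make the idea concrete, here is a rough sketch of how a single migrated post might be serialized as a WXR item. It is only an illustration under several assumptions: the `Post` and `Comment` classes, their field names, and the `post_to_item` helper are hypothetical and not part of any existing tool; the version segment in the `wp` namespace URI may differ depending on the WordPress release; and mapping an unpublished Drupal post to the `draft` status is my guess at a sensible equivalent.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime

# Namespace URIs seen in WordPress export files. The "1.2" version segment
# is an assumption and may need adjusting for the target WordPress release.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.2/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

WP_DATE = "%Y-%m-%d %H:%M:%S"


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{NS[prefix]}}}{tag}"


@dataclass
class Comment:  # hypothetical neutral representation of a comment
    author: str
    date: datetime
    body: str


@dataclass
class Post:  # hypothetical neutral representation of a Drupal post
    post_id: int
    title: str
    url: str
    slug: str
    author: str
    created: datetime
    published: bool
    body_xhtml: str  # already rendered from Markdown/SmartyPants
    tags: list = field(default_factory=list)
    comments: list = field(default_factory=list)


def post_to_item(post):
    """Serialize one post, with its tags and comments, as an RSS <item>."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = post.title
    ET.SubElement(item, "link").text = post.url
    ET.SubElement(item, q("dc", "creator")).text = post.author
    ET.SubElement(item, q("content", "encoded")).text = post.body_xhtml
    ET.SubElement(item, q("wp", "post_id")).text = str(post.post_id)
    ET.SubElement(item, q("wp", "post_date")).text = post.created.strftime(WP_DATE)
    ET.SubElement(item, q("wp", "post_name")).text = post.slug
    # Assumption: unpublished Drupal posts become WordPress drafts.
    status = "publish" if post.published else "draft"
    ET.SubElement(item, q("wp", "status")).text = status
    ET.SubElement(item, q("wp", "post_type")).text = "post"
    for tag in post.tags:
        nicename = tag.lower().replace(" ", "-")
        cat = ET.SubElement(item, "category", domain="post_tag", nicename=nicename)
        cat.text = tag
    for number, comment in enumerate(post.comments, start=1):
        node = ET.SubElement(item, q("wp", "comment"))
        ET.SubElement(node, q("wp", "comment_id")).text = str(number)
        ET.SubElement(node, q("wp", "comment_author")).text = comment.author
        ET.SubElement(node, q("wp", "comment_date")).text = comment.date.strftime(WP_DATE)
        ET.SubElement(node, q("wp", "comment_content")).text = comment.body
        ET.SubElement(node, q("wp", "comment_approved")).text = "1"
    return item


if __name__ == "__main__":
    example = Post(
        post_id=1,
        title="Hello",
        url="http://example.com/node/1",
        slug="hello",
        author="admin",
        created=datetime(2008, 1, 1, 12, 0, 0),
        published=True,
        body_xhtml="<p>Hello</p>",
        tags=["Drupal"],
    )
    print(ET.tostring(post_to_item(example), encoding="unicode"))
```

A real export would wrap these items in the usual `rss` and `channel` elements and would also need entries for attached files, which this sketch leaves out. Carrying the original path in `wp:post_name` is one way the URL requirement might be approached, though whether WordPress can reproduce the old URLs exactly depends on its permalink settings.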