Finally moved from DreamHost & Drupal to FutureHosting & WordPress
If you’re reading this, I’ve moved apocryph.org over to FutureHosting, and I have finally migrated away from Drupal and into WordPress. Right now I’m running the stock skin and I haven’t done any configuration apart from migrating in my Drupal content and setting up the Permalink Redirect plugin so the old Drupal-style links continue to work. As time permits I’ll be porting my custom Drupal theme into WordPress and generally adapting the look and feel to my tastes. Here’s hoping I don’t regret this…
Note that subdomains like svn.apocryph.org and wiki.apocryph.org haven’t been migrated over yet, so you’ll have to bear with me for a little while.
PHP Sucks
My work on a tool to migrate Drupal content to WordPress’ eXtended RSS (WXR) led me into some dusty corners of the WordPress codebase, and I’ve been meaning to write a grumpy post about how much I hate PHP (in which WordPress is written), but Jeff Atwood at Coding Horror beat me to it with his own PHP sucks lament. Like me, Jeff wonders at the success of PHP given what a dreadfully sucky software engineering tool it is, and scratches his head at the many major Internet properties (Wikipedia, Digg, and WordPress among them) which are successful notwithstanding an implementation in a language a VB6 programmer might reasonably call “shit”.
Interestingly, though, Jeff and I arrived at two different conclusions on the matter. Jeff surmises:
Some of the largest sites on the internet — sites you probably interact with on a daily basis — are written in PHP. If PHP sucks so profoundly, why is it powering so much of the internet?
The only conclusion I can draw is that building a compelling application is far more important than choice of language. While PHP wouldn’t be my choice, and if pressed, I might argue that it should never be the choice for any rational human being sitting in front of a computer, I can’t argue with the results.
You’ve probably heard that sufficiently incompetent coders can write FORTRAN in any language. It’s true. But the converse is also true: sufficiently talented coders can write great applications in terrible languages, too. It’s a painful lesson, but an important one.
Hmm. I came to a different conclusion. Though I don’t know any of the programmers at the major PHP-powered properties, I have looked at some of their code, and I must say it does not look like the product of talented software engineers making do with a shit language; it looks like cruft bodgered together until it works, as though the last twenty years of hard-won software engineering advancements never happened.
To take but one example, I am working on some code that exports a Drupal site in WordPress’ WXR format, which is basically RSS with some additional WordPress-specific elements thrown in. Of course the format isn’t documented at all, so I set up a simple WordPress site, wrote some test posts with attachments and comments and whatnot, exported the site to WXR, and used the resulting WXR file as an example upon which to base my own implementation. I’ve mostly got it to work now, but I ran into some troubles along the way due entirely to the poor construction of at least the WXR import portion of the WordPress codebase.
To start with, WXR, like RSS before it, is an XML format. XML, having been around for over a decade now, is a very well-understood format. One of the fundamental principles of XML is that whitespace doesn’t matter. You can put text right after a start tag, or on the next line indented with a tab, or after several blank lines, and it will parse just the same (some exceptions but bear with me). I foolishly assumed that since RSS uses XML, and WXR extends RSS, that I needed only produce XML that was structurally identical to the WXR sample I had, without regard for whitespace. When I imported the first WXR file from my tool, WordPress didn’t complain but I noticed none of my content was coming through. WTF?
I then cracked open the import code at wp-admin/import/wordpress.php. Once there I stared in horror as I realized what they were doing. Rather than parsing the WXR file with, you know, an XML parser, the author of this particularly craptastic bit of code used a regular expression to find certain tag pairs, hard-coding the namespace prefix and assuming no whitespace between the start tag, content, and end tag. Super!
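To make the failure mode concrete, here is a minimal Python sketch. It is not the WordPress importer's actual code: the regular expression is a hypothetical stand-in written in the same spirit, the `wp:post_type` element is used only as an example, and the namespace URI is a placeholder. It contrasts a whitespace-sensitive pattern with a real XML parser on two structurally identical documents:

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed for illustration only.
WP_NS = "http://example.org/wp-export"

COMPACT = f'<item xmlns:wp="{WP_NS}"><wp:post_type>post</wp:post_type></item>'
INDENTED = f'''<item xmlns:wp="{WP_NS}">
  <wp:post_type>
    post
  </wp:post_type>
</item>'''

# Hypothetical pattern: hard-coded namespace prefix, and no tolerance for
# whitespace or line breaks between the tags and their content.
NAIVE = re.compile(r"<wp:post_type>(\w+)</wp:post_type>")


def post_type_by_regex(doc):
    """Return the post type if the naive pattern matches, else None."""
    match = NAIVE.search(doc)
    return match.group(1) if match else None


def post_type_by_parser(doc):
    """Return the post type using a namespace-aware XML parser."""
    node = ET.fromstring(doc).find(f"{{{WP_NS}}}post_type")
    if node is None or node.text is None:
        return None
    return node.text.strip()
```

The regex finds the value only in `COMPACT`; on `INDENTED` it silently returns nothing, which matches the "no complaint, no content" behavior described above. The parser returns the same value for both.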
Not content merely with a gobsmackingly vile solution to a problem solved 10 years ago by the XML DOM, the code does neato things like injecting local variables into the local namespace non-obviously with calls to helper functions, and performing what amounts to zero error handling. For example, there was a bug in my stuff that migrated blog posts with a type attribute of blog, when it should have been post. Strangely, the import into WordPress ran without error, but none of the migrated blog posts showed up. Curious, I looked inside the WordPress MySQL database, and found all of my missing posts there; they just weren’t showing up. Rather than sanity-check the WXR input, the import logic just sucked it dutifully in, so future queries for type='post' would happily skip over my content without so much as a peep.
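The kind of sanity check that would have caught my `blog` versus `post` mistake is tiny. A sketch in Python, assuming a hypothetical set of accepted types (the real list WordPress recognizes may well differ):

```python
# Assumed set of acceptable post types, for illustration only.
KNOWN_POST_TYPES = {"post", "page", "attachment"}


def check_post_type(post_type):
    """Return the post type unchanged, or raise if it is not recognized."""
    if post_type not in KNOWN_POST_TYPES:
        raise ValueError(f"unrecognized post type: {post_type!r}")
    return post_type
```

Failing loudly at import time beats discovering orphaned rows in the database afterwards.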
While I do have some sympathy for those unfortunate bastards who must work with a shit tool like PHP, I know for a fact that it’s not impossible to engineer quality code with it. If you doubt me, download the source tarball for Gallery 2 and find out how the MVC pattern, unit testing, and object orientation are indeed possible in PHP.
So what to make of this? I don’t think software engineering quality has anything to do with the success of a piece of software. Pristine, gleaming, perfectly coherent code doesn’t seem at all correlated with success, and clumsy bodgered-together cruft seems the rule and not the exception at the top of the software pile. Can you find well-engineered software that was successful? Sure. Can you point out spectacularly monstrous code that failed? Even easier. But explain to me how crufty software like Mediawiki and WordPress can be such runaway successes while Gallery and Drupal remain relatively obscure, if software engineering quality is so critical to success?
Now don’t get me wrong. I’m not one of those backward types who doesn’t see the point in unit testing, thinks object orientation is too complicated, and wouldn’t know a bad code smell from roadkill. I take the practice of software engineering seriously, and strive to build the best software I can within the constraints of our business. In this regard I’m more like the second stonecutter in Peter Drucker’s Three Stonecutters parable:
A favorite story at management meetings is that of the three stonecutters who were asked what they were doing. The first replied, ‘I am making a living.’ The second kept on hammering while he said, ‘I am doing the best job of stonecutting in the entire country.’ The third one looked up with a visionary gleam in his eyes and said, ‘I am building a cathedral.’
The third man is, of course, the true ‘manager.’ The first man knows what he wants to get out of the work and manages to do so. He is likely to give a “fair day’s work for a fair day’s pay.”
It is the second man who is a problem. Workmanship is essential; without it no business can flourish; in fact, an organization becomes demoralized if it does not demand of its members the most scrupulous workmanship they are capable of. But there is always a danger that the true workman, the true professional, will believe that he is accomplishing something when in effect he is just polishing stones or collecting footnotes. Workmanship must be encouraged in the business enterprise. But it must always be related to the needs of the whole.
… The tendency to make the craft or function an end in itself [in future] will therefore be even more marked than it is today. … The new technology will need both the drive for excellence in workmanship and the consistent direction of managers at all levels toward the common goal.
I don’t accept that disciplined software engineering is equivalent to “stone polishing”, but I think the point is that past a certain qualitative point (which is different for each activity and each situation) improvements in workmanship don’t contribute meaningfully to value in the marketplace, and if undertaken at the expense of other activities which could have increased value can actually result in a net loss of productivity.
From the success of WordPress and Mediawiki and Youtube, I think it’s hard to make the case that more time and resources should’ve been spent architecting and writing better PHP code, but I don’t know how one can determine a priori where the productive labors stop and the stone polishing starts. If PHP has one thing going for it, it’s that it makes anything but the most cursory stone polishing so clumsy and uncomfortable that only the most pedantic of programmers would bother. Make of that what you will.
Requirements for Drupal migration tool
After my flash of insight I’ve decided to build a tool to help me migrate apocryph.org away from Drupal.
Requirements are:
- Work with my Drupal configuration
- Not so tightly coupled to my Drupal configuration that no one else can use it
- Output posts in a neutral format that users can post-process and import into other tools
- Preserve all the important elements of each post, including:
- Formatting. Most posts are in Markdown, and a few are in SmartyPants. Those must be converted to XHTML using the same rules which generate the markup in Drupal
- Files. A few of my posts have files attached, usually images but sometimes other stuff. Those files must be preserved themselves, and any references to the files from within a post (like IMG or A elements) must be preserved as well
- Metadata. The tags, author, timestamp, published/unpublished flags must be preserved
- Links. I often link between posts, and the migrated output must preserve those links. They can’t go away or be 404
- URLs (ideal but not required). Notwithstanding the above requirement to preserve intrasite links, it would be vastly preferable if the actual URL of each migrated posting could be preserved so any existing links to pages on Apocryph continue to work.
- Comments. Existing comments attached to posts must be preserved in their entirety.
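The requirements above suggest a shape for the neutral format. As a rough sketch only (every class and field name here is hypothetical, and JSON is just one possible neutral serialization), each post could be captured in a record like this before being converted to whatever the target tool wants:

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Comment:
    author: str
    timestamp: int      # Unix time
    body_xhtml: str


@dataclass
class Post:
    url_path: str       # original URL path, kept so old links can be redirected
    title: str
    author: str
    timestamp: int      # Unix time
    published: bool
    body_xhtml: str     # Markdown/SmartyPants already rendered to XHTML
    tags: list = field(default_factory=list)
    attachments: list = field(default_factory=list)   # file paths
    comments: list = field(default_factory=list)      # Comment records


def to_json(post):
    """Serialize a post, including nested comments, to a JSON string."""
    return json.dumps(asdict(post), indent=2)
```

Keeping the original URL path on each record is what makes the "preserve URLs" requirement tractable later, since a redirect table can be generated from it.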
Right now I’m looking at the WordPress eXtended RSS (WXR) format. It builds on the standard RSS format with additional WP-specific tags, which I think will accommodate my requirements. As an added bonus, WordPress has built-in import support for WXR files, so I can easily suck the resulting file into WordPress, which I’ve selected as the successor to Drupal.
Adding a virtual host to Apache and installing WordPress on FreeBSD
Recently I had cause to set up a WordPress blog engine on a virtual host on bonzo. My experience follows:
First I need to set up DNS for the domain. The owner of the domain used the registrar’s control panel to set the authoritative nameservers to ns1.afraid.org through ns4.afraid.org, which are the nameservers provided by FreeDNS.
Next, I log into my FreeDNS account, add the new domain to my domains, and point the domain and www. to bonzo’s IP address. I don’t have mail set up yet, so I’ll ignore the MX record for now.
Now, querying the domain in a web browser should bring me to my site on bonzo…sure enough, it does.
The next step is to set up virtual hosting on bonzo, which I’ve not yet had a need to do. The Apache docs on virtual hosting refer to virtual hosting by host name as ‘name-based virtual hosting’.
It’s pretty straightforward; you define what IP addresses and host names you want to associate with what server roots. There is one huge gotcha:
(From the name-based virtual hosting docs:)
Now when a request arrives, the server will first check if it is using an IP address that matches the NameVirtualHost. If it is, then it will look at each <VirtualHost> section with a matching IP address and try to find one where the ServerName or ServerAlias matches the requested hostname. If it finds one, then it uses the configuration for that server. If no matching virtual host is found, then the first listed virtual host that matches the IP address will be used.
As a consequence, the first listed virtual host is the default virtual host. The DocumentRoot from the main server will never be used when an IP address matches the NameVirtualHost directive. If you would like to have a special configuration for requests that do not match any particular virtual host, simply put that configuration in a <VirtualHost> container and list it first in the configuration file.
This is definitely not what I would expect. It means I have to make sure that the current server root is present as the first <VirtualHost> element, else my existing site will break.
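The selection rule quoted above is easier to see as code. This is a deliberately simplified Python model, not Apache's implementation: it assumes every virtual host shares a single address and port, represents each host as a plain dict with hypothetical `names` and `docroot` keys, and uses shell-style wildcard matching for aliases.

```python
from fnmatch import fnmatch


def select_vhost(vhosts, host):
    """Pick the virtual host for a requested hostname.

    vhosts is an ordered list of dicts; 'names' holds the ServerName and
    ServerAlias patterns (it may be missing for the default host).
    """
    host = host.lower()
    for vhost in vhosts:
        patterns = vhost.get("names", [])
        if any(fnmatch(host, pattern.lower()) for pattern in patterns):
            return vhost
    # No name matched: the first listed virtual host acts as the default.
    return vhosts[0]
```

With the default host listed first and a second host named `craeton.com` with alias `*.craeton.com`, a request for `www.craeton.com` selects the second entry, while a request for any other name falls through to the first.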
So, first I’ll create the <VirtualHost> entry in /usr/local/etc/apache22/httpd.conf for my existing doc root and make sure that works:
# Apply virtual hosts to all requests
NameVirtualHost *:80
# The 'default' virtual host, used when the host doesn't match one of the others
<VirtualHost *:80>
DocumentRoot /usr/local/www/drupal
</VirtualHost>
Next, an apachectl graceful to restart with the new config, and all is well. Requests to apocryph.org and bonzo.celatrix.com are still handled by my Drupal install as before.
Now I can create a separate doc root for the new domain, and point queries to it there. I’ll use /usr/local/www/craeton.com/:
bonzo# mkdir /usr/local/www/craeton.com
bonzo# chown www /usr/local/www/craeton.com/
bonzo# chgrp www /usr/local/www/craeton.com/
And another <VirtualHost> in httpd.conf accordingly:
<VirtualHost *:80>
ServerName craeton.com
ServerAlias *.craeton.com
DocumentRoot /usr/local/www/craeton.com
</VirtualHost>
Then a <Directory> entry to define the behavior of the /usr/local/www/craeton.com directory when exposed by Apache:
# DocRoot for the craeton.com virtual host
<Directory "/usr/local/www/craeton.com">
Options Indexes FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all
</Directory>
I basically just copied the settings from the <Directory> entry for my default doc root.
So now, requests to craeton.com should resolve to this new folder. I’ll drop a simple index.html file and see what happens…it works.
Next I’ll need to create a MySQL database, user, and password for WordPress to use. First, the database:
$ mysqladmin -u root create wp_craeton -p
Enter password:
$
Now the user and password:
mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 16893 to server version: 5.0.9-beta
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> grant all privileges on wp_craeton.* to 'wp_craeton'@'localhost' identified by '[secret]';
Query OK, 0 rows affected (0.06 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.03 sec)
Too easy.
Now I’m ready to install WordPress into this folder. There’s a port in /usr/ports/www/wordpress, but PHP apps like WordPress are so easy to get running I’d rather grab the latest bits from wordpress.org so I have control over the install.
I’ll download the latest directly on bonzo via my trusty SSH session.
$ wget http://wordpress.org/latest.tar.gz
$ tar xzf latest.tar.gz
$ ls
index.html latest.tar.gz wordpress
So, it extracted into a ‘wordpress’ folder. I don’t want that; I want the WordPress stuff immediately under the craeton.com/ folder. Easy enough:
mv wordpress/* .
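One caveat with that command: in most shells a bare `*` does not match names beginning with a dot by default, so any hidden files in the extracted folder would be left behind. Whether this tarball ships any is not something the steps here establish. As a hedged alternative, a small Python sketch (the function name is made up for illustration) that moves every entry up a level, dotfiles included, and refuses to overwrite anything:

```python
import shutil
from pathlib import Path


def flatten_into_parent(subdir):
    """Move every entry of subdir (dotfiles included) up one level,
    then remove the now-empty subdir."""
    subdir = Path(subdir)
    parent = subdir.parent
    for entry in list(subdir.iterdir()):
        target = parent / entry.name
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        shutil.move(str(entry), str(target))
    subdir.rmdir()
```

Run from the doc root, `flatten_into_parent("wordpress")` would have the same effect as the `mv` above, plus the hidden files and the cleanup of the empty folder.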
Looking at the readme.html included in the tarball, there is a ‘Famous Five-minute Install’:
- Unzip the package in an empty directory
- Open up wp-config-sample.php with a text editor like WordPad or similar and fill in your database connection details
- Save the file as wp-config.php
- Upload everything.
- Open /wp-admin/install.php in your browser. This should setup the tables needed for your blog. If there is an error, double check your wp-config.php file, and try again. If it fails again, please go to the support forums with as much data as you can gather.
- Note the password given to you.
- The install script should then send you to the login page. Sign in with the username admin and the password generated during the installation. You can then click on ‘Profile’ to change the password.
Ok, item 1 is done. I’ll copy wp-config-sample.php to wp-config.php and put the DB connection info in it.
It was pretty easy; I just edited the DB info as it said:
<?php
// ** MySQL settings ** //
define('DB_NAME', 'wp_craeton'); // The name of the database
define('DB_USER', 'wp_craeton'); // Your MySQL username
define('DB_PASSWORD', '[secret]'); // ...and password
define('DB_HOST', 'localhost'); // 99% chance you won't need to change this value
// You can have multiple installations in one database if you give each a unique prefix
$table_prefix = 'wp_'; // Only numbers, letters, and underscores please!
// Change this to localize WordPress. A corresponding MO file for the
// chosen language must be installed to wp-includes/languages.
// For example, install de.mo to wp-includes/languages and set WPLANG to 'de'
// to enable German language support.
define ('WPLANG', '');
/* That's all, stop editing! Happy blogging. */
define('ABSPATH', dirname(__FILE__).'/');
require_once(ABSPATH.'wp-settings.php');
?>
Step 3 is equivalent to the copy to wp-config.php and thus doesn’t apply.
Step 4 is not applicable either, since I’m doing all this directly on bonzo via SSH.
In step 5, I navigate to craeton.com/wp-admin/install.php. I get a splash screen and a ‘First Step’ link, which I click.
The first step in this install process prompts for a blog title and an email address. I’ll provide it and move on.
In the next screen it pauses to create the database tables, then generates a temporary password for the admin user. I’m admonished to not forget it, and given a link to the login page. I’ll go there now.
Logging in w/ the admin account and random password, I get a clean, simple admin GUI. I’ll change the password to something memorable, and declare victory. Too easy!