As I said earlier, I recently converted my old WordPress-based web sites to static HTML. It was pretty simple but, of course, there was nuance.
I downloaded the sites using a macOS program called SiteSucker. I’m guessing you could do the same thing with curl or wget, but I was familiar with SiteSucker, I knew it would work without a lot of trial-and-error fiddling, and I already had it. I’m impressed with SiteSucker’s elegant simplicity, and it was worth firing up an old macOS machine to not have to fiddle with parameters for a Linux utility or find a FOSS equivalent to SiteSucker.
I got the files without trouble, and when I put them on an Apache instance, everything just worked. Amazing, really, and I suppose that’s all I actually needed to do. In less than an hour, I had all three sites I wanted to staticize up and running on the web.
Some of the graphics are missing and some typefaces have changed, but I’m not too concerned about that. These sites are really only of historical interest and don’t need to be letter-perfect.
As I was testing, though, I found something a little less ignorable. I don’t know whether this is something specific to WordPress or to SiteSucker or the combination, and it’s likely that there is a setting in SiteSucker that would fix it, but I found that deep links into my sites no longer worked. A URL that used to point to a specific post or page now just took me to the home page. Since my primary interest was in keeping old links from going dead, this didn’t seem ideal.
Basically (using “999” as a placeholder) my WordPress configuration rendered a link to a page as:
https://example.com/?page_id=999
and a link to a post as:
https://example.com/?p=999
Once the site was staticized, those links would just go to the home page. My site now expected links to a page to look like:
https://example.com/index%EF%B9%96page_id=999.html
and a link to a post like:
https://example.com/index%EF%B9%96p=999.html
I didn’t take the time to figure out what the “%EF%B9%96” was supposed to do. I suspected, though, that Apache’s powerful and arcane mod_rewrite module could fix the problem through regular expressions, my favorite way of spending more time on a project than I expected to.
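For the curious, that sequence is percent-encoded UTF-8, and a few lines of Python can decode it. This is only a sketch for inspection, not part of the original setup. It turns out to be a single character, U+FE56, a small look-alike for the question mark. Presumably the downloader swapped it in for the literal “?” so the query string could live inside a file name, though that motive is an assumption.

```python
import unicodedata
from urllib.parse import unquote

# Percent-encoded sequence taken from the staticized file names.
encoded = "%EF%B9%96"

# unquote() decodes percent-escapes as UTF-8 by default.
char = unquote(encoded)

print(len(char))               # number of code points decoded
print(f"U+{ord(char):04X}")    # the code point in hexadecimal
print(unicodedata.name(char))  # the official Unicode character name
```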
It turned out to be pretty simple. Here are the Apache directives that work the magic:
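RewriteEngine on
# The "^/$" pattern assumes server or virtual-host context;
# in .htaccess the leading slash is stripped, so the pattern would be "^$".
# %1 is the number captured by the preceding RewriteCond;
# NE keeps the escaped % signs from being re-encoded in the redirect.
# Old page links: /?page_id=999
RewriteCond %{QUERY_STRING} ^page_id=([0-9]+)$
RewriteRule "^/$" "index\%EF\%B9\%96page_id=%1.html" [R,NE,L]
# Old post links: /?p=999
RewriteCond %{QUERY_STRING} ^p=([0-9]+)$
RewriteRule "^/$" "index\%EF\%B9\%96p=%1.html" [R,NE,L]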
I decided to do redirects (the [R] flag) instead of just invisibly rewriting, because I’m not planning to ever go back to WordPress, so folks might as well use the actual links going forward, even if they do contain some odd characters.
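To check a list of old links against what the redirects should produce, the same mapping can be sketched in Python. This only mirrors the two rules for testing purposes and does not talk to Apache; the function name static_url and the MARKER constant are made up for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Percent-encoded character that stands in for "?" in the static file names.
MARKER = "%EF%B9%96"

# Same query-string shapes the two RewriteCond lines accept.
QUERY_RE = re.compile(r"(page_id|p)=([0-9]+)")


def static_url(old_url: str) -> Optional[str]:
    """Return the static URL an old WordPress link should redirect to,
    or None if the link does not match either rule."""
    parts = urlsplit(old_url)
    # The rules only fire on the site root; an empty path is requested as "/".
    if parts.path not in ("", "/"):
        return None
    match = QUERY_RE.fullmatch(parts.query)
    if match is None:
        return None
    key, number = match.groups()
    return f"{parts.scheme}://{parts.netloc}/index{MARKER}{key}={number}.html"


if __name__ == "__main__":
    for url in ("https://example.com/?page_id=999", "https://example.com/?p=999"):
        print(url, "->", static_url(url))
```

Comparing this output with the Location header Apache actually sends (for example, from a HEAD request) is one way to confirm the directives behave as intended.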
—2p