If you’re thinking about adding an uninterruptible power supply (UPS) to your systems, think very carefully.

Just before the turn of the century, I started my own server farm. It started with a single broken-down PowerBook 5300 and over the years grew to dozens of servers in three US states and two other countries. As the focus was on providing IT services to small, usually not-for-profit providers of health care, uptime was critically important. At the same time, it wasn’t a money-making proposition so the budget was always… limited. The servers tended to be salvaged, discarded, donated, or refurbished machines. Prudence seemed to require that I use UPSes to keep the machines running in the event of a brief power outage.

In spite of the fact that my main server farm (in an unused room in my home) had absolutely fabulous utility power, I started installing UPS units for extra protection. At one point, my power management became surprisingly complex with a mix of solar, whole-house battery, backup generators, and yet more UPS units. I’m winding all this down now, and I’ll only be taking a handful of servers to my new island home (which is off-grid but will have a pretty massive, redundant array of solar panels and LiFePO4 batteries), but I’ll probably still want some UPS units to carry me through the glitches as I build and fine-tune my solar components. Which has me reflecting on this sorry fact:

I have had more downtime due to UPS failures than I would have had from grid failures had I just skipped the UPS.

That’s right. Using UPS units has caused more failures than they have prevented. Why? Partly, it’s because I mostly use consumer-grade units, albeit from big names like APC and CyberPower. Even so, there’s really no excuse for them to fail as badly as they do; it’s not so much poor quality as poor design. I even had a failure of a refurbished unit that cost close to $5,000 when new and, again, the problem was caused by a design defect, not a lack of build quality. To put it in a nutshell, most of the units seem to be designed to cut off all power to the load when the system detects an internal problem. Even though they are, supposedly, standby-mode supplies that just pass through utility power unless and until there’s a grid failure, these UPS machines actually interrupted perfectly good grid power and killed my machines because the UPS itself had an internal failure.

The most egregious problem was a unit I had at a remote site that had occasional grid failures. I could have just run those servers from grid power, as they would automatically restart when power was restored, but I thought a UPS would eliminate the trauma that can occur when computers experience a sudden loss of power. What I didn’t know was that the UPS I put in there would not restart after a power failure that was long enough to deplete the battery! So even though the systems were able to get through a controlled shutdown as the unit depleted its battery, when the power came back up the rack remained dead. It required a 3 a.m. trip to the site just to turn the UPS back on. Nobody at the company that made the unit could explain to me why they thought it was a good idea to leave the system in a shut-down state even after power was restored.

So consider, carefully, the implications of having a UPS failure that leaves your equipment completely unpowered. In a lot of circumstances, that might be a worse outcome than just running straight from the grid. In my case, I’ve had far more UPS failures than computer, switch, and router failures combined.
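If you want to put rough numbers on that trade-off for your own site, something like the following back-of-the-envelope sketch might help. Everything in it is an assumption for illustration: the function names are made up, the model is deliberately crude (it treats failures as independent and only counts hours), and the figures in the example are hypothetical placeholders, not measurements from my racks.

```python
# Back-of-the-envelope comparison of yearly downtime with and without a UPS.
# All names and numbers here are hypothetical; plug in your own estimates.


def annual_downtime_hours(events_per_year: float, hours_per_event: float) -> float:
    """Expected downtime per year from one class of failure."""
    return events_per_year * hours_per_event


def compare(grid_outages_per_year: float,
            hours_per_grid_outage: float,
            fraction_of_outage_hours_covered: float,
            ups_faults_per_year: float,
            hours_per_ups_fault: float) -> tuple[float, float]:
    """Return (downtime without a UPS, downtime with a UPS), in hours per year."""
    without_ups = annual_downtime_hours(grid_outages_per_year, hours_per_grid_outage)
    # With a UPS, only the share of outage time the battery cannot cover
    # still takes the load down...
    residual_grid = without_ups * (1.0 - fraction_of_outage_hours_covered)
    # ...but every internal UPS fault that cuts the output adds downtime of its
    # own, including the time it takes someone to reach the site and reset it.
    ups_caused = annual_downtime_hours(ups_faults_per_year, hours_per_ups_fault)
    return without_ups, residual_grid + ups_caused


if __name__ == "__main__":
    # Hypothetical site: two half-hour grid outages a year, a UPS that covers
    # 90% of that time, and one UPS fault every two years costing six hours.
    without_ups, with_ups = compare(2, 0.5, 0.9, 0.5, 6.0)
    print(f"without UPS: {without_ups:.1f} h/year")
    print(f"with UPS:    {with_ups:.1f} h/year")
```

With those made-up inputs the UPS comes out behind, simply because one long UPS-caused outage outweighs several short grid blips. Good utility power and a unit that needs a site visit to recover push the result in that direction; frequent long outages and a unit that restarts on its own push it the other way.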

—2p
