We continue our discussion of your closets and the dumb stuff that can really hurt you. In Part III we look at "plan b" and "plan c": what to do when the power goes out for your physical servers, your virtual servers, and your MDFs.
If you put a physical server in any closet, we want that server to communicate with the UPS, via the NIC in the UPS, for power status. If the power is going out, we want the UPS to tell the server it is time to shut down gracefully because there is no commercial power. This is done by free software: PowerChute Network Shutdown. If you have a server, why would you let it crash and hope for the best when you can prevent most bad things by simply installing this free software?
If you put in a virtual server host, you can install PowerChute Network Shutdown for ESX to gracefully do the same thing for all your virtual servers. It is also free. There are some really nice things for the virtual configuration. First, you can point to multiple UPSes at once so that one bad UPS doesn't take out your closet if you have devices with redundant power supplies. Also, every VM will be protected - without installing any software on the VM. APC tells the ESX host it is time to gracefully shut down and the ESX host tells the individual VMs.
One thing you need to actively pay attention to is how much time your equipment needs to shut down gracefully. If you are a Paladin Sentinel Monitoring client and the console shows servers reporting a dirty shutdown after a power event, that should prompt you to re-examine the PowerChute settings and the APC settings. Adjust the amount of time you allow between the graceful shutdown announcement and the UPS turning off so there is adequate time to do what needs to be done.
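To make the timing question concrete, here is a rough sketch of the arithmetic. Everything in it is an assumption for illustration: the function names, the 1.5x safety factor, and the idea that VMs shut down in parallel before the host does. It does not read or write any PowerChute or APC setting; you would plug in times you measured yourself.

```python
import math


def required_delay_seconds(vm_shutdown_seconds, host_shutdown_seconds,
                           safety_factor=1.5):
    """Estimate the delay needed between the shutdown announcement and UPS turn-off.

    Assumption: VMs shut down in parallel, so the slowest VM sets the pace,
    and the host shuts down only after the VMs are finished.
    """
    slowest_vm = max(vm_shutdown_seconds, default=0)
    return math.ceil((slowest_vm + host_shutdown_seconds) * safety_factor)


def delay_is_adequate(configured_delay_seconds, vm_shutdown_seconds,
                      host_shutdown_seconds, safety_factor=1.5):
    """Return True if the configured delay covers the estimated requirement."""
    needed = required_delay_seconds(
        vm_shutdown_seconds, host_shutdown_seconds, safety_factor
    )
    return configured_delay_seconds >= needed


# Hypothetical measured times: three VMs and one host, in seconds.
vm_times = [90, 240, 120]
host_time = 60

print(required_delay_seconds(vm_times, host_time))  # 450
print(delay_is_adequate(300, vm_times, host_time))  # False
```

If a check like this comes back short, that is the same signal as a dirty shutdown on the console: the delay is too tight for what the equipment actually needs.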
Plan C is to have a generator to keep the closet up. More and more of you have added generators for your MDFs. However, we have a few suggestions.
- Even if you have a generator, please still implement the PowerChute Network Shutdown scenario. Generators can fail, and I have seen too many server crashes and a couple of cases of corruption caused simply by the lack of a graceful shutdown that was freely available. Don't fall into that trap.
- Test your generator weekly, and make sure you know it has actually tested. When we first moved into our present office, the landlord said they had a whole-building generator that was tested weekly, off hours (so as not to annoy the tenants during the day). What we found out was that it was not testing, but no one knew that because no one was here. One day the power went out and we were in the dark. The batteries were bad and the warming unit that makes it easy to turn the generator on was dead. The landlord repaired the generator and changed the test time to during the day when we are in the office. Michele walks outside and checks weekly that it is really testing.
- Test your generator under load. Once the issues described in the previous point were resolved, we had a power event and the generator came right on and fried a whole lot of equipment throughout the building. Paladin Sentinel showed our UPSes thought there were 190-200 volts on 120 volt lines! One of our UPSes actually thought it was a lightning strike and sacrificed itself to protect the core equipment. What we found out was that while the generator turned on properly, it had never been tested and adjusted under load to make sure it was putting out proper power. The generator vendor came over, simulated the building load, and tuned the generator.
- Test what your UPSes think of the quality of power coming out of your generators. We had a situation where a district had shiny new generators across the district. They also had new UPSes. They should have been in a very good place that most of you would be envious of. Bad weather came, which turned into a state of emergency. The generators kicked on to keep everything alive. As discussed earlier in this series, the UPSes switched to battery because they didn't like the power. They ran on battery to exhaustion and then shut down. The district went dark. The generators kept running. Because of the state of emergency, no one could even drive there to figure out what was going on and fix it. The solution was to tune the generators to produce the best possible power output and tune the UPSes to accept the power produced by the generators without shutting down.
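The voltage stories above boil down to a simple comparison: is what the UPS sees within the window it will accept? Below is a minimal sketch of that check. The function names and the 10% tolerance are assumptions for illustration only; the real acceptable window is whatever your UPS is configured to allow, and the readings would come from your own monitoring.

```python
def voltage_window(nominal_volts, tolerance):
    """Return the (low, high) voltage bounds around a nominal value."""
    return nominal_volts * (1 - tolerance), nominal_volts * (1 + tolerance)


def flag_bad_readings(readings, nominal_volts=120.0, tolerance=0.10):
    """Return the readings that fall outside the assumed acceptable window."""
    low, high = voltage_window(nominal_volts, tolerance)
    return [volts for volts in readings if volts < low or volts > high]


# Hypothetical input-voltage readings taken while running on generator.
readings = [118, 121, 195, 200, 104]

print(flag_bad_readings(readings))  # [195, 200, 104]
```

Running a check like this during a generator test under load, rather than during a real outage, is the point of the last two suggestions.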
We've covered a lot of ground in these three articles. If you need help sorting this all out, give us a call. We're happy to help you.