CSI’s Paladin Monitoring Saves One Client From A Major Email Outage and Allows Us To Proactively Work On Another’s Email Outage Issue
CSI's Paladin Remote Monitoring solution uncovered two major Exchange 2010 email crises in the span of an hour on Wednesday. One affected a little under 700 users and the other about 1,350 users.
The first event was a Paladin alert for an Exchange condition called "back pressure". This is where Exchange believes that it is going to be unable to do its job based upon the rate at which it is consuming server resources (RAM and disk space). Exchange then attempts to protect its core. The thing it generally does first is shut down email flow into and out of the Exchange server. This raises the questions, "How do you know if emails you sent are not being delivered? How do you know if emails sent to you are not being received?" Paladin knows. Since CSI actively watches the alert consoles and doesn't rely solely on automated alerting to our clients, we went "old school", picked up the phone and talked to the appropriate person, who didn't know they were not getting emails. We worked with them to resolve the issue. A simple resource allocation change in their virtual environment and a quick reboot, and those 700 users continued to do what they do without wondering, "Why is email down?"
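For readers who want to picture how a resource-pressure check like this works, here is a minimal sketch in Python. It is not Exchange's or Paladin's actual logic. The threshold numbers and the names (Thresholds, back_pressure, needs_attention) are assumptions made up for illustration; the only idea taken from the story above is that RAM and disk usage are watched and that the worst reading decides how the server is treated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    medium: float  # percent used at which pressure becomes "medium"
    high: float    # percent used at which pressure becomes "high"


# Hypothetical limits for illustration only, not Exchange's real defaults.
DISK = Thresholds(medium=90.0, high=95.0)
MEMORY = Thresholds(medium=85.0, high=94.0)

# Ordered from least to most severe.
LEVELS = ("normal", "medium", "high")


def level(percent_used: float, limits: Thresholds) -> str:
    """Classify one resource reading against its limits."""
    if percent_used >= limits.high:
        return "high"
    if percent_used >= limits.medium:
        return "medium"
    return "normal"


def back_pressure(disk_percent_used: float, memory_percent_used: float) -> str:
    """Overall pressure is the worst of the individual resource levels."""
    readings = [
        level(disk_percent_used, DISK),
        level(memory_percent_used, MEMORY),
    ]
    return max(readings, key=LEVELS.index)


def needs_attention(pressure: str) -> bool:
    """Anything other than "normal" is worth a look by a person."""
    return pressure != "normal"


if __name__ == "__main__":
    state = back_pressure(disk_percent_used=92.0, memory_percent_used=60.0)
    print(state, needs_attention(state))
```

The point of the sketch is the last function: the moment the state leaves "normal", someone is told, before users notice that mail has stopped moving.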
When we combine Paladin Monitoring with Paladin Email Defense, we can do one better. Paladin Email Defense provides us 24x7x365 SMS text alerts when mail flow into and out of an email server stops and starts. If the outage is due to a true disaster situation, Paladin Email Defense immediately switches into a disaster recovery mode where the client's inbound email that cannot be delivered to their mail server is immediately available via the web. The mail server may be dead or the building destroyed, but if you can find internet access somewhere, you are still able to send and receive critical emails until whatever bad happened is resolved. If the situation is temporary, Paladin Email Defense will just restart the inbound and outbound mail flow automatically as soon as the connection is re-established and then notify everyone via SMS that normal mail flow is working again.
The second Exchange event happened an hour after the first event. Unfortunately, an Exchange server that serves approximately 1,350 users developed a high CPU condition. This was causing degraded performance for users. There was no warning. One minute it was normal. The next minute it was in a bad place. Paladin alerted us. We were already looking into the problem when the customer called to report strange performance issues in Exchange. In this instance we couldn't prevent a degradation in performance. No one can do that all the time. However, we knew before our client knew that there was an urgent issue. We were proactively working to resolve the issue as fast as possible to minimize downtime. About 20 minutes after the event started we had it resolved and everyone went back to work. Our response time from alert to action on this critical alert was about three minutes.
It is impossible to know everything that is going on, or about to go on, with your network. By overlaying 24x7x365 Paladin remote monitoring we can provide you with the ability to know things about your network that are impossible to know on your own. By overlaying Paladin Email Defense we can provide an added layer of disaster recovery protection for your critical email communications. How do you know what you don't know about your network?
CSI’s Paladin Monitoring Saves Another Client From Excessive Downtime
CSI's Paladin Remote Monitoring solution had an impressive save in the last couple of days.
Last week we had an ISP go on-site after hours to do a routine hardware upgrade/swap. The outage was planned and expected. It was to be a quick in and out and back on-line. Paladin saw the client site go off-line (as planned). However, the site never came back. Time went by and it still never came back. Hours went by. It was obvious that something had gone horribly wrong. If this continued until morning, bad things were going to happen for our client. There were 2,100 users sitting behind this one connection - many of whom would be quite angry if this wasn't resolved. We placed the appropriate after-hours calls to the appropriate people and around 10:45pm the ISP re-visited the client and quickly resolved the connectivity issues created by their upgrades. The end users never even knew the outage had occurred. The folks in charge of that site knew because Paladin was monitoring that site 24x7x365 whether they were standing there or not. We knew not to rely on just an automated "you are down" alert because we try very hard to have interactive discussions with our clients and go the extra mile in trying to keep them healthy. In this case it was some after-hours, live "person" monitoring - just to make sure that everything came out okay. It is impossible to know everything that is going on, or about to go on, with your network.
By overlaying 24x7x365 Paladin remote monitoring we can provide you with the ability to know things about your network that are impossible to know on your own. There is simply too much data to sift through. In both these instances we were able to uncover substantial issues and deal with them before they became a major crisis with lots of unhappy users.
How do you know what you don't know about your network?
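The planned ISP outage described above is a good example of a check that is easy to describe and easy to forget: a site that goes down on schedule should also come back on schedule. The Python sketch below shows one way to express that rule. It is an illustration only, not how Paladin works internally. The function name, the 15-minute grace period and the times in the example are all assumptions.

```python
from datetime import datetime, timedelta


def maintenance_overdue(
    window_end: datetime,
    now: datetime,
    site_is_up: bool,
    grace: timedelta = timedelta(minutes=15),  # assumed allowance, not a Paladin setting
) -> bool:
    """Return True when a planned outage has outlasted its window.

    A site that is down inside the window (plus a short grace period)
    is expected. A site that is still down after that is not.
    """
    if site_is_up:
        return False
    return now > window_end + grace


if __name__ == "__main__":
    # Hypothetical schedule: the work was due to finish at 8:00pm.
    planned_end = datetime(2011, 6, 1, 20, 0)
    print(maintenance_overdue(planned_end, datetime(2011, 6, 1, 19, 30), site_is_up=False))
    print(maintenance_overdue(planned_end, datetime(2011, 6, 1, 22, 0), site_is_up=False))
```

An automated rule like this only raises the flag. In the story above it still took a person to make the after-hours phone calls.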
CSI Monitors Our Clients' Networks Through Hurricane Irene
As Hurricane Irene approached New York, CSI used our 24x7x365 Paladin Monitoring service to help our clients prepare their computers and networks for the impending hurricane. We were able to quickly identify all the uninterruptible power supplies (aka batteries) under management which had bad batteries or other hardware issues. Equipment plugged into these battery units had a greater than normal exposure to power fluctuations.
One client site was intending to shut down its entire operation during the storm. Before they shut their equipment down we identified a server with bad drives in a RAID array and other hardware issues. Our concern was that since this critical server already had a failed redundant component plus other issues, it might be shut off and never come back online.
Realizing that time was of the essence in repairing the server, we used Paladin's remote management tools to reach into the server at 12am Saturday as the storm approached, rebuild the redundant drive and re-establish full redundancy before the server was actually shut down. The client never had to get out of bed. No one had to show up to let us into the building, turn off the alarm and unlock the multiple doors required to get to the server closet. After our work was completed, the server went down as the client had planned and came up fine after the storm.
During the storm we proactively monitored our customers' networks and provided active status updates as we saw buildings and servers go down due to power failures throughout the region. By looking at the previous alerts and querying the power supplies we were able to tell the difference between "no power" and actual equipment failures.
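The reasoning behind telling "no power" apart from a real equipment failure can be sketched in a few lines of Python. This is an illustration under assumptions, not Paladin's actual rules. The status strings ("online", "on_battery"), the Cause names and the classify function are all made up for the example.

```python
from enum import Enum
from typing import Optional


class Cause(Enum):
    NO_POWER = "no power"
    EQUIPMENT_FAILURE = "suspected equipment failure"
    UNKNOWN = "unknown"


def classify(
    device_reachable: bool,
    ups_status: Optional[str],      # answer from querying the UPS now, None if it did not answer
    last_ups_alert: Optional[str],  # most recent alert the UPS raised before going quiet, if any
) -> Optional[Cause]:
    """Guess why a device is down. Returns None when the device is up."""
    if device_reachable:
        return None
    if ups_status == "online":
        # The UPS says utility power is fine, so the device itself is suspect.
        return Cause.EQUIPMENT_FAILURE
    if ups_status == "on_battery" or last_ups_alert == "on_battery":
        # The UPS is, or was last seen, running on battery: the building lost power.
        return Cause.NO_POWER
    return Cause.UNKNOWN


if __name__ == "__main__":
    print(classify(False, None, "on_battery"))
    print(classify(False, "online", None))
```

A device that falls into the "unknown" bucket is exactly the kind of case where a person looking at the console adds more than an automated rule can.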
Once the storm subsided on Sunday we were able to pinpoint exactly what buildings were offline around the region. Then as those buildings came back on-line we were able to pinpoint exactly what equipment inside each building did not turn back on. From there we had a list of devices for either the client's technical staff or CSI's staff to investigate.
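Building that follow-up list amounts to comparing what went offline with what is answering again. A minimal Python sketch, with made-up building and device names and a hypothetical not_recovered function, could look like this:

```python
from typing import Dict, Iterable, List, Set


def not_recovered(
    offline_during_storm: Dict[str, Iterable[str]],  # building -> devices that went offline
    reachable_now: Set[str],                         # devices answering after power returned
) -> Dict[str, List[str]]:
    """List, per building, the devices that did not come back."""
    report: Dict[str, List[str]] = {}
    for building, devices in offline_during_storm.items():
        missing = sorted(set(devices) - reachable_now)
        if missing:
            report[building] = missing
    return report


if __name__ == "__main__":
    # Hypothetical inventory for illustration.
    went_down = {
        "Building A": ["switch-1", "server-1"],
        "Building B": ["router-1"],
    }
    back_up = {"switch-1", "router-1"}
    print(not_recovered(went_down, back_up))  # {'Building A': ['server-1']}
```

Buildings where everything came back are left out of the result, so whoever investigates sees only the devices that need a visit.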
Sunday night I was personally watching over our clients' networks via the Paladin monitoring console. In the middle of that I lost power at home. I walked outside and started up my generator. I then turned on the Verizon wireless card in my laptop and didn't miss a beat.
CSI's office has an ample standby generator of its own and an excellent internet connection so our 24x7x365 monitoring continued regardless of the storm conditions.
Once Monday morning came, despite the flooding, road closures and massive power outages in some areas, most of our clients went back to work with their computer networks operating much like they did on Friday when they left for the weekend.
That is what CSI's Paladin Monitoring does 24x7x365.