Single Point of Failure

One of my recent clients was designing their data center. During the process, the CEO read one of my blogs and called me in to look things over.

It is sometimes awkward for me to go into someone else’s shop and poke around, as the relationship between their staff and me sometimes seems tenuous, if not adversarial.  Not on my part, mind you; I am there to help.  “I fix things.”  That is what I do.  I do ask a lot of “why” questions, which I think starts the ball rolling.


The resident staff, however, are the experts that their boss trusts, so the CEO goes along blissfully ignorant of the company’s exposures because he or she is not an IT person.  The resident IT people usually don’t encourage an outsider such as myself to come in, for two reasons.  First and foremost, I think it is an ego thing.  They are the best at what they do, and if you don’t believe them, just ask them.  Secondly, they don’t want their boss to find out just how bad things are.  Or possibly the staff has no clue just how tenuous the situation really is.

[Images: wrong4, wires, A Server Room Nightmare, titanic sinking]

If you see any of this in your data center or wiring closets, you need some help.

When I was working as a manager or director, I always welcomed this kind of assistance and frequently brought it in, as oftentimes we get mired in the woods and cannot see the forest for the trees.  I know that this is oversimplified, but as a manager I always tried to hire people as smart as, if not smarter than, myself.  A manager gets tied up in the day-to-day business of the company as well as the technology, and oftentimes does not see what is right in front of them, or can even lose their objectivity.

If you decide to bring in someone like me, your staff needs a heads-up.  They need to be told to make themselves available.  They need to know that this person is there because you requested it, are paying for it, and expect them to work with this person as needed.  This understanding up front saves you money.

When planning a move to a new building, one can design and install everything correctly the first time, far more cost-effectively than going back in and redoing things later.

[Image: right 1]

Notice how neatly the cables are dressed, and notice the Velcro cable ties, not “zip ties.”

Why not zip ties, you ask?

[Image: fire damage]

Poor cable management and “zip ties” lead to this.  This could cost you your entire company.  

These are the charred remains of someone’s data center.

Having said that, one of the things often overlooked during a data center design is single points of failure (SPOF).  When designing a data center, one builds redundancy into the infrastructure so as to avoid downtime or a total shutdown.  Few people truly understand soft dollars and how the loss of productivity affects the bottom line.


Most everyone knows of RAID, redundant switches, VMware, and the cloud, and this is about where it stops.


What about an alternate path to get data in and out of your building?  What about an alternate path or source for power?
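One way to make those questions concrete is to list what serves each critical function and flag anything with only one provider. The following is a minimal Python sketch, not a formal method; the inventory entries and the `single_points_of_failure` helper are hypothetical and only there for illustration.

```python
from collections import Counter

# Hypothetical inventory of (function, component) pairs.
# Every name here is illustrative, not taken from a real site.
inventory = [
    ("power", "utility feed A"),
    ("power", "generator"),
    ("data", "fiber trunk, north side"),
    ("cooling", "air conditioner 1"),
    ("cooling", "air conditioner 2"),
]

def single_points_of_failure(items):
    """Return the functions that are served by exactly one component."""
    counts = Counter(function for function, _ in items)
    return sorted(f for f, n in counts.items() if n == 1)

print(single_points_of_failure(inventory))  # ['data']
```

In this made-up inventory only the data path has no alternate, so that is where the next dollar of redundancy would go. A real assessment also has to ask whether two components share a hidden dependency, such as the same conduit or breaker panel, which a simple count will not catch.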

One of my clients (a law firm) lost power due to construction in the street below.  The UPS only held their servers up for 30 minutes or so, and they had no UPSs on workstations.  Guess what: the courts do not care whether you can get to your files.  Your problem, not theirs!  If you cannot feasibly get power to your data center from an alternate location, you need a generator or a warm site.


In their case, the power took several days to come back online.  The elevators, of course, were not working, so they had to physically remove their servers, truck them down the emergency stairwell, and re-install them in another location.
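A UPS runtime like the 30 minutes in that story can be roughed out ahead of time. Here is a back-of-the-envelope sketch in Python, assuming runtime is simply usable battery energy divided by load. The function name, the 90% efficiency figure, and the example numbers are my assumptions for illustration; real batteries deliver less at high discharge rates and as they age, so treat the vendor's runtime chart as the authority.

```python
def ups_runtime_minutes(battery_wh, load_watts, efficiency=0.9):
    """Rough runtime in minutes: usable battery energy divided by the load.

    efficiency is an assumed inverter efficiency, not a measured value.
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    return battery_wh * efficiency / load_watts * 60

# Illustrative numbers only: a 2,000 Wh battery carrying a 3,600 W load.
print(round(ups_runtime_minutes(battery_wh=2000, load_watts=3600)))  # 30
```

Half an hour is enough for a graceful shutdown, not for a multi-day outage, which is why the generator or warm site question matters.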

Failure to plan is planning to fail!

The same is true of your data lines.  Generally one installs two trunks diametrically opposite each other, on opposite corners or sides of the building.  If that is not feasible, you look at RF links.  Yes, they are slower than fiber, but slower is better than nothing.

Your individual needs will dictate the level of redundancy you require.

During the design of your data center, each and every risk must be defined and be part of the risk assessment.  During a move, a site selection team must evaluate all sorts of factors regarding your data center before the lease is signed, or the building is purchased or even built, if you are going to go that way.


Data centers, large or small, should all have the fundamentals covered.  It is your company after all.

One of my clients had me travel around with their folks to look over prospective sites.  They were a little uneasy about the people they had doing this, so hiring me to go along hedged their bet.  When I arrived at the first location, they were ready to sign the lease.  They were not happy when I discovered that the ceiling was loaded with asbestos.  The extra cost to get someone certified to run cable in such an environment was over the top, not to mention that the site was in an earthquake-prone part of California.  There is a reason that the lease was “cheap.”  On another site, the cable plant had been added to as the previous tenant grew.  They had spliced wires in the ceiling, which you just don’t do.  Some of these had been spliced to Cat 3 wires.  I can well imagine what the data throughput looked like and the error rate that those people suffered.

We start with the basics: power, voice/data, air handling.  We look at the hardware required for all of this.  We calculate the power requirements and the amount of air conditioning needed.  What about a redundant air conditioner?  How about fire suppression?  I like to install dust filtration systems, as this investment will extend the life of your equipment.  How about remote monitoring of your data center?  What about security, both physical and data?
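The power-to-cooling arithmetic can be sketched in a few lines of Python. It uses two standard conversions: one watt of equipment load gives off about 3.412 BTU per hour of heat, and one ton of cooling removes 12,000 BTU per hour. The `cooling_load` function, the 25% headroom, and the 10 kW example load are assumptions for illustration. A real calculation also accounts for lighting, people, and the building envelope.

```python
BTU_HR_PER_WATT = 3.412    # 1 W of load gives off about 3.412 BTU/hr of heat
BTU_HR_PER_TON = 12_000    # 1 ton of cooling removes 12,000 BTU/hr

def cooling_load(equipment_watts, headroom=0.25):
    """Return (BTU/hr, tons) for the equipment load plus headroom.

    headroom is an assumed safety margin, not an engineering standard.
    """
    btu_per_hour = equipment_watts * BTU_HR_PER_WATT * (1 + headroom)
    return btu_per_hour, btu_per_hour / BTU_HR_PER_TON

# Illustrative 10 kW equipment load.
btu, tons = cooling_load(10_000)
print(f"{btu:,.0f} BTU/hr, about {tons:.1f} tons of cooling")
```

Run the same numbers with one air conditioner out of service and you have the answer to the redundant air conditioner question.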


We look at risk factors: where is the site geographically?  What historical data can we find on that area regarding disasters?  I also look at permit costs, union involvement costs, etc.  I work with the architect and GC to make certain that all of the i’s are dotted and t’s crossed.

In 35 years of doing this I have never had a “good surprise” and my goal is to keep the customer from having a bad one.

One thing that I recommend is a sensor that “sniffs for smoke.”  This sensor triggers an emergency shut down of the UPS if smoke is detected.  Why on earth would I want to do this?


“Where there is smoke there is fire.”

Maybe, maybe not; however, this equipment runs 24x7, and two-thirds of that time it is unattended.  Generally, poor cable management, which I see in more data centers than not, is the cause of smoke and fire.  Because a cable will generally start smoking before it actually catches fire, having this sensor shut off power to the data center stops the fire in its tracks.  The sensor hooks up to the big red button and serves as the emergency shutdown if smoke is detected.  It can also be wired to the building security systems to trigger a call to the fire department and sound alarms so people can get out of the building.

A lot of data centers use the cheapest fire suppression techniques out there, again “designed by a neophyte.”  So a cable rubs raw, starts smoking, catches fire, and the suppression system is activated.  Water… Water and electronics do not mix.  You now have your entire data center ruined because your “people” were lazy and failed to properly dress the cables, and you did not have a modern fire suppression system.  The smoke sniffer is the next best thing: it stops the fire before the water starts.


The devil is in the details, and brother, there are tons of details.  As part of a DR plan, we cover as many as we can find.  Once we think we have it, we hire an outside technical staff to re-create your data center in an offsite location, with your run book, documentation and backups.  We give them everything that we “think” they should need, and then your CIO and I sit back, watch, and take notes.  We get them the answers that they need, note them, and then move on until it either works or fails.  If it fails, we do a root cause analysis, take corrective action and try it again.

[Image: Sungard]

Yes, this is a shameless plug for Sungard. 

DR plans generally do not work the first time.  There are way too many details to catch them all, but as the outside team does its thing and hits a stopping point, we find the answer, make notes, and let them proceed until everything is working as we expect.  Then we have a skeleton crew come in and try to work.  We make notes of everything that they run into, fix what we can through the rent-a-geeks, and press on.

Part of the process is to determine what an acceptable downtime is, and that can even be broken down by individual system.  “Payroll before sales?”

After that, we go back and alter the run book, provide more software or whatever was needed, and then we get another team of rent-a-geeks and do it again.  Same process until we get it to the point that the rent-a-geeks can draw on their own knowledge to fill in any holes.

An inventory of the skill sets the geeks need is yet another bit of information to document.

A disaster recovery plan, like a data backup strategy, is only as good as your last successful test.  Once you have a successful test, it is incumbent upon your people to devise a scheme to keep it updated.  I do this through a process called change management.
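Once each system has an acceptable downtime, the recovery order falls out of a simple sort. This is a small Python sketch; the systems and hour values are made up for illustration, and `recovery_order` is a hypothetical helper, not part of any DR tool.

```python
# Hypothetical acceptable downtime per system, in hours (illustrative values).
acceptable_downtime_hours = {
    "payroll": 24,
    "sales": 8,
    "email": 4,
    "file shares": 48,
}

def recovery_order(targets):
    """Return system names sorted so the least downtime-tolerant comes first."""
    return sorted(targets, key=targets.get)

print(recovery_order(acceptable_downtime_hours))
# ['email', 'sales', 'payroll', 'file shares']
```

With these made-up numbers, sales comes back before payroll. Your own numbers may well say the opposite, which is the point of writing them down.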

Is this your disaster recovery plan?

[Image: Dilbert]

We use rent-a-geeks because we must assume that your staff has all been part of the disaster.  If we can make it work with technical people that you can hire from your VAR, then if there is a disaster and your staff was not caught up in it, your chances of a successful recovery are very high.

If you look at my other blogs regarding information technology, there is plenty of good advice out there.  If you are in need of a DR plan, or someone to help your technical team get through a migration, a move or what have you, I have over 35 years of experience and a valid passport.  I speak English with a smattering of Geek. 🙂

[Image: Big Bang Theory cast with logo]

Yes, if these were real people and not actors, I could converse with them easily.  I actually know what a Higgs boson is.

[Image: Higgs simulation]

An artist’s idea of what it would look like.

If you are moving and need a data center designed in the new location, that is probably one of the least expensive things that I do, assuming that your documentation is up to snuff.  The size of your company and the time frame will determine the scope of work (SOW) and whether I will need more than myself to accomplish the task.  I work with some fine people here in the DFW area, and they too can travel.  If you have a relationship with a VAR already, I can work with them.  I must admit that I will evaluate them on your behalf, as I do not believe in wasting your money or my time.

My character closely matches this guy, although I am easier going and my vernacular is well suited for all people, most of the time.

[Image: Gordon Ramsay]

There is plenty of good advice here on my site, so feel free to peruse and glean what you may.  I do reserve all rights to the information contained herein, so please do not copy or disseminate without permission.  Thanks!

-Best

© All rights reserved 2014
