Tag: disaster recovery

When is the right time to think about Disaster Recovery?

Spring rains bring on more than just flowers or, in my case, weeds.  The phone started ringing early the other morning.  My coffee was still brewing when the continuous ringing demanded my attention instead of the regular answering service's.

It would seem that lightning hit a pole close to one of my clients.

Lightning is far from respectful of your deadlines or the amount of work that your staff has lined up to accomplish.  From simple power outages to fire, lightning all by itself is a disaster in the making.  Some simple steps ahead of time can keep your company from falling victim the way this client did.

One girl had her headset in when the lightning struck and was shocked.  Happily, she is OK, but their systems were not so fortunate.  Had the grounding been worse, she might have been the path to ground.

Once the power was restored, the server, router, and switch did not recover.

The one machine on a UPS died as the power went out.

What went wrong?

Surge protectors have a finite lifetime.  People buy these power strips with surge protectors and forget about them.  Surge protectors are nothing more than a power strip with something in them known as a “Metal Oxide Varistor,” or MOV.

Any power surge above an acceptable voltage is clamped or shorted to ground by this device.  The problem is that the MOV only lasts so long before it no longer functions.  Every time there is any spike in the line, from compressors shutting off to other electronic “noise,” these components are adversely affected.

What is better?

A UPS of enough wattage to allow the computer to be safely powered down in the event of a power failure.  Along with the backup power ability, these devices have more sophisticated line conditioning circuitry protecting your equipment from stray voltage spikes.
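As a rough sketch of what "enough" means, the runtime can be estimated from the battery capacity and the load.  The 85% inverter efficiency and the 2x safety factor below are assumptions for illustration, not vendor figures; batteries also lose capacity as they age, so the manufacturer's runtime chart for your model is the real authority.

```python
def estimate_runtime_minutes(battery_wh, load_watts, inverter_efficiency=0.85):
    """Rough runtime estimate: usable battery energy divided by the load.

    battery_wh          -- rated battery capacity in watt-hours
    load_watts          -- what the protected equipment actually draws
    inverter_efficiency -- assumed value; check the UPS data sheet
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    usable_wh = battery_wh * inverter_efficiency
    return usable_wh / load_watts * 60


def covers_shutdown(battery_wh, load_watts, shutdown_minutes, safety_factor=2.0):
    """True if the estimated runtime leaves a margin over the shutdown time.

    safety_factor is an assumed cushion for aging batteries and added load.
    """
    runtime = estimate_runtime_minutes(battery_wh, load_watts)
    return runtime >= shutdown_minutes * safety_factor
```

If a server needs ten minutes to shut down cleanly, `covers_shutdown(battery_wh, load_watts, 10)` tells you whether the estimate leaves any room for error.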

One note to remember: these too only last so long before they must at least be maintained, or replaced.  Any CIO worth his salt is familiar with Hardware asset management and has this in mind for his budget.  CEOs hate surprises like unexpected expenses.  It is much easier to argue a budgeted expense than to go hat in hand begging forgiveness for your ineptitude.

If you suffer a catastrophe like this client did, hope your boss does not hire someone like me to do a root cause analysis.

At the very least, batteries must be changed out, but keep in mind that an MOV is also part of that piece of hardware.  I would budget the replacement of a UPS, rather than just the batteries, if it were me.

Unless you have electrical engineers on staff who are qualified to re-certify that equipment, it is too cheap not to simply replace it.

 

Along with outdated hardware or not enough of it, I have too many times seen the ground plug defeated to save a dollar on an electrician.  Those ground plugs are there for your protection, not because someone wanted to make it difficult for you.  The problem with temporary is that all too often it becomes permanent.

Lightning struck outside one of my client’s offices, hitting a pine tree.  The strike found the electrical ground for the building, which was poorly grounded, and everything in the building suffered a power surge that knocked out much of their equipment.

Many times, building management will only do what is necessary by code and leave the gamble up to you, the tenant.

Depending upon your location, achieving a good ground could be difficult.  The type of soil must be taken into account, among other things.  Again, depending upon your location, you might want to invest in grounding your building with lightning protection equipment including lightning rods or, as they are now called, “air terminals.”  The idea is to have some amount of confidence that if lightning hits, it will strike your planned target and be dissipated safely into the earth.

Since all computer equipment and now phones are wired through the network, this last customer lost computers and phones along with the network infrastructure.

Failure to plan is planning to fail.

The cost of the hardware and time to repair was minimal, compared to the amount of time the company was out of business.

Insurance will only get you so far.  As these spring storms fire up, there is a real element of danger to your building, your business and, as the one young lady found out, your person.  Had proper grounding been utilized, I doubt the girl would have felt the shock in her ears.

While a tested, reliable disaster recovery plan will allow you to sleep at night, preventing the disaster in the first place is what you should shoot for.  That starts with planning.

From your building security to network security, right down to protecting your infrastructure from mother nature, accounting for every contingency is paramount.

Truth be told, there are seldom good surprises in business.  Mitigating the surprises with proper planning can prevent poor performance.  Asking “what if” is key to any plan.  Weighing cost vs. probability allows anyone with some business acumen to make sound decisions without breaking the bank.  Understanding the risks is the starting point.
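One common way to weigh cost vs. probability is to multiply the yearly chance of an event by what it would cost you, then compare that figure to the yearly cost of preventing it.  The sketch below assumes you can put rough numbers on both; the class and field names are made up for illustration, and any figures you feed it are your own estimates.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float      # estimated chance of the event in a year (0..1)
    cost_per_incident: float       # hardware, repair time and lost business
    annual_mitigation_cost: float  # what prevention costs per year

    def expected_annual_loss(self) -> float:
        """Probability times cost: what the risk 'costs' in an average year."""
        return self.annual_probability * self.cost_per_incident

    def worth_mitigating(self) -> bool:
        """True when prevention is cheaper than the expected loss."""
        return self.annual_mitigation_cost < self.expected_annual_loss()


def rank_risks(risks):
    """Largest expected annual loss first, so the budget talk starts at the top."""
    return sorted(risks, key=lambda r: r.expected_annual_loss(), reverse=True)
```

A one-in-four yearly chance of a 40,000 dollar outage is an expected loss of 10,000 dollars a year, which makes a 1,500 dollar UPS and grounding budget an easy argument.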

 

-Best

 

Attention #CEO #CFO #President #CIO and #hr

Attention CEO CFO President CIO and of course HR

Here is some food for thought for you who own or control or have vested interest in corporations.

If you were to go to your CIO or your IS manager and ask the following, what would their response be?

  • Can you show me the network map?
  • Can you show me the documentation on the VLANs?
  • Can you give me an accurate inventory of the servers that we have including their age and configuration?
  • Can you tell me what is on each server or device and what it does?
  • Who has access to what on each server and who decides what that access is?
  • Can you tell me how they are connected to the network, and whether there is a redundant path?
  • Can you produce an inventory of what software is on each server?
  • Can you show me the recent log files of each server and tell me about what concerns you have regarding what those log files say?
  • Where is the actual software that is on the servers and where are the license keys?

No Excuses!

You would be surprised how many Sysadmins tell me that they don’t keep software; they just download it when they need it.  Really?  You have just had a disaster, your internet is down and will not be up for at least 72 hours; now what?  Not only does it make sense to have the disk for this reason, but it takes time (valuable time) to go find and download software.  They have argued that what is on the disk is not the most current.  Why not?  Why have you not updated your Software Library?  There is a lot to being a Sysadmin (SA); it is not about sitting on your butt in your office surfing the web, reading the news and updating Facebook while being annoyed by the occasional request for a password reset!  Old software that is a few versions behind the curve is still better than none!  Even if you “don’t have time” to keep your library updated, something is better than nothing.

Speaking of passwords, most companies really need a security officer and really don’t understand why.  I have seen some Sysadmins who are so lazy that they assign passwords to people and then keep an Excel list of them on the server.  These are not really Sysadmins, because that is genuinely stupid.  To open the company to so many different kinds of fraud, industrial espionage, and other forms of abuse of the system just because the guy does not want to be bothered with password resets is incredible.  This guy would not be working for me, as there is no excuse for this!  I don’t care how “nice a guy he is.”  Laziness and stupidity are a bad combination for a Sysadmin to have.

  • What software revision level are we at and is it the most recent? If not, why not?
  • Are Firmware rev levels kept up with and checked regularly?
  • Are the drivers up to date?
  • Can you produce a list of the passwords for each server?
  • What are the power requirements for these servers?
  • What are the cooling requirements for the equipment and are there any issues?
  • How long can we run if there is a power outage?
  • When is the last time that the batteries were changed out in the UPS’s?
  • Is each and every device in the server room labeled?
  • Is all networking cable installed in a manner that not only makes sense but looks like it belongs there vs. haphazardly plugged in on the run?
  • Can you show me a map of the switches, what port is doing what?
  • Tell me about load leveling.
  • Have the SNMP passwords on all of the intelligent devices been changed from the default?
  • If so, what are the passwords? If not, why not?
  • Are their traps being sent to a syslog server?
  • Who reads the logs, how often; and are there any concerns?
  • How are the concerns addressed?
  • Show me the notes from change control or change management meetings.
  • Are these notes managed in a responsible manner and are all changes noted in the living document?
  • What is the average age of the workstation on the floor/building?
  • Describe the policy regarding passwords. How often are they changed?
  • Describe your Hardware asset management strategy.
  • Describe your Software asset management strategy.
  • Who handles the maintenance on the HVAC in the server room?
  • When was the HVAC last serviced?
  • Tell me about your fire suppression.
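The SNMP question in the list above is one that can be checked rather than taken on faith.  As a minimal sketch, assume the device inventory has been exported to a list of records with a `name` and an `snmp_community` field (both field names are hypothetical); "public" and "private" are the community strings that many devices ship with.

```python
# Community strings that many devices ship with out of the box.
DEFAULT_COMMUNITIES = {"public", "private"}


def devices_with_default_snmp(inventory):
    """Return the names of devices whose recorded SNMP community is a
    well-known default, or is missing from the record altogether."""
    flagged = []
    for device in inventory:
        community = device.get("snmp_community", "").strip().lower()
        if community == "" or community in DEFAULT_COMMUNITIES:
            flagged.append(device["name"])
    return flagged
```

Anything this flags is either still on the factory default or undocumented, and both answers deserve a "why not?"  An export like this holds credentials, so it belongs somewhere secured, not in a spreadsheet on the server.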

It has been my experience, as an IT manager and a Disaster Recovery Specialist who does many audits, that the majority of Sysadmins do a horrible job of Hardware and Software management, much to the loss of the company and the chagrin of the CFO.

Desktops last about 5 years, laptops 3.  When a machine is put into service, a clock should start running to replace it in X years.  You don’t want employees working on outdated equipment, and you don’t want to install new software on old computers, as the license may very well die with the computer.
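That replacement clock is easy to keep in a script.  The sketch below uses the 5-year and 3-year figures from above; the asset record layout (`tag`, `kind`, `in_service`) is an assumption for illustration, so adapt it to whatever your asset list actually looks like.

```python
from datetime import date

# Service life in years, per the rule of thumb above.
SERVICE_LIFE_YEARS = {"desktop": 5, "laptop": 3}


def replacement_due(kind, in_service):
    """Date on which a machine placed in service on `in_service` is due out."""
    target_year = in_service.year + SERVICE_LIFE_YEARS[kind]
    try:
        return in_service.replace(year=target_year)
    except ValueError:
        # Feb 29 placed in service, target year is not a leap year.
        return in_service.replace(year=target_year, day=28)


def overdue(assets, today):
    """Asset tags whose replacement date has arrived or passed."""
    return [a["tag"] for a in assets
            if replacement_due(a["kind"], a["in_service"]) <= today]
```

Run `overdue(assets, date.today())` at budget time and the list of boxes to replace stops being a surprise.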

I have seen too many companies try to get everything they can out of a box.  Amortize the box and when the IRS says it is dead, let it go.  If there is a use for it in some non-critical function, “user discretion,” but add no more software and remove it from critical areas.

I have seen many people struggling along on a machine that is well past its usable life.  Losing files or data, or waiting around for the machine to catch up, costs money.  While it may be soft dollars, those soft dollars turn into real dollars quickly if you lose enough data and/or time.

I used to install older computers in the break room with internet access and the usual Windows and Facebook-type games.  Employees could use them for their private needs before or after their shift or while on break or lunch, and they were non-critical and on their own VLAN where company data could not be accessed!

Not everyone in the company needs a full version of Office.  A lot of companies have a standard load for all computers.  That should be re-visited, as it is wasteful.  While Microsoft would like you to purchase everything for every computer, that is simply laziness and waste.

Software and Hardware management is in itself a job, and proper management of it will produce an ROI.  It is also necessary in order to produce a budget requirement.  The CFO might cringe when he or she sees the request, but at least it is planned and not a surprise!

  • What antivirus software is on the workstations? How did you decide on that software?
  • Are the workstations locked down?
  • Do any users have admin rights? If so, why?
  • Are the USB ports locked down?
  • Are the CD burners locked down?
  • What ports are allowed through the firewall?
  • Is the firewall updated to the latest software?
  • Are traps from the firewall being sent to a syslog server?
  • Who has access to their workstation PC from home? Why?
  • Who has access to their home PC from work? Why?
  • What software is on each workstation?

I run an inventory program like SpiceWorks, or some other commercially available software, to obtain an inventory of all of the software on all of the boxes and then go through the task of identifying each executable.  I have found numerous Trojans and viruses, remote control software, games galore, software that was not licensed and, oh yes, software that they used and did not know that they had, as it was installed by previous regimes.  This type of activity is mandatory if you want to recover in the case of a disaster.  It is also mandatory if you want to be licensed properly and not have your neck on the line if some employee gets upset and calls the software police.
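Once the inventory exists, reconciling it against the licenses you can prove you own is simple arithmetic.  This is a minimal sketch that assumes the inventory tool can export (hostname, product) pairs and that you keep a product-to-seat-count record of purchases; both layouts are assumptions for illustration, not any particular tool's export format.

```python
from collections import Counter


def license_shortfall(installed, licenses_owned):
    """Compare installed copies with purchased seats.

    installed      -- iterable of (hostname, product) pairs from the inventory
    licenses_owned -- dict mapping product name to number of seats purchased
    Returns a dict of product -> number of installs not covered by a license.
    """
    counts = Counter(product for _host, product in installed)
    shortfall = {}
    for product, seats_used in counts.items():
        owned = licenses_owned.get(product, 0)
        if seats_used > owned:
            shortfall[product] = seats_used - owned
    return shortfall
```

An empty result means every install is covered; anything else is the list you want to fix before somebody else finds it.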

Recently the BSA (Business Software Alliance) has been advertising a lot, trying to get employees to snitch on their company.  The rewards to the snitch are inconsequential next to the penalties and fines to the company, which are enormous.  Having that inventory, those licenses and even the receipts in a safe place would, I think, be a really good idea.

Some companies are so cheap that they use free anti-virus software, which is not worth what you paid for it.  I fight viruses daily.  Free is not an option.  If you think that it is, you are deluded and clearly don’t know what you are doing.

Free software by definition cannot be maintained as well as commercial software.  Who in the hell has money to pay for programmers and security experts and then give the product away?!

Good Anti-Virus software is Patriotic

I made the argument the other night at a speaking engagement that it is actually patriotic to use good anti-virus software.  Why?  If millions of computers are taken over at the drop of a hat by some “bad guys” and they target, let’s say, the FAA or the FEDS or some other institution, and are able to cripple the banking industry or what have you, and your computer is part of the problem, what then?  A Trojan could be sitting on your computer unknown to you, just waiting for the instruction to start a DoS attack.  Stop being cheap and buy the damned software and protect your computer(s) from being controlled by “evil.”

If a government had more than two neurons firing in their collective heads, they would create a “government approved” anti-virus software and give it to its citizens.  Now, I know how that would be received by most; if I had a choice I would buy my own, as I really don’t want anything big brother has to offer on my computer, but let’s face facts.  You probably have things on your computer right now made by the Russian Mafia or worse!  I am certain that a government grant could be created to support a group of “white hat hackers” to help keep America safe from cyber terrorism.  If you do this, remember whose idea it was…

Here are a few more questions for you CIO/owner types who might actually have some skin in the game.

  • Do you have licenses for that software?
  • Where is that software?
  • Where are the licenses kept?
  • Can we prove that we bought a license for each and every piece of software in the building? If so, do it.  If not, why not?
  • How many employees use laptops?
  • Are they secure?
  • Are they encrypted?
  • Are USB drives or thumb drives that are necessary for business use, encrypted?
  • Do the laptops have up-to-date anti-virus software on them?
  • How old are they?
  • Do they use a VPN to get into the servers from outside of the office?
  • How secure is their VPN? What challenges, if any, are there?
  • Do you use security tokens?
  • Can you show me a map of the building depicting which PC is hooked up to which drop?
  • If you are using VoIP, can you show me that same map for the phones?
  • Is the map updated as changes occur?
  • Describe your backup policies and procedures.
  • Where is the data being sent off site?
  • Are we using the cloud for backup?
  • Walk me through the procedure of getting access to the data if this building is blown away.
  • Walk me through the procedure of restoring the servers in another location.
  • Tell me who can do this if the Sysadmin is not available?
  • Have we tested a restore of the data, if so when was the last test and where are the results; if not, why not?
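For the last question in the list, a test restore is only a test if somebody checks what came back.  One way to do that, sketched here under the assumption that both the original data and the restored copy are reachable as ordinary directories, is to record a SHA-256 hash of every file before the backup and compare the restored files against that record.  The function names are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def compare_restore(source_manifest, restored_root):
    """Return (missing, corrupt): files absent from the restore, and files
    present but with a different hash than the manifest recorded."""
    restored = build_manifest(restored_root)
    missing = sorted(set(source_manifest) - set(restored))
    corrupt = sorted(name for name, digest in source_manifest.items()
                     if name in restored and restored[name] != digest)
    return missing, corrupt
```

Two empty lists, with a date next to them, is the kind of test result that can be shown when someone asks where the results are.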

These few questions and comments are off the top of my head, and it took about ten minutes to list them.  There are plenty more, but this gives you a small flavor of the kinds of information you should already have and that I gather in a disaster recovery project.

The simple fact is that IT people are loath to document anything.  It is kind of like editing your own work: you know what you meant to say and your mind fills in the blanks.  Documentation should be written in such a way that a technical person not familiar with your company can pick up the document and the pieces and re-build your company without you there.

Often I am met with complete truculence, arrogance and lots of attitude by the IT staff of a company that I do a DR for.  They don’t want me there, as they don’t want me messing around in their sandbox.  Truth be told, they don’t want the fact that they are remiss in their jobs to get to their boss, who thinks everything is running perfectly, until it isn’t!

About Me:

If you happen to watch or ever have watched Hell’s Kitchen or Kitchen Nightmares, or know who Chef Ramsay is, then you have a clue of who I am, without the foul mouth.  I take IT departments and fix them, and I take no prisoners (no excuses).  Not only do I fix the hardware and software components, but I fix the personnel issues as well.  It may be a training issue or an employee that is a poor fit.  It may be a lack of people, as most companies try to run too thin on staff.  There should be no one person who is sacrosanct.  In a disaster you may lose them, so we need things documented in such a way that a rent-a-geek can restore your company.  If there is no documentation, I create it.  Through a test of the DR, we can then hone that documentation to a fine point.

I am a troubleshooter.  Not only am I a problem solver; I have been in management of IT for a large part of my life.  I get to the bottom of issues and take corrective action.  IT is ancillary to the business.  IT is a tool that has to be running smoothly, like a Swiss watch.  Your job as CEO is to run the company, not IT.  I have built data centers from the ground up, as well as re-built them while the business kept going, all over the country.

From data, fire suppression, HVAC, power requirements, UPS requirements, floor height and easy access to the equipment to MDF and IDF designs for data and voice; from the east coast to the west, from the north to the south.  I have worked everywhere from Union areas of the country to the Wild West where “anything goes.”  Been there, done that.

Go ask your IT people some of these questions and see if you are satisfied.  After 30 years in this business I would be surprised if you were.

From me, or someone like me, among the deliverables will be the documentation that so many just don’t do.  Without that documentation you are playing with galloping dominoes.  Your risk might be small, as you yourself know something about it, or it may be huge in that you, like most who run a company, run it from 20,000 feet, through your management.  There are seldom any pleasant surprises in business.

Has anyone at your company done a risk assessment?  Where are you located geographically?  Are you in an area that is prone to earthquakes, Hurricanes or Typhoons? How about tornadoes or fire?

One of the largest risks to a company, surprisingly, is none of the above.  It is employee error.  I have worked for companies where the Owners were the issue.  One company had their child, who played video games, work on the equipment, and of course the child screwed it up constantly.  Stay away from those companies, as they don’t want to hear the truth.  Their child is perfect and knows everything about anything, so it must be the fault of the internet or the software or something else.  I have worked for companies where the owners themselves, who ran the company, also thought they were the end-all and be-all of IT.  Pride comes before a fall; and believe me, when you own a company you really don’t want to have that fall.  Stick to what you know best and leave the technical things that change daily to those that keep up with them.  We who know this stuff are constantly involved with forums and our peers.  What works today may not work tomorrow.  Unless you can devote your life to this, let those of us who do, do it!

“NO”

One owner takes a passing interest in the latest and greatest through a magazine and orders it, or asks his IT guy to make it so.  If you have a yes man working for you, do yourself a favor and fire him.  Your people who do this for a living should have the ability to say no.  If they say no, you should listen to them.  If you want a second opinion, call your VAR.  If those two don’t jibe, call another.  Bottom line is you never install REV 1.0 of anything into production, ever!  If your guy can’t be honest with you, get real and hire a person who will tell you “no!”  It may save you tens of thousands of dollars, if not your company.  I have had yes men working for me in the past and got rid of them.  I depend on Team Corporation and that means I need their input.  While humbling oneself to listen to an underling can be a challenge at times, they may know something that you don’t.

I once worked for a guy who ran a company selling and servicing office equipment.  This was actually my first real job out of school.  The guy was from Georgia and had been a tank commander in WWII.  His manner was gruff, but he was sincere as the day was long.  We became close over the years as I have always made it a point to look at what successful people are doing, how they got there, and basically what made them tick.

He promoted me to the position of service manager of one of his locations.  He drove me over there to introduce me to the new team and show me around.  While on the road, he told me that one secret of a successful person is to hire people smarter than, or at least as smart as, you are.  To me that is probably one of the most salient bits of advice that I could pass on.  It means that the man had humility and also that he must have thought something of me.

While I still struggle with humility today, I am aware of it, and work on it.

Hours of Operation.

I had a guy interview with me. Towards the end of the interview he asked me if there would be any overtime as he had obligations after work and on weekends.  This guy clearly had no clue about the job for which he was applying.  Hourly jobs are Burger King, not Sysadmin or Network specialist etc.   We get paid well because this becomes the biggest part of your life!  If you are a 9 to 5 guy, don’t look at IT as a career.

As anyone who has been in IT any time at all can attest, this is not a nine-to-five job.  One never knows when something will stop working and you are suddenly pulling an all-nighter to fix something.  With VMware and the technology we have today, we can minimize that risk through proper configuration of the servers, building in some redundancy and keeping up with the age of our hardware.

Once you get past a twelve-hour day, statistics show that you are much more error prone, thus shooting yourself, and possibly the company, in the foot.  Best practice planning and implementation from the beginning mitigates this risk.  Having up-to-date documentation as well as partnerships with VARs will allow you to recover faster and employ fewer full-time people.  Staff augmentation through a VAR is an excellent way to keep the number of FTEs down, but that relationship really needs to be solid.

If you want to experience what “blood running cold” is, come in late at night to update some software on the server, re-boot it and then see the prompt: drive 0 not found.  This was before the days of RAID.  This was when genning a server started with installing 25 5.25-inch floppies followed by a 12-hour COMPSURF.  We have come a long way since then, and so have the folks who create viruses.  This is one of the most dynamic industries that I am aware of.  One really must be dedicated to be any good at this.

By dedicated I mean just that.  Keep up with what is going on through periodicals, peers in the industry and, again, I can’t stress this enough, at least one good VAR.

On one of my data center re-builds a vendor was doing our cable plant.  They ran long into the night and someone made a mistake.  Instead of pulling the old data lines and stopping, they cut and pulled the phone lines as well.  On another cable job that I was aware of, at about 3 in the morning, a 32 pair conductor cable got stuck.  Instead of seeing why, the installer reared back and pulled for everything that he was worth.  He snapped an ionized water line and flooded the computer room in a huge hospital.  Water poured out of the elevator shaft like it was some sort of an elaborate fountain.  Thank goodness that was not my job.

Much like driving less than 500 miles a day on a vacation is a good idea, so is limiting the number of hours worked by each person, as mistakes happen.  Make sure you have adequate staff to do the job, especially when you are taking on a new project.  How do you do that?  Proper project management methodologies and relationships with VARs… That is another story…

Here is an example of what a sysadmin is as defined by this site.

http://www.supportingadvancement.com/employment/job_descriptions/advancement_services/system_administrator.htm

ESSENTIAL FUNCTIONS:

The System Administrator (SA) is responsible for effective provisioning, installation/configuration, operation, and maintenance of systems hardware and software and related infrastructure. This individual participates in technical research and development to enable continuing innovation within the infrastructure. This individual ensures that system hardware, operating systems, software systems, and related procedures adhere to organizational values, enabling staff, volunteers, and Partners.

This individual will assist project teams with technical issues in the Initiation and Planning phases of our standard Project Management Methodology. These activities include the definition of needs, benefits, and technical strategy; research & development within the project life-cycle; technical analysis and design; and support of operations staff in executing, testing and rolling-out the solutions. Participation on projects is focused on smoothing the transition of projects from development staff to production staff by performing operations activities within the project life-cycle.

This individual is accountable for the following systems: Linux and Windows systems that support GIS infrastructure; Linux, Windows and Application systems that support Asset Management; Responsibilities on these systems include SA engineering and provisioning, operations and support, maintenance and research and development to ensure continual innovation.

SA Engineering and Provisioning

  1. Engineering of SA-related solutions for various project and operational needs.
  2. Install new / rebuild existing servers and configure hardware, peripherals, services, settings, directories, storage, etc. in accordance with standards and project/operational requirements.
  3. Install and configure systems such as supports GIS infrastructure applications or Asset Management applications.
  4. Develop and maintain installation and configuration procedures.
  5. Contribute to and maintain system standards.
  6. Research and recommend innovative, and where possible automated approaches for system administration tasks. Identify approaches that leverage our resources and provide economies of scale.

Operations and Support

  1. Perform daily system monitoring, verifying the integrity and availability of all hardware, server resources, systems and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups.
  2. Perform regular security monitoring to identify any possible intrusions.
  3. Perform daily backup operations, ensuring all required file systems and system data are successfully backed up to the appropriate media, recovery tapes or disks are created, and media is recycled and sent off site as necessary.
  4. Perform regular file archival and purge as necessary.
  5. Create, change, and delete user accounts per request.
  6. Provide Tier III/other support per request from various constituencies. Investigate and troubleshoot issues.
  7. Repair and recover from hardware or software failures. Coordinate and communicate with impacted constituencies.

Maintenance

  1. Apply OS patches and upgrades on a regular basis, and upgrade administrative tools and utilities. Configure / add new services as necessary.
  2. Upgrade and configure system software that supports GIS infrastructure applications or Asset Management applications per project or operational needs.
  3. Maintain operational, configuration, or other procedures.
  4. Perform periodic performance reporting to support capacity planning.
  5. Perform ongoing performance tuning, hardware upgrades, and resource optimization as required. Configure CPU, memory, and disk partitions as required.
  6. Maintain data center environmental and monitoring equipment.

KNOWLEDGE/SKILLS:

  1. Bachelor (4-year) degree, with a technical major, such as engineering or computer science.
  2. Systems Administration/System Engineer certification in Unix and Microsoft.
  3. Four to six years system administration experience.

COMPLEXITY/PROBLEM SOLVING:

  1. Position deals with a variety of problems and sometime has to decide which answer is best. The question/issues are typically clear and requires determination of which answer (from a few choices) is the best.

DISCRETION/LATITUDE/DECISION-MAKING:

  1. Decisions normally have a noticeable effect department-wide and company-wide, and judgment errors can typically require one to two weeks to correct or reverse.

RESPONSIBILITY/OVERSIGHT –FINANCIAL & SUPERVISORY:

  1. Functions as a lead worker doing the work similar to those in the work unit; responsibility for training, instruction, setting the work pace, and possibly evaluating performance.
  2. No budget responsibility.

COMMUNICATIONS/INTERPERSONAL CONTACTS:

  1. Interpret and/or discuss information with others, which involves terminology or concepts not familiar to many people; regularly provide advice and recommend actions involving rather complex issues. May resolve problems within established practices.
  2. Provides occasional guidance, some of which is technical.

WORKING CONDITIONS/PHYSICAL EFFORT:

  1. Responsibilities sometimes require working evenings and weekends, sometimes with little advanced notice.
  2. No regular travel required.

———————————————————————————————————

This is close but I would add to this list… I see nothing in this description about documenting anything.  Maybe that is why it is not done in so many places?  Does your SA do this type of thing?

-Best

Hubris in IT

Image

 

It would seem that “Pride cometh before a fall” is something that is lost on most people who work in IT.

 

As someone who has been working with computers from about the time Bill Gates was buying an operating system from some poor guy in Washington State and Steve Jobs was phone phreaking, there is just not much that escapes me.

 

I was doing some consulting for a company that was simply put together with baling wire and scotch tape.  They had a huge pipe to the internet and were getting a dribble through by the time it hit the desktop.

 

After loading Wireshark (a free protocol analyzer) and examining the broadcast packets, it was easy to see why.  The OS was literally working with NetBIOS to route packets.

 

A quick examination of the “server room” found the switches all tied together with fiber, and patch cords going from one switch to another, causing an untold number of routing loops etc.  While the picture above is a stock photo, the room in question looked very much like this.

 

My job however was not to fix their networking issues as this was the task of the guy I was “helping.”  He was the System Administrator.  I sent e-mails to him alerting him to my findings so he could take the appropriate steps, which for some reason he discounted and did not do.

The weeks went on and the problems persisted: several times a day people were kicked off the network, or files were corrupted or lost.  His response/fix was to release and renew the IP address.  Putting one Band-Aid on the problem day after day gave him, I guess, a sense of accomplishment, but the problems were looming and, like the 500-pound gorilla in the closet, soon to get out.

 

One of the things that I learned many years ago is to work with VARs.  Value-added resellers have years of experience to draw upon.  They know which products are buggy and should be avoided, and which are tried and true.  If you are a business, don’t try to save money via internet stores, as you will get what others can’t sell for one reason or another.  They are on sale for a reason…

 

When I asked him for his vendor contact list to include in his DR plan, there were no VARs on the list.  Everything was from internet companies or local retail locations.  He in fact had no fallback plan if the $hit hit the fan.

The hardware showed it.  There were no standards anywhere.  There were high-end SANs tied to cheap switches.  The workstation of choice was whatever he got a good deal on, making mass deployment of anything just about impossible.  Hardware was way past its lifecycle, and the list just went on.  Because of his pride, he was not willing to listen to anyone regarding anything IT.  If he does not change, it will be his undoing.

 

This is not my first rodeo and certainly not my first encounter with arrogance.  As a manager I can deal with it, as a consultant one must work around it and if it becomes too big of an impediment, bow out.  There is no reason to sully your name with a situation like this when the outcome will likely somehow be your fault.  

 

Always hire people smarter than you are and have the humility to acknowledge that you are not the end-all and be-all.  There is simply too much information out there to know it all.  Wisdom is knowing that you need help, and leveraging VARs and consultants is simply smart.

 

-Best to you and those that you care about!

Disaster Avoidance

 

 

Consulting as a Disaster Recovery Specialist, I often find things that need to be changed to avoid a disaster, much as a loose rug over a threshold or too many things plugged into one circuit would be an issue in your home.  In the business world it comes down to security issues, both IT related and physical, as well as simple things like a lack of fire extinguishers or the wrong type of fire suppression system in the computer room.  I am trained to notice the smallest of details, including things like cable management issues.

 

When best practices are not followed by sys-admins or networking gurus, that too triggers red flags.  There is an art to designing data centers.  I have designed and built many over the last 30 years, complete from the ground up, from air handling to power requirements to working with ADA compliance issues.  I have designed cable management for many companies, including the MDF and IDFs, and worked with building management to handle communication through multi-story buildings, making sure that they pass fire code.  You would be amazed at how many data centers I walk into that are underwired, lack proper air handling and have a sprinkler head above the equipment!  The cable management looks like Spiderman installed it, nothing is labeled, and there is not one shred of documentation.  And the boss / owner is oblivious to the imminent disaster, as he thinks his guys are pretty good!

 

When businesses start up, oftentimes they don’t contact the best and brightest to build their network, as they are on a tight budget.  Many times a family friend is called to assist, or the business owner has a home network and thinks that a business network is no different.  When I am called, their data center is generally a candidate for one of those web sites that post “what not to do.”  The exercise of unraveling the Gordian knot comes into play before anything can be changed.

 

When these knots are constructed, most of the time, if not always, there is limited or no documentation, and the original creator has long since abandoned ship, as he undoubtedly realized the iceberg ahead was not too far off.  To that end there are many land mines that have to be discovered and defused.  This practice is akin to changing the tires on a racecar while it is going down the track, and part of that track is in no man’s land!  The catch-22 is that no business can afford downtime, but if they don’t address the issues they will have unplanned downtime!  Unplanned is always much longer than planned, and always more expensive!

 

As an SME on this and many subjects regarding IT, I can offer many things to mitigate any issues and put them on a road to setting things right, whether that is working with their current IT staff or bringing in hired guns to knock it out quickly!

 

The business must be willing to change, and have executive buy-in as well as buy-in from the local staff.  The process can take weeks to months depending upon the situation, but after it is all said and done, procedures and processes are put into place to keep up with change.

 

Some policies addressed are Change Management; incident analysis, complete with root cause analysis; and documentation, with the introduction of the concept of a living document.  The run book: what is it and how does it work?  Testing the Disaster Recovery plan and then implementing changes from things learned.  Other topics include SAM (software asset management) and, of course, hardware management, including lifecycle and the budget process.

 

All too often the CFO or CEO is told that IT needs X thousands of dollars for this, that, or the other thing; not because it is a new project but because something failed!  With proper asset management this can be mitigated greatly and things can be budgeted for.

 

Much like any other audit, I don’t guarantee anything will be pleasant other than the knowledge that when it is done you will have the documentation you need, your network will be running at peak efficiency and it will be secure.  Depending upon your growth and company needs, a design can be implemented to make sure your data network is robust enough to handle changes and/or growth!

 

The last thing that I can address for you is personnel.  As a manager and director of IT for two decades, I know people.  I know who is right for a job and who is not.  If that type of expertise is needed, look no further.

 

-Best

Disasters Big and Small

As a Disaster Recovery Specialist, I walk into many companies that are one step away from disaster.  Some of them have been living on a wing and a prayer for a long time and are absolutely oblivious to the precipice on which they are perched.

One of the largest challenges one faces in this line of work is people.  By that I mean, more specifically, egos.  People are threatened by someone that “knows more than they do.”

Let me tell you a secret.  This is a Jack Palance type secret, (from City Slickers) “This is the one thing” that will save your keister as well as change your attitude.

I worked for a man who owned a very successful business.  I was a young guy fresh out of school and this guy saw something in me that I remember to this day.  As time passed he took me under his wing and helped me knock some of the rough edges off of my “perception” of the world as it was.  He took me out one day to JCPenney and had some sales clerk measure me for a suit, and then he picked out a couple of them.  We went to the shirts and he purchased a few of them, and so on right down to the shoes.  While these were not super expensive, they were not cheap, and his generosity never escaped me.  The only thing that he did not replace was my shorts!  Some might have taken offense to this, but I am no creature of fad or style, and while I would not qualify as a candidate for “What Not to Wear,” I did know that style was not my strong suit.  “Knowing your limitations” is good advice, but not the secret.

Later he had me take over the service manager position in one of his branches, which came with a company car and credit card.  This was before the tax laws changed.  He told me to use the car as I wished and, if I took it on vacation, to at least “pay for some of the gas myself.”  He took me over to the office, which was a good drive from the Dallas office.  He regaled me with stories of advertisement and marketing.  He told me the story of the sign with the waterfall on it by downtown Dallas.  Back then it was a Pearl Beer sign.  This man was pretty close to deaf.  He was from Georgia and his accent was still very thick.  It turns out that he was a tank commander in WWII.  He told me that the secret to survival is to “surround yourself with smart people.”  That not only applies to war, but business and oh yes, life in general.  If you want to be successful, surround yourself with people smarter than yourself and learn to humble yourself.  It is only by humbling yourself that you will realize the advantage of being around these people.  I have never forgotten this and to this day I still practice it.

I offer this advice to all IT people: “you are not the end-all and be-all.”  You cannot know it all even though you think that you do.  We become focused on what interests us and then the rest of technology passes us by.  Learn to control your ego, for it is your enemy.  No doubt you have heard the phrase “you are your own worst enemy.”  Think of the truth of this statement and then marry it, own it and then change it.  When someone starts talking to you about something which you think you know about and you feel that “anxiousness” start to well up inside, recognize this for what it is: your undoing.  Squelch the feeling, take a deep breath and listen to what this person has to say.  It may be worthy of hearing or it may be total crap.  Before long this will be habit and you will have trained your ego to stand down.

One of the first steps in the DR process is an AUDIT.  In order to prepare for a disaster one has to know what one has.  This is done by an audit of the technology, how it is configured and of course managed. We look at policies and procedures and just really get into your business in a big way.  The more you work with us the more you will get out of it.  Conversely the more truculent or evasive that your staff is, the more it will cost.  This is a “by the hour” service and time is money.

Audits are never fun but necessary, in that no one is perfect.  Audits uncover the “dirt” so to speak and no one wants to acknowledge that they have dirt.  Nobody wants to look bad so they are either un-helpful or become very defensive and blame the guy before them and so forth.  No one in their right mind would welcome an IRS audit because of this.  You know that you are playing by the rules but the rules are thousands of pages long.  What if?  Individuals should budget for an accountant for this reason.  Companies should have more than one accountant “even if it is a small company” in that they can check one another. (another story for another blog)

While IT audits won’t land you in front of a judge, they could have an effect on the bottom line, in that deficiencies could be uncovered which could end up as unbudgeted expenditures.  Having an up-to-date DR and BC plan will not only prevent this but will keep your IT department on their toes and up to date.  A fresh set of eyes looking at how things are done, contrasted against your business processes and needs, often bears fruit in that there may be a better way to do things.  Personally, I subscribe to “best practice” methodologies and policies.

Some companies don’t take IT seriously and look at it only as a necessary evil.  That attitude must be changed, as IT is much more than a necessary evil.  IT is a resource which ties the entire company together.  This department is the glue that binds most departments together as well as the interface between the customer and the company.  In looking at the want ads, one might occasionally notice ads for IT people such as “PC Wizard needed.”  Really?  Does this person come from over the rainbow?  The simple facts are that some HR people are totally bereft of any ability to interview for this position and the company as a whole does not take the department very seriously.  I would liken this to the “audio visual club” at school.  Know this, all you who mock them: the nerds will inherit the earth.  I digress...

If you really look at the way that your technical infrastructure touches every person in your company and your customers; your attitude on this matter might change.

During the process of a disaster recovery plan, this becomes very clear in that one of the pieces of this plan is a Business Impact analysis.  It is during this process that the lights turn on in the CEO’s, or CFO’s head.  I have heard the question posed to the CIO or CFO on many occasions “why hasn’t anyone told me this?” The simple facts are that the CEO’s job is to run the company, not the IT department.  He or she depends upon the CIO to look out for the company on all things IT and a DR plan is simply one small part of it.

Simple programs like asset management and S.A.M. (software asset management) are not only not in play, but not even thought of.  How can one budget for new stuff if one has no clue what one will need down the road?  A complete asset management program should be SOP in any company.  This program accounts for hardware from cradle to grave.
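To make the budgeting point concrete, here is a minimal sketch of how an asset register turns into a replacement forecast.  Everything in it is an assumption for illustration: the asset names, purchase years, lifespans and costs are made up, and the useful life of each item is whatever your own policy says it is.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    purchase_year: int
    lifespan_years: int       # assumed useful life, set by company policy
    replacement_cost: float   # estimated cost to replace, in dollars


def replacement_budget(assets, current_year):
    """Total the replacement cost falling due in each year.

    Gear already past its end of life is charged to the current year,
    since it should have been replaced already.
    """
    budget = defaultdict(float)
    for asset in assets:
        due_year = asset.purchase_year + asset.lifespan_years
        budget[max(due_year, current_year)] += asset.replacement_cost
    return dict(sorted(budget.items()))


# Hypothetical inventory.
inventory = [
    Asset("file-server", 2019, 5, 8000.0),
    Asset("core-switch", 2021, 7, 3500.0),
    Asset("ups-rack-1", 2022, 4, 1200.0),
    Asset("workstation-17", 2023, 4, 900.0),
]

forecast = replacement_budget(inventory, current_year=2025)
for year, amount in forecast.items():
    print(year, amount)
```

With a list like this in hand, the conversation with the CFO is about a known number for each coming year rather than an emergency request after something has failed.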

The same is true regarding software.  Oftentimes, companies pay way too much for software, as it is installed by policy on computers with users who will never use it.  Users may bring in their own software and install it, leaving a liability for the company to contend with should there be a software audit, such as one done by the BSA (Business Software Alliance).
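The heart of a software asset management check is a simple reconciliation, sketched below under stated assumptions.  The titles, machine names and seat counts are hypothetical, and in practice the three inputs would come from an inventory scan, the purchasing records and some form of usage metering.

```python
def reconcile_licenses(installed, purchased, used_recently):
    """Compare what is installed against what was bought and what is used.

    installed:      {title: set of machines with the title installed}
    purchased:      {title: number of seats the company owns}
    used_recently:  {title: set of machines where the title was actually run}
    """
    report = {}
    for title in sorted(set(installed) | set(purchased)):
        machines = installed.get(title, set())
        seats = purchased.get(title, 0)
        idle = machines - used_recently.get(title, set())
        report[title] = {
            "installed": len(machines),
            "purchased": seats,
            "shortfall": max(len(machines) - seats, 0),  # audit exposure
            "idle_installs": sorted(idle),               # paid for, never used
        }
    return report


# Hypothetical data for three titles.
installed = {
    "OfficeSuite": {"pc1", "pc2", "pc3"},
    "CadTool": {"pc2"},
    "PhotoEditor": {"pc3"},   # brought in by a user, never purchased
}
purchased = {"OfficeSuite": 3, "CadTool": 5}
used_recently = {"OfficeSuite": {"pc1", "pc2"}, "CadTool": {"pc2"}}

report = reconcile_licenses(installed, purchased, used_recently)
for title, row in report.items():
    print(title, row)
```

A shortfall above zero is the liability in an audit; idle installs and unused purchased seats are where the company is paying way too much.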

While there are no good surprises in business there are certainly no good surprises after an event has been suffered by a company.  A fire in the data center could take the entire company out of the marketplace for good.

Image

Fire caused by poor cable management practices.

Human error accounts for a large percentage of the events which cause companies to fail.  Doing a root cause analysis on failed companies that suffered a disaster, you find that they did not value such a plan because “it will never happen to me.”  You don’t have to suffer a Sandy or Katrina type event to bring your business to its knees.  A simple mistake from some employee, working for a company without a business continuity or disaster recovery plan, can ruin your day, if not your career.

It is at this time many companies wish that they had spent the money on such a plan.  Too Late… If you fail to plan you plan to fail.

You can purchase insurance which will assist with the closing of the company but, that is not the way to go out of business, with a whimper, because you failed to plan.

Updated documentation of your infrastructure otherwise known as a “living document,” should also be SOP.  IT folk absolutely do not like documentation, more specifically creating it.  There are many schools of thought on this reason, but I suspect that laziness along with a “need” to have proprietary information so they are not expendable weighs somewhere in their decision.  If the latter is your reason for not doing what is right for the company you need to re-examine your life. 

If you are taking the paycheck, you owe your employer the best that you can offer.  If you managers feel like you have people in your department who are not expendable, you need to address this posthaste!  One rule of preventing a disaster is avoiding single points of failure, and that means people as well.
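A quick way to see where people are the single point of failure is to list, for each system, who is documented as able to run it.  The sketch below is only an illustration, with hypothetical system and staff names, and it flags anything covered by fewer than two different people.

```python
def single_points_of_failure(coverage):
    """Return the systems that fewer than two distinct people can run.

    coverage: {system: list of people documented as able to run it}
    """
    at_risk = {}
    for system, people in sorted(coverage.items()):
        distinct = sorted(set(people))
        if len(distinct) < 2:
            at_risk[system] = distinct
    return at_risk


# Hypothetical coverage matrix taken from the living document.
coverage = {
    "backups": ["alice"],
    "firewall": ["alice", "bob"],
    "phone-system": [],
    "erp": ["carol", "carol"],   # listed twice, still only one person
}

at_risk = single_points_of_failure(coverage)
for system, people in at_risk.items():
    print(system, people or "nobody documented")
```

Anything on that list is a candidate for cross-training and for a proper write-up in the living document.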

Part of disaster recovery is averting disasters to begin with!  Through solid best practices in policies and procedures, a large percentage of disasters can be negated.

One last topic on the subject that comes up from time to time: “Do I have a legal obligation to have a DR/BC plan?”

The answer is not as clear-cut as one would like.  The interesting thing, however, from a legal perspective is that there is legal precedent whereby companies were held liable for failing to provide a more error-tolerant system.  They in fact were found to be negligent, and case law supports awarding large sums of cash to the plaintiff.  These cases hold not only the owners of the company negligent, but any and all officers of the company liable as well.  Think carefully about that promotion and VP title.

While companies are apathetic towards spending the money on such a plan, doing so is not only moral, it is strategic and most likely a legal obligation.  As billions of dollars are spent annually on technology to maintain a competitive edge, “standards of care” and due diligence are required of all corporations, both public and private.  Not having such a plan violates the fiduciary standard of care.

-Best to you!

staylor@guard-protect.com

www.guard-protect.com

 

Big Red Button or Time to Panic!

Nothing says “push me” like a big red button.  One of the office supply stores even created a big red button that says “EASY” on it, to advertise how they can simplify your work life.

One of the data centers that I was responsible for had such a button.  It was covered with a little plastic rectangular box that said “emergency shut off” on it.

I have been in many data centers during my career.  There were several that had a big red button by the door with its sole purpose being to release the magnetic latch on the door, to open it.

Like any other location, security in a data center is paramount.  Not only are network security measures such as firewalls important, but physical security as well.  Only those who needed access to the data center could access it with their security card.  Not even the CEO had access, as he did not need it.  Each entrance was logged, and in fact throughout the building one could forensically track any employee’s movements, as this card was necessary to gain access to just about anywhere.  With the technology available today, I could design a much better system, but that is beyond the scope of this document.

One day, a vendor was visiting with a proposed solution to a problem.  Like any other vendor, if access to the data center is required, they are escorted at all times by me or by one or more of my staff.  The data center was in the middle of a retrofit and redesign while keeping the company running in parallel.  (This is much like changing the tires on a race car while it is moving down the track.)  On their way out of the data center, just as quickly as anything, the sales guy in front reached up to the left of the door, popped the cover open and pushed the big red button!  By the time that the sound of “NO” had left my lips, there was an eerie quiet in the room.

The chain of events that this action triggered was phenomenal.  Lights went off, the air handling unit went off, the battery back-ups clicked on and, for the moment, it looked as though the carefully engineered back-up power supplies were working.  I should mention that the look on this guy’s face was priceless, and I am just about certain that he had to change his shorts afterwards.  It dawned on me that no one had tested this button, and nobody knew where all of the circuit breakers were; well, almost no one.  As I was the one that specified the power requirements for this data center and oversaw the installation of the new transformer, I knew where the main breaker was.  Within moments I had most of the power back on; however, there was one legacy system that was still not on main power.

In another closet in another part of the building were still more circuits for this room.  I did not have a key to this and getting building maintenance involved was time consuming as they typically think like union employees; (don’t care if the place is on fire, when it is time for a break, they take it.)  Before the UPS was totally drained for that system I had gained access to that closet and found one tripped breaker.

I had inherited a mess of a data center that was put together on a shoestring budget.  Not because the company could not afford to do it right; their boss was cheap beyond reason.  They had cut corners at every place they could, including splicing old Type 3 wire to Cat 5 wire and running 16 Mbps Token Ring over it.  They could not understand why 5250 and 3270 traffic would constantly be garbled and why connections to the server would be dropped frequently.  When I say spliced, I literally mean wires twisted together and a wad of electrical tape stuffed in the wall and/or ceiling.  (Another story)

It did not take me long to get that circuit changed over and documented with everything else.  I also got to check off the list “test emergency shut down.”

Moral of the story: if you have a big red button, find a time to test it.  Secondly, make certain that the button is labeled in big white letters on a red sign: EMERGENCY SHUT OFF!

I am a stickler for documentation, which IT personnel are loath to do.  A living document should exist within each and every company that explains the ins and outs of everything, so if need be, someone else can take over.  It is part of the audit process for a disaster recovery plan and is one of the deliverables.

-Best to you and all those that you care about!