For When You Can't Have The Real Thing

Lessons On Longevity

Created by dave. Last edited by dave, 8 years and 352 days ago.
My first job out of college was as a Windows help-desk tech for a small software developer. From that position I moved to Unix and network administration, and finally to configuration management before moving on. I learned many lessons while dealing with this employer/client, but today I want to touch on two of them:

Nothing Is Forever

I remember my first FTP server. I built it from the ground up. It was a SPARCstation-10 with a disk array that featured RAID-1 OS disks and a 3-disk RAID-5 array. It had wu-ftpd installed, and I had configured it to within an inch of its life. It was the company's customer-facing FTP server, so reliability was very important. With the exception of a disk failure on one of the OS drives, it ran like a watch.

I remember the day my boss decided to build a new FTP server, after two years of service by my wonderful FTP server. My initial reaction was one of jealousy. How dare he contemplate replacing this work of art that I had created? What, if anything, could this new contraption provide for our customers that mine did not?

This was my first brush with the lack of permanence in the business environment. As it happens, there were very good reasons for replacing the FTP server, mostly surrounding issues like hardware life, support availability, and admin overhead (we had to rebuild wu-ftpd from scratch every time a patch came out). The new server was faster, newer, had a support contract, and used package management to do updates.

Thus the lesson: nothing is forever. Whatever you make in this business will probably be temporary, and one day it will all be gone and forgotten even by you. While we shouldn't rush to throw out working and supportable systems just for the sake of having something new, neither should we cling to creaky old systems that are past their service life.

Temporary Is Longer Than You Think

After five years with this employer, I decided to move on. Eight years later my then-current employer received a call from my first employer: would we be available to consult on some of the systems that they had running?

The new parent of my first employer had been taking the team in different directions, so the group of people in that office had shrunk. At some point it was not cost-effective to keep on-site IT resources, so they moved on.

But now the new parent was digging around in the environment of the product that my first employer had been building while I worked there. Some customers were still paying for support, so the environment had to remain. But time was catching up with this environment, so we were called in to assist.

It was something else, walking into a computer room filled with gear that I had last touched eight years before. After some poking around I concluded that several important systems were still doing exactly what they had been doing the day I left.

Looking around, I saw that several of the infrastructure pieces -- edge switches and the like -- were the units that we had put into service ten or eleven years previously.

I don't think I need to tell you that is a long time in this business.

Part of the issue the company was facing was that some pieces of the environment had stopped working and they didn't know why. I figured out quickly that someone had gone through and marked as surplus (and then removed) a few systems that the rest of the network depended on, but a little elbow grease and ingenuity transferred the dependencies to other systems that still existed.

Another part of the issue was that because this project was in deep maintenance mode, the remaining customer base was still running on very old platforms. Solaris 2.6. HP-UX 11.0. AIX 4.x. Windows 2000 Server.

And the challenge was making sure we could back these systems up to tape.

Backups were being run on an equally ancient NetBackup 3 server, driving AIT-2 treefrogs. Backups were taking a long time, media rotation wasn't happening correctly, and there was no easy way to deal with the inevitable media failures that were starting to happen.

But here we ran into another problem: we couldn't update the robotic devices, because Solaris 2.6 and the NetBackup software we had wouldn't deal with newer ones. But newer NetBackup software wouldn't necessarily back up the old OS platforms we had, because they were all long out of support by their manufacturers. And Symantec wouldn't sell us the license keys necessary to add more robots because the old version of NetBackup was also end-of-life'd.

We eventually came up with a work-around for most of the issues, but the core of our backup network is still an Ultra-5 running NetBackup 3.x with the treefrog robots.

But the lesson: sometimes things last longer than you expect. If something doesn't cause pain, it probably won't get upgraded or replaced. So even though nothing lasts forever, you have to plan and build things to last.

You also have to plan for that survival: ensuring that systems get documented so that after your office gets downsized and you are out of a job, the people left with your systems have something to guide whoever gets roped into supporting that environment.
