For When You Can't Have The Real Thing

Network Documentation

Created by dave. Last edited by dave, 14 years and 283 days ago. Viewed 4,475 times. #2

Dave's Tao of Network Documentation

This is a page which includes my thoughts on documenting a live, running network. It will be perpetually under construction as my thoughts change depending on what problems I've recently had to deal with.

It's also kind of a mess right now.

Principle #1: It Is Never Done

Documentation is an ongoing effort that will always lag behind what is in production. Changes are made ad hoc; things are moved around, discontinued, or put into service at random. Documentation will never catch up.

You have to sell the people paying the bills on the value of spending time (and therefore, money) on keeping the running documentation up to date. Frequently those conversations go like this: "remember when I had to spend $TIME figuring out how $THING was broken? Well when I was finished, there was this tech note detailing $THING, so that the next guy to come along won't have to figure it all out."

You have to do it, even though you will never finish.

Principle #2: The Only Thing Worse Than No Documentation Is Wrong Documentation

This is more of a truism than a principle. Documentation can lull you into the false sense that something is in a known state and that if something goes wrong you can therefore have a running start at fixing it.

It is important to acknowledge this problem.

Principle #3: You Are Writing Documentation For Your Successor

Odds are you will never have to refer again to 95% of what you document. Documentation is a collection of wisdom for the future, not for you. So you have to assume that your audience knows little or nothing about the specifics of why things are the way they are.

And there will be a successor. I don't know about you, but I don't plan to be in these specific environments for the rest of my life. Opportunities come and go, and when they come, sometimes you go. But life goes on behind you, and the smoother you can make life for your successor the better. Otherwise you might have a collection of former customers who quietly say unflattering things about you. I like to say that it is the same 50 guys working everywhere in IT in Ottawa because you keep running into them everywhere. Helping your successor might open doors for you in the future.

Now to a certain extent there is always a degree of "blame the previous guy" when trouble comes up. That is part of the business. I've done it myself. But on several occasions, after I had blasted the previous guy as some kind of moron, I later learned that he really had his act together and knew more about what was going on than I did at the time.

Principle #4: "Why" is often more important than "How"

When looking at a system, most of us start thinking things like "why the hell is this like this?" There are almost always very specific reasons for the configuration choices made. In these circumstances, the "Why" dictates the "How", and you have to make sure that the reader understands the specific problems being solved when examining the smoking remains of your solution.

Principle #5: It has to be easy or you won't do it

This means you have to be very aware of your tools as well as those who are going to use your tools.

Keeping things up-to-date has to be easy. If you have to make any kind of effort, then you will find excuses to avoid doing it when it is best done, which is immediately after a change.

If your tools are not easy for others to use, then they won't use them. This can be especially crippling in a team environment, since the larger the team gets, the more likely you are to encounter a team member who does not like your choice of tools.

Personally, I like a wiki for documents. However, the problem is that a wiki does not force a structure on you, so the structure must be imposed from outside. This always leads to conflict somewhere, as somebody else has a better/different idea.

In some places I've used Word and Visio documents "published" to PDF, with the "latest" PDF being considered authoritative. This is good in that you then have a collection that you can hand to your employer/successor. The PDFs, if properly dated, can provide a historical record of what happened, although one which is not easy to navigate through. It is bad in that I don't like Word or Visio and have been forced to get a basic understanding of these tools in order to effectively communicate the ideas.

My current employer is toying with the idea of Word documents in a SharePoint portal. We'll just have to see how far we get there.

Building Blocks Of Network Documentation

Here I want to discuss more of the specific "how" to go about documenting things.

Users Care About Services, Not Computers

Your user community does not care about your ex-shv-01 computer. Your user community cares that their mail is available.

Therefore, I like to organize my documentation by service instead of by computer.

This can be organized into service groups. For example "mail" is a group which includes such services as internet MX and redundant MX, anti-spam, delivery, mailboxes, and outbound mail. Your users care about all of these but in different ways, and each should be documented as a separate entity.

You will have some services where the immediate "user" is the IT department. Take for example backups, although special notes about backup requirements almost always have to be included in the service documentation too.

Who is usually more important than How

Every service should have an owner. Ideally more than one owner. That owner is responsible for knowing about the service and keeping the configuration and documentation up-to-date.

This way when something breaks you have a starting point of people to ask. Those people will know the How, but more importantly will know the Why as well.
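As a rough illustration of organizing by service and tracking owners, here is a minimal sketch of a service catalogue. The field names, the owner names, and the host peanut running backups are all made up for the example; only ex-shv-01 and the mail grouping come from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    """One documented service. All field names are illustrative."""
    name: str
    group: str                                       # service group, e.g. "mail"
    owners: List[str] = field(default_factory=list)  # people to ask first
    hosts: List[str] = field(default_factory=list)   # computers it runs on


def under_owned(services, minimum=2):
    """Return the names of services with fewer than `minimum` owners."""
    return [s.name for s in services if len(s.owners) < minimum]


def affected_by(services, host):
    """Return the names of services that run on the given host."""
    return [s.name for s in services if host in s.hosts]


# Hypothetical catalogue entries.
catalogue = [
    Service("inbound MX", "mail", ["alice", "bob"], ["ex-shv-01"]),
    Service("mailboxes", "mail", ["alice"], ["ex-shv-01"]),
    Service("backups", "internal", [], ["peanut"]),
]

print(under_owned(catalogue))               # services that need another owner
print(affected_by(catalogue, "ex-shv-01"))  # what users lose if this box dies
```

The second lookup is the point of organizing by service: when a computer fails, the question that matters is which services the users have lost and who owns them.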

Diagrams should be simple, or not included at all.

One of the reasons I don't like Visio is that it is too easy to go template-happy. Many times I have been sent big, colorful diagrams which include pictures of individual types of switches or computers.

I find that most of the time the pictures do not add any value to the diagram, and in fact can make the diagram larger than it needs to be. For me, 99% of my diagrams are going to have four components:

  • clouds (for networks)
  • boxes (for computers and routers)
  • text (describing the elements)
  • and lines (connecting all of the above).

My diagrams are nearly always black and white and utterly boring. But the information I want to convey is there.

The second thing to think about is that a diagram should try to convey no more than one thing.

The perfect example is when dealing with an environment with more than one VLAN in it. There is a temptation to try to mash the layer two information together with layer three, and the result is almost always a mish-mash that conveys neither information effectively. I always do a layer two and a layer three diagram separately.

One objection to this simplicity is that your diagrams are not pretty! Well, there is a difference between a diagram that is going in a document and a diagram that is going to be projected on the wall. I am talking about the former.

A Label-Maker Is Your Best Friend

When a computer gets deployed, it should get a label. These labels should be legible, have standard information on them, and be in a standard location.

At some point somebody is going to be grubbing through the hardware looking for a specific box that is broken or otherwise needs hands-on help. Being able to visually identify the system potentially makes that job much easier. Or at least it reduces the risk that the wrong computer will get messed with, which inevitably makes more work for that person, usually at a time when more work is most emphatically not needed.

My systems get a name and an IP address. If the computer gets re-named or re-IP'd, it gets a new label. That label always goes on the front of the computer, usually on the optical drive or equivalent blank. Co-workers who deploy computers without a label get yelled at.

Think Carefully About Naming Conventions

Naming conventions are hard. It would be possible to write an entire essay on the subject.

When it comes to naming conventions, it is important to think about what you are naming.

If you are naming something that customers are going to see, it either has to be very friendly (if customers are going to interact with it by name), or very obscure (if you have multiple customers and don't want to expose any information through the names).

If you are naming something for internal use, I like to be very specific.

These are the conventions that I use:

  • Naming conventions are not for inventory tracking or asset control. This means that tracking the serial number, hardware type, and assorted warranty contracts is a different problem.
  • Server computers get names that mean nothing. Names that do not describe their hardware type, function, or owner. Names like saturn or peanut or homer depending on the local convention.
  • Server computers also get aliases which specifically describe what they do. Names like ex-01-ott or mx01 or fs03 or www. The reason for this is so that when the service gets moved (and it will) the alias can be moved to the new home.
  • If a server gets retired and then redeployed in a new configuration, it gets a new name and a new collection of aliases.
  • Switches, routers, and firewalls get very specific names depending on what they are and where they are. So for example if I have two 3Com 3900 switches in tower two, the second one will be 3c39t2-02. My Netscreen-25 firewalls will be ns25m and ns25s (master and slave, respectively).
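
The conventions above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the pairing of aliases with host names is invented for the example, and in practice the alias table would usually live in DNS rather than in a script.

```python
def device_name(model_code: str, location: str, index: int) -> str:
    """Build a name like 3c39t2-02: model code, location code, two-digit index."""
    return f"{model_code}{location}-{index:02d}"


# Service aliases point at meaningless host names, so moving a service
# means changing one entry here instead of renaming a server.
# These pairings are made up for the example.
aliases = {"mx01": "saturn", "fs03": "peanut", "www": "homer"}


def move_alias(table: dict, alias: str, new_host: str) -> dict:
    """Return a copy of the alias table with one alias pointed at a new host."""
    if alias not in table:
        raise KeyError(f"unknown alias: {alias}")
    updated = dict(table)
    updated[alias] = new_host
    return updated


print(device_name("3c39", "t2", 2))                # second 3Com 3900 in tower two
print(move_alias(aliases, "mx01", "peanut")["mx01"])  # mx01 now lives on peanut
```

Keeping the host name and the alias separate is what lets the service move without anything that users or other systems refer to having to change.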

Starting Places

As always, the task of documentation is usually too big to see from the start. This is how I like to proceed:

Logical Network Diagram: this is a diagram of how all the subnets you care about are glued together, including routing devices. The diagram has very little other detail: each subnet is a cloud with an IP space and maybe two words describing it, plus connections to routing devices.
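
If you would rather generate such a diagram than draw it, one possible approach is to emit Graphviz DOT text from a list of subnets and routing devices. This is a sketch under assumptions: the function name, the subnets, and the descriptions are invented, and ellipses stand in for clouds.

```python
import ipaddress


def logical_diagram(subnets, links):
    """Return Graphviz DOT text: subnets as ellipses, routing devices as boxes.

    subnets: {cidr: short description}
    links:   [(router_name, cidr), ...]
    """
    lines = ["graph logical {", "  node [color=black, fontcolor=black];"]
    for cidr, note in subnets.items():
        ipaddress.ip_network(cidr)  # raises ValueError on a malformed subnet
        lines.append(f'  "{cidr}" [shape=ellipse, label="{cidr}\\n{note}"];')
    for router in sorted({r for r, _ in links}):
        lines.append(f'  "{router}" [shape=box];')
    for router, cidr in links:
        if cidr not in subnets:
            raise ValueError(f"link to undocumented subnet: {cidr}")
        lines.append(f'  "{router}" -- "{cidr}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical subnets joined by one firewall.
dot = logical_diagram(
    {"10.0.1.0/24": "office LAN", "10.0.2.0/24": "server LAN"},
    [("ns25m", "10.0.1.0/24"), ("ns25m", "10.0.2.0/24")],
)
print(dot)
```

The output can be saved to a file and rendered with the Graphviz `dot` tool. The result is black and white and boring, which is the idea.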

Physical Network Diagram: this is a diagram of how the layer two gear is connected together. This is usually mandatory in any non-trivial collection of switching gear. Switches and routers are boxes on this diagram, each with a name (which matches the label on the actual switch), a location (room, rack) and maybe two words describing it.

Server Inventory: you probably have, or can trivially accumulate, a list of servers you are responsible for.

Service Inventory: you can also trivially accumulate a list of services your user community is expecting you to provide. You can also do scans like nmap port scans of the servers you are responsible for to see what services are being provided. From there you can add software involved, configurations (if brief) or where to find them (if more complex or dynamic), support contracts, histories, etc. Each service should get a document, page, or section in whatever you are using to keep track of this information.
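
As a starting point for turning scan results into a service list, here is a minimal sketch that reads nmap's grepable output (the `-oG` option) and collects the open ports per host. The sample text is made up, and the parser assumes the usual slash-separated port fields with no commas inside them, so check it against real output from your own scans before trusting it.

```python
def parse_grepable(text):
    """Map each host to a list of (port, protocol, service) for open ports."""
    inventory = {}
    for line in text.splitlines():
        if not line.startswith("Host:") or "Ports:" not in line:
            continue
        host = line.split()[1]
        # The Ports field runs until the next tab-separated field, if any.
        ports_field = line.split("Ports:", 1)[1].split("\t", 1)[0]
        for entry in ports_field.split(","):
            parts = entry.strip().split("/")
            if len(parts) >= 5 and parts[1] == "open":
                inventory.setdefault(host, []).append(
                    (int(parts[0]), parts[2], parts[4] or "unknown")
                )
    return inventory


# Invented sample in the shape of grepable output.
SAMPLE = (
    "Host: 192.0.2.10 (saturn)\tStatus: Up\n"
    "Host: 192.0.2.10 (saturn)\tPorts: 25/open/tcp//smtp///, "
    "80/open/tcp//http///, 443/closed/tcp//https///\n"
)

for host, ports in parse_grepable(SAMPLE).items():
    print(host, ports)
```

The list this produces is only what the servers are offering. It still has to be compared with what the user community expects, and each entry still needs its own page.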

Backup Status and Procedures: this is a list of what is being backed up. Every six months I like to go to the backup server and make a list of what is being backed up, and then compare that list with what I think should (or should no longer) be backed up. In situations where I am a contract or non-management employee, I like to have my boss review this list with me, to ensure that we both agree on what the backup requirements are.
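
The six-monthly comparison is a pair of set differences. A minimal sketch, with invented path names and a hypothetical function name:

```python
def compare_backups(actual, expected):
    """Compare what is being backed up with what should be backed up."""
    actual, expected = set(actual), set(expected)
    return {
        # Should be backed up but is not: the dangerous list.
        "missing": sorted(expected - actual),
        # Backed up but not on the expected list: maybe no longer needed.
        "unexpected": sorted(actual - expected),
    }


# Hypothetical lists: one taken from the backup server, one from the docs.
on_backup_server = ["saturn:/home", "saturn:/etc", "homer:/var/www"]
should_be_backed_up = ["saturn:/home", "saturn:/etc", "peanut:/srv/files"]

report = compare_backups(on_backup_server, should_be_backed_up)
print(report["missing"])
print(report["unexpected"])
```

Both lists are worth reviewing with whoever owns the backup requirements, since either one can mean the documentation is wrong rather than the backup job.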

This is a collection of technical information, much of it learned the hard way. Consider it a lab book or a /info directory. I doubt much of it will be of use to anyone else.
