Over Designed and Under Documented

Documentation is important. I doubt anybody would disagree with this statement. I don’t know anyone who would say they walked into a new job or a new client with enough documentation to feel confident. Most of that knowledge comes from digging in, and there is rarely enough time to stop and open Visio. We all want the documentation to exist in six months, but writing it is never today’s priority.

Documentation becomes crucial when you create a new standard or design something outside of the existing standard. In a perfect world, all hardware in a datacenter would be the same and a decision from a year ago would still apply today. Unfortunately, we don’t live in a perfect world. Hardware changes, and differently sized blade and rack-mount servers get purchased. If the settings aren’t compatible, or one project buys a different number of physical adapters than another, you can run into issues. Let me dive into an example:

Network Isolation

At a new job I was tasked with redesigning a datacenter. A milestone had already been missed and the deadline was looming, so the pressure was on. I wasn’t familiar with the project’s existing design because there were no notes (see the point above about the importance of documentation). The requirements I was given were “use a distributed switch and isolate NFS traffic to two dedicated VLANs.” All 30 hosts were identical blades with four physical NICs each.

I created a single dvSwitch, set the NIC teaming rules to meet the requirements, and all was well. A year later a few blades with only two NICs were purchased. Another technician added them to the dvSwitch. All networking checks succeeded, except that we couldn’t ping the NFS vmkernel port or the VM that needed to sit in that VLAN.

The Original Design

I set out to fulfill the requirement to dedicate two network adapters to NFS storage. There are quite a few ways to do this, but two solutions are common. The first is to use one switch and set explicit failover rules. All port groups except for NFS use dvUplink 1 and 2. The NFS port group uses dvUplink 3 and 4.

one-dvswitch
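To make the single-switch design concrete, here is a minimal Python sketch of the explicit failover rules. It is only a model, not VMware API code: the port group names "Management" and "VM Network" are hypothetical stand-ins for "all port groups except NFS", and it assumes physical NICs are attached to dvUplinks in order (vmnic0 to dvUplink1, and so on).

```python
# Hypothetical model of the single-dvSwitch teaming rules.
# Port group names other than "NFS" are illustrative assumptions.
TEAMING = {
    "Management": ["dvUplink1", "dvUplink2"],
    "VM Network": ["dvUplink1", "dvUplink2"],
    "NFS": ["dvUplink3", "dvUplink4"],
}


def map_uplinks(nic_count):
    """Assume NICs fill the uplinks in order: vmnic0 -> dvUplink1, etc."""
    return {f"dvUplink{i + 1}": f"vmnic{i}" for i in range(nic_count)}


def stranded_port_groups(nic_count, teaming=TEAMING):
    """Return port groups whose uplinks have no physical NIC behind them."""
    backed = map_uplinks(nic_count)
    return sorted(
        pg for pg, uplinks in teaming.items()
        if not any(u in backed for u in uplinks)
    )


print(stranded_port_groups(4))  # []
print(stranded_port_groups(2))  # ['NFS']
```

Under those assumptions, a four-NIC blade backs every port group, while a two-NIC blade leaves the NFS port group with no physical adapter at all.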

The other option is to create two dvSwitches: one dedicated to NFS and a second dedicated to all other traffic. Then you simply have uplinks 1 and 2 on the storage dvSwitch and a second set of uplinks 1 and 2 on the production dvSwitch.

two-dvswitch
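The two-switch layout can be sketched the same way. Again this is a rough model under assumptions: the switch names "dvSwitch-Prod" and "dvSwitch-NFS" are hypothetical, and NICs are handed out switch by switch in order.

```python
# Hypothetical switch names; each switch expects two uplinks, as described.
SWITCH_UPLINKS = {"dvSwitch-Prod": 2, "dvSwitch-NFS": 2}


def plan_host(nic_count, switch_uplinks=SWITCH_UPLINKS):
    """Hand out vmnics switch by switch and record any shortfall."""
    free = [f"vmnic{i}" for i in range(nic_count)]
    plan = {}
    for switch, needed in switch_uplinks.items():
        assigned, free = free[:needed], free[needed:]
        plan[switch] = {"nics": assigned, "missing": needed - len(assigned)}
    return plan


for switch, info in plan_host(2).items():
    print(switch, info)
```

With two NICs, the plan for the NFS switch comes back with an empty NIC list and two missing uplinks, so the shortfall shows up per switch when the host is added rather than hiding behind unused uplinks on a shared switch.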

They both work, but which way is correct? That is a discussion for your team. I prefer a single switch. That leads to fewer decisions when setting up a new host. Two switches give you less room for error with non-standard hardware but more switches to maintain.

Lessons Learned

What I should have done was change the names of the uplinks in the dvSwitch. I should have renamed uplinks 1 and 2 to Prod-Uplink1 and Prod-Uplink2. Then I should have renamed uplinks 3 and 4 to NFS-Uplink1 and NFS-Uplink2. This would have made it much clearer that, on the two-NIC blades, no physical adapters were connected to the NFS uplinks.
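As a small illustration of why the names matter, this sketch prints a per-host uplink listing using the descriptive names. The helper `uplink_report` is hypothetical, and it assumes NICs are attached to the uplinks in order.

```python
# Descriptive uplink names: two for production, two for NFS.
UPLINK_NAMES = ["Prod-Uplink1", "Prod-Uplink2", "NFS-Uplink1", "NFS-Uplink2"]


def uplink_report(nic_count, uplink_names=UPLINK_NAMES):
    """List each uplink with its vmnic, assuming NICs are attached in order."""
    lines = []
    for i, name in enumerate(uplink_names):
        nic = f"vmnic{i}" if i < nic_count else "(no physical adapter)"
        lines.append(f"{name}: {nic}")
    return lines


for line in uplink_report(2):
    print(line)
# Prod-Uplink1: vmnic0
# Prod-Uplink2: vmnic1
# NFS-Uplink1: (no physical adapter)
# NFS-Uplink2: (no physical adapter)
```

An empty "dvUplink3" is easy to skim past. An empty "NFS-Uplink1" tells whoever adds the host exactly which traffic has nowhere to go.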

Secondly, I should have drawn it out in Visio. I also should have written a networking section as part of an overall design document defending my decisions. This is how it should have looked.

correct-dvswitch

Thirty minutes of documentation like this could have saved hours of troubleshooting. This was early in my enterprise career and one of the first design decisions I ever made with a dvSwitch. I have since passed the VCAP-DCD and am now starting on the VCDX. Thanks to these studies I have a much better grasp of the level of documentation needed to be successful. I plan to go back, make these changes, and draw these switches. Hopefully we won’t face an error due to an over designed and under documented decision again.