Challenge: vCenter, EVC and Distributed Virtual Switches
Yesterday a colleague asked me to add four blades from our old test environment to our new VMware vSphere 4.1 test environment. Of course this was no problem (yet): I had an hour or two to spare, so I started immediately.
The plan: download the ESXi 4.1 Installable ISO, connect it to the four blades, install and preconfigure ESXi, and add the hosts to the VMware HA/DRS test cluster. Adjust the zoning for the SAN, configure the correct VLANs, and we're done. WRONG!
The two running ESX hosts are equipped with Intel Xeon X5660 CPUs; the four extra ESX hosts have Intel Xeon X5430 CPUs. When I tried to do a vMotion the following error message appeared.
Surprise, the CPUs are not compatible. So I needed to set up EVC in this cluster to mask the advanced features of the Intel Xeon X5660s and bring them down to the same feature level as the Intel Xeon X5430s.
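To make the masking idea concrete, here is a minimal Python sketch. It is a conceptual model only, not how vCenter implements EVC: real EVC picks from a fixed list of predefined baselines rather than computing an arbitrary intersection, and the feature sets below are a simplified, illustrative subset of what these CPUs report.

```python
# Simplified, illustrative feature sets (assumption: not a full CPUID listing).
CPU_FEATURES = {
    "Xeon X5660": {"sse3", "ssse3", "sse4.1", "sse4.2", "aes-ni", "pclmulqdq"},
    "Xeon X5430": {"sse3", "ssse3", "sse4.1"},
}


def evc_baseline(cpus):
    """Return the features that every CPU in the cluster supports."""
    return set.intersection(*(CPU_FEATURES[cpu] for cpu in cpus))


def masked_features(cpu, baseline):
    """Return the features that must be hidden from VMs on this CPU."""
    return CPU_FEATURES[cpu] - baseline


# Two existing hosts plus the four added blades.
cluster = ["Xeon X5660"] * 2 + ["Xeon X5430"] * 4
baseline = evc_baseline(cluster)

print("Common baseline:", sorted(baseline))
print("Hidden on X5660:", sorted(masked_features("Xeon X5660", baseline)))
print("Hidden on X5430:", sorted(masked_features("Xeon X5430", baseline)))
```

In this model the older X5430 loses nothing, while the X5660 has its newer instructions hidden so that a virtual machine sees the same CPU on every host and vMotion can proceed.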
But this creates the first ‘chicken or egg dilemma’ of the day.
What do you need to create an EVC cluster? Right, a vCenter server. But to enable EVC you must shut down all virtual machines, including our vCenter server, because of course it runs as a virtual machine. So you need to shut down vCenter, but you also need it to enable EVC. Hmmm.
Then I found VMware KB article 1013111, which describes ‘Enabling EVC on a cluster when vCenter is running in a virtual machine’. Just what I needed, but unfortunately this got me into more trouble than I was already in.
As the KB article described, I created a new cluster, enabled EVC, moved all virtual machines off one of the ESX hosts and moved that ESX host to the new EVC cluster. Next I shut down the vCenter server and database server, removed them from the inventory, added them to the inventory of my new EVC cluster and booted the servers. Unfortunately this is when the trouble really started. The vCenter server services wouldn’t start, and this turned out to be a network issue. The virtual NIC wasn’t connected, but connecting it resulted in the following error.
At this point I had no idea what I had done wrong, but that would soon change. I found out that when I created a standard virtual switch and connected the vCenter and database servers to it, the vCenter services would start. Then I decided to call Anne Jan to get a different perspective on this case and hopefully a simple solution. After listening to my problem, Anne Jan quickly pinpointed the cause. The one important thing I forgot, and which (in my defense) wasn’t mentioned in the VMware KB article, was removing the distributed virtual switch in the first cluster and recreating/connecting it in the new EVC cluster.
This was my second ‘chicken or egg dilemma’ of the day because to create or modify the distributed virtual switches you need a vCenter server. And the one thing that wouldn’t start and communicate with the necessary infrastructure components was my vCenter server. But drastic problems require drastic measures.
How did I solve this? I connected to all six ESX hosts directly and removed the distributed virtual switch. Then, on the host containing the vCenter virtual machine, I configured a classic (standard) virtual switch with the same configuration (VLANs, port settings, etc.) as the distributed virtual switch had. Next I connected the vCenter server, database server and domain controllers to the appropriate port groups and booted all systems.
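I did this by hand, but rebuilding a standard switch from a list of port groups is easy to script. The Python sketch below only generates the command lines; it does not run them. It assumes the esxcfg-vswitch tool (available in Tech Support Mode, or as vicfg-vswitch through the vCLI), and the switch name, uplink and port group names and VLAN IDs are hypothetical placeholders, not the values from my environment.

```python
def vswitch_commands(vswitch, uplinks, port_groups):
    """Build esxcfg-vswitch commands that recreate a standard vSwitch.

    vswitch     -- name of the standard vSwitch to create
    uplinks     -- physical NICs to link, which must already be free
    port_groups -- mapping of port group name to VLAN ID
    """
    cmds = [f"esxcfg-vswitch -a {vswitch}"]                  # create the switch
    for nic in uplinks:
        cmds.append(f"esxcfg-vswitch -L {nic} {vswitch}")    # link an uplink
    for name, vlan in port_groups.items():
        cmds.append(f'esxcfg-vswitch -A "{name}" {vswitch}')             # add port group
        cmds.append(f'esxcfg-vswitch -v {vlan} -p "{name}" {vswitch}')   # set its VLAN
    return cmds


# Hypothetical example values; replace them with the dvSwitch settings you noted down.
commands = vswitch_commands(
    "vSwitch1",
    ["vmnic1"],
    {"Servers": 10, "Management": 20},
)
for line in commands:
    print(line)
```

Printing the commands first, instead of executing them, leaves room to check them against the old distributed switch configuration before touching a host.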
Once the vCenter server was up and running, I created a host profile of this ESX host and applied it to the other five ESX hosts, replicating the new classic switch to them. With the network configuration back online, I migrated all virtual machines to the appropriate port groups on the classic virtual switches using the network migration tool that comes with the distributed virtual switch. Finally I booted all other virtual machines in the correct order and removed the distributed virtual switch.
Now, six to eight hours later, the VMware virtual infrastructure has been extended with four extra ESX hosts and is back online. Not bad for a job I had planned one or two hours for, but I refused to be the one who got the next day's customer demo canceled. Although it is almost my last day at my current employer, the drive to solve these issues is still there :-).
So, the bottom line is: think twice when using distributed virtual switches, and remove the host from the distributed virtual switch before dragging and dropping hosts into other clusters.
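That last check is simple enough to write down as a pre-flight test. The Python sketch below is a toy model with made-up host and switch names; it does not talk to vCenter. In practice the membership list would come from the vSphere Client or an inventory export.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy model of an ESX host (hypothetical, for illustration only)."""
    name: str
    dvswitches: list = field(default_factory=list)  # dvSwitches the host is a member of


def hosts_blocking_move(hosts):
    """Return the names of hosts still attached to a distributed virtual switch."""
    return [h.name for h in hosts if h.dvswitches]


# Made-up inventory: one host is still a dvSwitch member, one is clean.
inventory = [
    Host("esx01", dvswitches=["dvSwitch-Test"]),
    Host("esx02"),
]

blocking = hosts_blocking_move(inventory)
if blocking:
    print("Remove these hosts from their dvSwitch first:", ", ".join(blocking))
else:
    print("All hosts are safe to move to another cluster.")
```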