Challenge: vCenter, EVC and Distributed Virtual Switches
Yesterday a colleague asked me to add four blades from our old test environment to our new VMware vSphere 4.1 test environment. Of course this was no problem (yet); I had an hour or two to spare, so I started immediately.
Download the ESXi 4.1 installable ISO, connect it to the four blades, install and preconfigure ESXi, and add the hosts to the VMware HA/DRS test cluster. Adjust the zoning on the SAN, configure the correct VLANs, and we're done. WRONG!
The two running ESX hosts are equipped with Intel Xeon X5660 CPUs, while the four extra ESX hosts have Intel Xeon X5430 CPUs. When I tried to vMotion a virtual machine between them, it failed with a CPU incompatibility error.
Surprise: the CPUs are not compatible. The X5660 is a Westmere-generation CPU and the X5430 a Penryn-generation one, so I needed to set up EVC (Enhanced vMotion Compatibility) on this cluster to mask the advanced features of the Intel Xeon X5660s and bring them down to the feature level of the Intel Xeon X5430s.
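For reference, on the classic ESX 4.1 hosts you can check the CPU generation straight from the service console (a quick sketch; the service console is a Linux environment, so /proc/cpuinfo is available):

    # Show the CPU model of this host; the EVC baseline has to match the
    # oldest generation in the cluster (Penryn for the X5430s here).
    grep "model name" /proc/cpuinfo | sort -u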
But this creates the first ‘chicken or egg dilemma’ of the day.
What do you need to create an EVC cluster? Right, a vCenter server. But to enable EVC you must shut down all virtual machines in the cluster, including our vCenter server, because it runs as a virtual machine of course. So you need to shut down vCenter, but you need vCenter to enable EVC. Hmmm.
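Luckily, each ESX(i) host can manage its own virtual machines without vCenter. As a sketch (the Vmid 42 is just an example), from the tech support console of the host running the vCenter VM you can find it and shut it down like this:

    # List all VMs registered on this host and note the Vmid of the vCenter VM.
    vim-cmd vmsvc/getallvms
    # Gracefully shut down the guest OS (requires VMware Tools); Vmid 42 assumed.
    vim-cmd vmsvc/power.shutdown 42
    # Or hard power it off when VMware Tools are not responding.
    vim-cmd vmsvc/power.off 42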
Then I found VMware KB article 1013111, which describes 'Enabling EVC on a cluster when vCenter is running in a virtual machine'. Just what I needed, but unfortunately it got me into more trouble than I was already in.
As the KB article describes, I created a new cluster, enabled EVC, moved all virtual machines off one of the ESX hosts and moved that ESX host into the new EVC cluster. Next I shut down the vCenter server and database server, removed them from the inventory, added them to the inventory of my new EVC cluster and booted the servers. Unfortunately, this is when the trouble really started. The vCenter services wouldn't start, and this turned out to be a network issue: the virtual NIC wasn't connected, and connecting it failed with an error.
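For completeness: the remove/re-add of the vCenter VM can also be done from the host consoles instead of the vSphere Client. A sketch, with a hypothetical datastore path:

    # On the old host: unregister the vCenter VM (Vmid from getallvms).
    vim-cmd vmsvc/unregister 42
    # On the host that was moved into the new EVC cluster: register the .vmx file.
    vim-cmd solo/registervm /vmfs/volumes/datastore1/vcenter01/vcenter01.vmx
    # registervm prints the new Vmid; use it to power the VM on.
    vim-cmd vmsvc/power.on 43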
Until now I had no idea what I had done wrong, but this would soon change. I found out that when I created a standard virtual switch and connected the vCenter and database server to it, the vCenter services would start. Then I decided to call Anne Jan to get a different perspective on the case and hopefully a simple solution. After listening to my problem, Anne Jan quickly pinpointed it. The one important thing I forgot, and which (in my defense) wasn't mentioned in the VMware KB article, was removing the distributed virtual switch from the hosts in the first cluster and recreating/connecting it in the new EVC cluster.
This was my second ‘chicken or egg dilemma’ of the day because to create or modify the distributed virtual switches you need a vCenter server. And the one thing that wouldn’t start and communicate with the necessary infrastructure components was my vCenter server. But drastic problems require drastic measures.
How did I solve this? I connected to all six ESX hosts and removed the distributed virtual switch from each of them. Then, on the host containing the vCenter virtual machine, I configured a classic virtual switch with the same configuration (VLANs, port settings, etc.) as the distributed virtual switch had. Next I connected the vCenter server, database server and domain controllers to the appropriate port group and booted all systems.
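Because vCenter was down, all of this had to happen from the individual host consoles. A sketch of the commands involved (the switch, port group, VLAN and vmnic names are examples; the dvPort IDs come from the esxcfg-vswitch -l output):

    # Show the current switches, including the distributed switch and its dvPort IDs.
    esxcfg-vswitch -l
    # Detach the physical uplink from the distributed switch.
    esxcfg-vswitch -Q vmnic1 -V 128 dvSwitch01
    # Build a classic vSwitch with the same uplink, port group and VLAN.
    esxcfg-vswitch -a vSwitch1
    esxcfg-vswitch -L vmnic1 vSwitch1
    esxcfg-vswitch -A "VM Network" vSwitch1
    esxcfg-vswitch -v 100 -p "VM Network" vSwitch1

After that, the vCenter VM's NIC can be pointed at the new port group with the vSphere Client connected directly to the host.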
With the vCenter server up and running again, I created a host profile of this ESX host and applied it to the other ESX hosts, replicating the new classic switch to the other five. With the network configuration back online, I migrated all virtual machines to the appropriate port groups on the classic virtual switches using the network migration wizard that comes with the distributed virtual switch. Finally, I booted all remaining virtual machines in the correct order and removed the distributed virtual switch.
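For anyone repeating this: a quick sanity check on each host after applying the host profile (a sketch):

    # Verify that the classic vSwitch, its uplink and port groups exist on every host.
    esxcfg-vswitch -l
    # Verify that the VMkernel interfaces survived the migration.
    esxcfg-vmknic -l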
Now, six to eight hours later, the VMware virtual infrastructure is extended with four extra ESX hosts and is back online. Not bad for a job I had planned one or two hours for, but I refused to be the one who got the next day's customer demo canceled. Although it is almost my last day at my current employer, the drive to solve these issues is still there :-).
So, the bottom line: think twice when using distributed virtual switches, and always remove a host from the distributed virtual switch before dragging and dropping it into another cluster.
Comments
What you experienced is one of the good reasons to dedicate a standalone physical server for vCenter :)
I disagree :-) You just need to pay attention and not move hosts around before removing them from the dvSwitch.
I have run into more problems setting up vDS than with any other VMware feature, and I've been building these environments since the 2.x days. Migration to vDS is awful and never works as intended; ESX hosts go out to lunch and never come back unless I go into the console and run esxcfg-vmknic (-d and -a) to recover. I'm sure it will get better, but for now host profiles work for me ;) -Superdave
Would it not be easier to clone the vCenter VM to the new EVC cluster, set up the correct network connections, power on the clone and power off the original… and wait for it to come back up :-)
I'm thinking of trying this now… I have vCenter as a VM and a Nexus 1000V distributed switch…