Ravello Systems: VMware NSX 6.2 Management Service Stuck in ‘Starting’
Ravello Systems offers ‘Smart Labs’ where you can run a multitude of applications in Amazon or Google Cloud, including nested ESXi hypervisors. This opens up the possibility of replacing a home lab, with its noisy, power-guzzling servers, by moving your test lab to the cloud. I have been using Ravello for a while and just got around to upgrading my VMware NSX blueprint to NSX 6.2.1, which is where I ran into an issue:
The NSX Management Service did not start on boot; it was stuck in ‘Starting..’ for about 5 minutes until it reverted to ‘Stopped’. When I tried to start the NSX Management Service manually with the ‘Start’ button, the same thing happened: it sat in ‘Starting..’ for about 5 minutes and then stopped again. The log was about as helpful as the logs of Java-driven applications usually are:
manager# show log manager reverse
2016-01-03 14:46:00.548 GMT INFO localhost-startStop-1 VsmServletContextListener:75 - NSX Status : STOPPED
... 129 more
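If you save a copy of that log and want to pick out the status changes without scrolling through stack traces, a rough Python sketch could look like the one below. The pattern is modelled only on the single log line quoted above, so the field layout is an assumption, and the function name nsx_status_events is made up for illustration.

```python
import re

# Layout assumed from the one log line quoted in this post:
# "<date> <time> <tz> <level> <thread> <source> - <message>"
LOG_LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<tz>\S+) (?P<level>[A-Z]+) (?P<thread>\S+) "
    r"(?P<source>\S+) - (?P<message>.*)"
)
STATUS = re.compile(r"NSX Status : (?P<status>\w+)")


def nsx_status_events(log_text):
    """Return (timestamp, status) for every 'NSX Status' line in log_text."""
    events = []
    for line in log_text.splitlines():
        entry = LOG_LINE.match(line.strip())
        if not entry:
            continue  # stack trace lines and anything else we do not recognise
        status = STATUS.search(entry.group("message"))
        if status:
            events.append((entry.group("timestamp"), status.group("status")))
    return events


sample = (
    "2016-01-03 14:46:00.548 GMT INFO localhost-startStop-1 "
    "VsmServletContextListener:75 - NSX Status : STOPPED"
)
print(nsx_status_events(sample))
```

Run against the line from my log, this prints one event with the timestamp and the STOPPED status; lines that do not match the assumed layout are simply skipped.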
After googling for a few minutes, I found that I was not the only one, which confirmed it was not the NSX Manager OVA that I was using. Besides, that same OVA runs perfectly fine on my existing home lab environment.
So, that meant it was fixable! Ravello can publish your virtual machines to Amazon or Google Cloud and, on the recommendation of a few vPeers, I had always published mine to Amazon. So, I figured I would try Google Cloud. Alas, same result: NSX Manager 6.2.1 comes up and the web interface works fine, but the Management Service does not start.
After playing around with the virtual hardware specs of the NSX Manager, I struck gold. The vNIC of the NSX Manager is standardised on VMXNet3, which is fine under normal circumstances. The trick is to set the vNIC type to e1000. When you deploy the NSX Manager 6.2.1 with an e1000 vNIC, all required services boot up like you’d expect them to.
The release notes of 6.2.x don’t really say anything about why it would not work with VMXNet3, and there are no apparent changes to the management interface. Ravello does some networking magic in the background, which might be related. I’ve been able to reproduce this a few times on both Amazon and Google Cloud and got the same result each time. For now, I’ll keep using e1000, but I am still curious about why it won’t work on VMXNet3, so I’ll keep digging.