Project VRC (Virtual Reality Check) has finally posted a new document about its previous results and the possible clock drift that occurs when using the “Login Virtual Session Indexer (VSI)”. The previous test setups and results didn’t take into account how different hypervisors handle the passing of time.

In my opinion this is a serious setback for Project VRC, which is considered an institution in the virtualization world. People will start questioning the results if no new tests are performed.

Below is a description from the Project VRC website explaining the new whitepaper they published on September 14th, 2009. It is a must-read for people who have already done some testing, as well as for anyone planning new tests. In short: because of Windows clock behavior in virtual machines, the results were affected, and some hypervisors may come out better than they really are.

This whitepaper is a review and reflection on previous Project VRC publications, the benchmark: “Login Virtual Session Indexer (VSI)” and Windows clock behavior within virtual machines.  This discussion is fueled by the fact that results from the individual Project VRC whitepapers are set side-by-side to compare hypervisors. Project VRC has been in discussion with both vendors and community, and performed additional research in this context. Before Project VRC can publish new results, it is important to address any questions, review the impact of this discussion and improve VSI where possible.

You can download it at www.projectvrc.nl

The major conclusions in this whitepaper are:

  • Clock drift within VMs is possible on every hypervisor. This has two implications (a sketch after this list illustrates the effect):
    • The clock is used for the response time measurements: the reported results can be less accurate.
    • The clock is used for every timed event such as sleep/wait/delay/pause/etc.: this can have an impact on the weight of the workload, because the activities are stretched over time.
  • The use of the /USEPMTIMER switch in boot.ini is required for Windows Server 2003 guests running on XenServer. This fixes the consistent 10% clock drift/accuracy issues seen on XenServer (see the boot.ini example after this list).
  • The measured drift/accuracy issues are much lower (a consistent 10-20ms per timed event) with VMware vSphere 4.0 and Microsoft Hyper-V 2.0.
  • Login VSI 2.0 introduces a new indexing method called VSImax. The new method is more accurate because any form of scripted or built-in delay/sleep is completely excluded from the measured response time (sketched after this list).
  • Because of the mentioned improvements in the scripting method used in Login VSI 2.0, reliability and robustness have been greatly enhanced: the VSI workload is now stable under extreme conditions.
  • VSI is vendor- and product-independent. For this reason AutoIt remains the preferred and most practical method to generate the VSI workload.
  • With VSI 2.0 it is now possible to measure the response time on an external Microsoft SQL server, and to calibrate sleeps within the AutoIt script at runtime (also sketched after this list).
  • External clocking and calibration do affect results, but by a relatively small margin (up to 6% was measured).
  • The switch from the Optimal Performance Index to VSImax potentially has a much bigger influence on the results. As a result, it is safe to assume that the ratio between VMware ESX 3.5 and Citrix XenServer 5.0 reported in VRC 1.0 would be comparable to the VSI 2.0 results found with vSphere 4.0 and XenServer 5.5 if VSImax were used.
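
To make the clock-drift implication concrete: a timed event inside a guest is measured by the guest’s own clock, so any drift both skews the response-time numbers and stretches scripted waits over real time. Below is a minimal sketch of how such drift could be measured. This is hypothetical Python, not Project VRC’s actual AutoIt tooling, and external_now is an assumed callable that queries a trusted clock outside the VM (an NTP source, a remote SQL server, and so on):

    import time

    def measure_drift_ratio(external_now, interval=10.0):
        # external_now is a hypothetical callable returning the current time
        # in seconds from a trusted reference clock outside the VM.
        guest_start = time.monotonic()
        ext_start = external_now()
        time.sleep(interval)  # this wait is timed by the (possibly drifting) guest clock
        guest_elapsed = time.monotonic() - guest_start
        ext_elapsed = external_now() - ext_start
        return guest_elapsed / ext_elapsed  # 1.0 means no drift

On a hypervisor with the consistent 10% drift mentioned above, this ratio would come out roughly 10% away from 1.0 instead of hovering around it.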
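
The /USEPMTIMER fix is an ordinary boot.ini switch on the Windows Server 2003 guest, appended to the OS entry. The ARC path below is only an example and will differ per machine, and a reboot is required for the switch to take effect:

    [boot loader]
    timeout=30
    default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /usepmtimer

The switch makes Windows use the ACPI power management timer for timekeeping instead of the processor’s time stamp counter.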
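
The VSImax and calibration conclusions can be sketched along the same lines: time only the user action itself, so that no scripted sleep ever enters the response-time figure, and scale the sleeps by the measured drift ratio so they match external time. Again, this is a hypothetical Python illustration, not Project VRC’s actual AutoIt code:

    import time

    def timed_action(action):
        # VSImax-style measurement: only the action is timed; the scripted
        # sleeps around it never enter the response-time figure.
        start = time.monotonic()
        action()
        return time.monotonic() - start

    def calibrated_sleep(seconds, drift_ratio):
        # Request seconds * drift_ratio from the guest clock so that roughly
        # `seconds` of external time pass. With a guest running 10% slow
        # (drift_ratio 0.9), requesting 9 guest-seconds yields ~10 real seconds.
        time.sleep(seconds * drift_ratio)

In VSI 2.0 the external reference is the Microsoft SQL server mentioned above; this sketch abstracts it away behind the drift ratio from the earlier snippet.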

As a result of these conclusions, they made the following comment:

It is worthwhile to use calibration when hypervisors are compared. However, when evaluating different configurations on a single hypervisor, the results are always relative, and calibration would not change conclusions. Consequently, these findings do not change the recommendations and best-practices discovered and published in VRC 1.0 and the 1.1 update.
Project VRC will from now on only publish the specific VSImax outcome when using an external clock. Any “uncalibrated” results will only be used to highlight the differences (in %) within the context of a single hypervisor.

As true as this statement may be, many of our customers do use the different whitepapers to compare hypervisors, even though Project VRC is really about Microsoft client OSes and VDI versus Terminal Server scaling on different hypervisors. So if they don’t correct the results of the previous tests, those results have, in my opinion, become worthless, and that would be a shame, because Project VRC is a great initiative and I know guys like Ruben Spruijt and Jeroen van de Kamp spend a lot of time trying to give us something to work with.

So I really hope they find the time to correct the results from the previous tests, because customers and vendors will always try to use their great work for a hypervisor comparison; comparing hypervisors yourself is such a damn difficult task to perform.