Update: Storage optimization with Atlantis

Since publishing my article ‘Storage Optimization with Atlantis’ on April 3rd, I’ve been in contact with Atlantis and received some additional information that I would like to share with you.

First of all, the article says the following:

Another point of interest is, does Atlantis live up to the promises made regarding the alleged savings of up to 90%? Of course these numbers are sales numbers and it’s up to 90%. But recently I heard about some Atlantis test results from renowned industry parties, which aren’t so positive as the sales number want you to believe. The first tests do show a huge decrease of read I/O but they also show an increase in write I/O. Now, I don’t want to claim that the myth is busted, further investigation and testing is required, but you should definitely try before you buy.

The issues mentioned above were discovered by our colleagues from PQR and also by the Cisco VXI team. According to Atlantis, however, they have already been corrected.

The issue with linked clones was caused by misalignment of the redo log file, a consequence of VMware View Composer and the linked clone format. Atlantis has released a new version that treats linked clones specifically for this issue and corrects the misaligned writes.

According to Atlantis, the PQR team has already retested Atlantis ILIO and found good write offload as promised.

The second item I would like to share is a concern Anne Jan raised when using Atlantis ILIO.

His question was “What happens when the Atlantis virtual appliance fails for whatever reason? Does the host fail or does the host fall back to the native/non-optimized way of storage interaction?” In other words, is the Atlantis virtual appliance a single point of failure (SPOF)?

The Atlantis response is as follows:

Your question is very common and a very relevant one about ILIO or its host failing.

  1. The ILIO internally uses a completely journalled and crash consistent file system – so upon a crash it reverts automatically to its last known consistent state and will discard any incomplete I/O. This means that there is no chance of data corruption or inconsistency if either the ILIO or its host crashes.
  2. In either failure scenario – the ILIO is restarted manually or automatically (via VMware HA or a 3rd party heartbeat monitoring solution – VMware HA is certified, recommended and supported) on the same host or another host.
  3. Upon reboot – the ILIO does a consistency check and re-exports the NFS/iSCSI datastore on power on. The entire process takes between 5 and 10 minutes.
  4. So an ILIO or host crash causes the desktops to become unavailable for about 5-10 minutes.
  5. An ILIO is deployed on each host that runs the desktop VMs – we don’t recommend a Top of Rack deployment because of the SPOF problem.

So, if the Atlantis ILIO virtual appliance fails, the storage stack does not revert to the native vSphere mode of operation. Instead, after a reboot, performed manually or by VMware HA, it reverts to the last known consistent state and replays the log files. This is of course good for data consistency, but when the Atlantis ILIO virtual appliance crashes, the desktops on that host are unavailable until it is back. So, in my opinion, it does add a single point of failure to the storage I/O stack.
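
To put that outage window in perspective, here is a rough back-of-the-envelope sketch in Python. The 5 and 10 minute recovery times come from the Atlantis response above; the crash frequencies are purely hypothetical values I picked for illustration, not measured or vendor-supplied figures, and the function name is my own.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability_percent(crashes_per_year: float, recovery_minutes: float) -> float:
    """Estimate desktop availability on one host, counting only ILIO/host crashes.

    Assumes every crash makes the desktops on that host unavailable for the
    full recovery window and that crashes do not overlap.
    """
    if crashes_per_year < 0 or recovery_minutes < 0:
        raise ValueError("crash count and recovery time must be non-negative")
    downtime_minutes = crashes_per_year * recovery_minutes
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # Recovery window quoted by Atlantis: between 5 and 10 minutes.
    for recovery in (5, 10):
        # Hypothetical crash frequencies, chosen only to show the trend.
        for crashes in (1, 4, 12):
            pct = availability_percent(crashes, recovery)
            print(f"{crashes:>2} crashes/year x {recovery:>2} min -> {pct:.4f}% available")
```

Since an ILIO is deployed per host, a crash only affects the desktops on that one host, so this estimates the impact per host rather than for the whole environment.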

Nevertheless, I think it’s a product with a lot of potential. Atlantis promised me a personal copy for testing purposes, and I hope to receive it soon so I can report on my hands-on experiences with Atlantis ILIO. Check back soon.