In our last installment, we looked at the effect of VAAI acceleration on some key VMware provisioning operations. In this article, we examine the efficacy of inline deduplication technology applied to full (fat) clones. We will also look at a critical component of any Windows guest OS, the pagefile, and see how its growth is affected by storage optimization.
The results, as you’ll see, are compelling. We are able to rapidly create full clones, which have all of the advantages of “stateful” operation. Moreover, the resources needed to provision these clones when inline deduplication is enabled are a tiny fraction of those needed by legacy storage systems. Because deduplication in the GreenBytes system is inline, the clones start small and stay small. No landing zones are needed to “rehydrate” data when recomposing OS images.
A second compelling outcome was the reduction in the footprint of the Windows pagefile, whose growth has a significant impact on the number of VM/VDI images that can be provisioned on a single storage appliance.
In the case of the GreenBytes IO Offload Engine (IOOE), we see a dramatic reduction in the footprint of the pagefile under heavy load. This allows us to allocate far less memory to each VM, dramatically overcommitting the guest OS, while both increasing the number of VMs per host and avoiding the dreaded ESX overcommitted-memory condition.
In short, the IOOE allows the best possible use case for VDI: stateful “fat” clones running in the smallest possible memory footprint with great performance.
Test 1: Cloning Performance
To test cloning performance, we created a 40GB VDI master image. Since the IOOE presents two virtual storage pools (for active fail-over), we created two VMFS-5 datastores per pool (four datastores in total). We further streamlined cloning by running a script that balanced clone creation across the virtual pools (a sketch of this balancing approach appears after the results below). Then, from a small two-server cluster, we initiated the creation of 250 clones and measured the time and storage resources required:
| VDIs | VDI Image Size | Total Virtual Storage Required | Actual Storage Required | Compression Ratio | Deduplication Ratio | Total Data Reduction |
|------|----------------|--------------------------------|-------------------------|-------------------|---------------------|----------------------|
| 250  | 40GB           | 10TB                           | 53GB                    | 1.3 to 1          | 143 to 1            | 186 to 1             |
This astounding 186-to-1 reduction is the product of highly efficient inline deduplication and compression (143 to 1 × 1.3 to 1 ≈ 186 to 1).
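To make the mechanism concrete, here is a minimal sketch of how inline, content-addressed block deduplication collapses hundreds of identical OS images. This is an illustrative model only, not the GreenBytes implementation: every block is hashed before it is written, and a block whose hash is already in the index costs only a reference, never new capacity.

```python
import hashlib

class InlineDedupStore:
    """Toy content-addressed block store: each unique block is written once;
    duplicates cost only a reference, never new capacity."""

    def __init__(self, block_size=8192):   # 8K blocks, the VMFS-5 native IO size
        self.block_size = block_size
        self.blocks = {}       # sha256 digest -> unique block payload
        self.refcounts = {}    # sha256 digest -> logical references

    def write(self, data):
        """Deduplicate inline: hash each block before it touches the pool."""
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block            # first copy: store it
            self.refcounts[digest] = self.refcounts.get(digest, 0) + 1

    def dedup_ratio(self):
        return sum(self.refcounts.values()) / len(self.blocks)

# 250 "full clones" of the same master: logical capacity grows 250x,
# physical capacity stays at one copy of the unique blocks.
store = InlineDedupStore()
master = b"".join(bytes([i]) * 8192 for i in range(16))  # 16 distinct blocks
for _ in range(250):
    store.write(master)
print(f"dedup ratio: {store.dedup_ratio():.0f} to 1")    # 250 to 1
```

Because the hash lookup happens before the write, the clones “start small and stay small”; nothing ever lands at full size only to be deduplicated later.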
The speed of cloning was also impressive:
| Number of Clones | Total Time | Time/Clone | MB/Clone | Total MB Cloned | Effective Data Rate |
|------------------|------------|------------|----------|-----------------|---------------------|
| 250              | 7,140 sec  | 28.56 sec  | 40,960   | 10,240,000      | 1.43 GB/s           |
Here we see just two hosts sustaining a clone rate of 1.43 GB/s (10,240,000 MB in 7,140 seconds) with inline deduplication and compression enabled. This is roughly 10x the rate of most legacy systems.
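For reference, here is a minimal sketch of the kind of balancing script mentioned above. The datastore names and the `clone_vm` helper are illustrative stand-ins for the actual vSphere full-clone call; this is not the exact script we ran.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

# The four VMFS-5 datastores, two per virtual pool, as described above.
DATASTORES = ["pool1-ds1", "pool1-ds2", "pool2-ds1", "pool2-ds2"]

def clone_vm(master, clone_name, datastore):
    """Hypothetical stand-in for the real full-clone operation
    (e.g. issued through the vSphere API or PowerCLI)."""
    print(f"cloning {master} -> {clone_name} on {datastore}")

def provision(master, count):
    """Round-robin clones across the datastores so both virtual pools
    share the provisioning load evenly."""
    targets = cycle(DATASTORES)
    with ThreadPoolExecutor(max_workers=8) as pool:
        for i in range(count):
            pool.submit(clone_vm, master, f"vdi-{i:03d}", next(targets))

provision("vdi-master-40gb", 250)
```

Round-robin placement keeps both pools (and their fail-over partners) equally loaded, so neither becomes a bottleneck during a mass-provisioning burst.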
Test 1: Conclusions
A 186-to-1 reduction in the amount of storage needed to provision full, stateful VDIs is simply unprecedented. Free of the limitations of linked clones, this is clearly optimal; it translates into a maximum of 500GB of flash needed to support 5,000 VDIs. Of course, that is only part of the equation. We also have to worry about guest OS pagefiles, which can often soak up even more resources than the statically provisioned images themselves.
Test 2: Pagefile Reduction
In part 2 of our tests, we focused on the effects of our optimizations on Windows pagefile size. To do this, we created a set of 61 images, each with a 2GB pagefile placed on a separate virtual pool (to facilitate measurement).
Here is the screen capture from the VMware side showing the space required for the pagefiles:
You can clearly see above the 128GB required to accommodate the pagefiles. On the IOOE side, this consumes only about 1GB of storage (metadata only), because we optimize away sparse files. Now things get a bit more interesting: when we created each clone, we set out to force a low-memory condition.
We dramatically reduced the memory allocated to each clone to just 200MB, well below recommended limits. We then booted each clone and ran a script on each VDI that allocates memory aggressively, forcing the OS to use its pagefile heavily. With only 200MB of RAM, each VDI was forced to continuously swap out 256MB of memory (128% overcommitted). The result was approximately 16GB of swap activity across the 61 clones.
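The load generator itself is simple. Here is a minimal sketch of the approach; the 200MB/256MB figures mirror the test parameters above, but the code is our illustration, not the exact harness that ran in each guest.

```python
import time

PAGE = 4096
CHUNK = 1024 * 1024                 # allocate in 1 MB chunks
TARGET_MB = 256                     # exceeds the 200 MB guest RAM -> paging

def churn_memory():
    """Allocate and dirty more memory than the guest has, over and over,
    so the OS must continuously page to its pagefile."""
    while True:
        hog = [bytearray(CHUNK) for _ in range(TARGET_MB)]
        for buf in hog:
            for off in range(0, CHUNK, PAGE):
                buf[off] = 1        # touch one byte per page to dirty it
        time.sleep(0.5)             # drop the buffers and start again

churn_memory()                      # runs until the test is stopped
```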
What is interesting is that we reduced the amount of flash storage needed to handle this swap by a factor of 4 to 1.
This is a 74.4% reduction in the active swap space needed using the IOOE versus legacy systems (a 74.4% reduction is roughly 3.9 to 1, hence the 4-to-1 figure above). In addition, the static pagefile space is, as mentioned earlier, virtually free.
Test 2: Conclusion
These tests suggest that, with respect to OS paging, the flash provisioned in the standard IOOE appliance (4.8TB) would be enough to provision 8,000 VDI instances, all actively paging to 2GB swap partitions (at a 4-to-1 reduction, each 2GB pagefile needs roughly 512MB of flash, and 8,000 × 512MB ≈ 4TB). Of course, this is only part of the story. We also need to support the thundering herd of IOPS required to serve these VDI instances. Our next installment will look at that as we benchmark an IOOE and measure its ability to deliver 8K IOs (the VMFS-5 native IO size).
Putting It All Together
Solid-state technology is the key to low-latency performance in shared storage systems, and VDI has been identified as one of the killer applications for flash. The problem has been the amount of flash needed to support VDI at scale. As we have seen, GreenBytes technology is, without qualification, a breakthrough. Adding up the requirements for 5,000 VDIs without optimization, 200TB is needed to store full (fat) clones (the kind you want: 5,000 × 40GB). Linked clones reduce this to about 15TB, but leave you with a one-size-fits-all solution applicable to only a subset of users. OS pagefiles are the next biggest contributor to cost: for the same 5,000 VDIs, ESX requires 10TB of flash for pagefiles versus just 2.5TB with GreenBytes (the 4-to-1 reduction measured above). The following table summarizes this:
Total Flash Storage Required (VDI Images plus 2GB Page File)

| Full Clones | Linked Clones | GreenBytes (Full Clones) |
|-------------|---------------|--------------------------|
| 210TB       | 25TB          | 3TB                      |
