As Chris Murphy explained in his video blog post, EMC IT began last year to implement a virtual desktop infrastructure based on VMware View. The VDI concept is pretty straightforward, and sounds compelling: reduce desktop management complexity, more cost-effectively update aging desktops (and their operating systems), and give users greater platform choice—and “anywhere, anytime” universal access.
Can VDI really deliver on its user experience promise? How much would it really benefit our company in cost savings and increased flexibility? EMC IT came up with answers to those questions, and got a “green light” for deploying a production VDI environment during the second half of this year.
First Step: Learn The Technology
As I wrote previously, VDI “rules of thumb” generally estimate cost savings based on several assumptions and fairly ideal conditions. A number of vendors and consultants offer online “ROI calculators,” some with fairly sophisticated models. However, calculating the actual cost/benefit ratio we’d likely see from a VDI solution in a real environment is no small undertaking.
EMC IT chose to go about this using a phased approach. Their first goal was to learn about VDI and how to make best use of it. Then they could incrementally explore benefits—and risks—at increasingly large scale.
The VDI pilot had a small number of users. They were mostly contract workers and other outside-partner employees needing access to certain EMC internal systems. Those users had previously been using virtual desktops through Citrix Desktop Broker, and their feedback showed VDI bringing an overall improvement in performance and stability.
Encouraged, IT moved on to a proof-of-concept (POC) in late summer. This time, additional Massachusetts-based VMware View 3.1 servers were provisioned to house a 200-user mix that included EMC employees at home, training facility desktops, and remote eLearning.
Feedback from this POC was still positive, but users were comparing their VDI experience with physical desktops, not Citrix terminal-server sessions. Just over half the POC users would use VDI for a secondary desktop, but only a small minority of those said they’d pick a virtual desktop to replace their current machine.
Not great news. Why the reluctance? Slower performance when printing locally and accessing local drives—especially for users in Europe and Asia. EMC IT did some testing in their lab, and found a significant performance bottleneck in Microsoft Windows’ Remote Desktop Protocol (RDP) on links with long network delays—or latency. Such transmission delays are unavoidable when accessing virtual desktops halfway around the globe. Something about that darned speed of light thing.
The latency “pain threshold” ended up being around 100 milliseconds. For local users, where network latencies are typically much shorter, virtual desktop performance was similar to a physical PC. For remote users with latencies exceeding 100ms, remote desktop performance was consistently slower than a local PC.
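To put that “speed of light thing” in numbers, here is a rough back-of-the-envelope sketch. It is not something EMC IT used. It assumes light travels through optical fiber at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), and the function names and distances are made up for illustration. The result is only a lower bound: real links add routing, queuing and processing delays on top.

```python
# Assumption: signal speed in optical fiber is roughly 2/3 the speed of light.
FIBER_KM_PER_S = 200_000

# Approximate "pain threshold" reported for the RDP-based tests.
PAIN_THRESHOLD_MS = 100


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a fiber path of the given length."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


def exceeds_threshold(distance_km: float, threshold_ms: float = PAIN_THRESHOLD_MS) -> bool:
    """True if propagation delay alone already pushes the link past the threshold."""
    return min_round_trip_ms(distance_km) > threshold_ms


if __name__ == "__main__":
    # Hypothetical path lengths, not measured EMC network distances.
    for km in (500, 5_000, 12_000):
        print(km, "km:", round(min_round_trip_ms(km), 1), "ms minimum,",
              "over threshold" if exceeds_threshold(km) else "under threshold")
```

Under these assumptions, a 12,000 km path is past the 100 ms mark before a single router or server has added any delay of its own.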
One application, however, actually works better running on a virtual desktop halfway around the world than locally: Microsoft Outlook. Another surprise for our IT folks was the number of people using media-intense collaboration tools such as Office Communicator, Live Meeting and WebEx to share applications and entire desktops. IT ended up adding applications to its VDI “base image” because of how many people used them regularly. It also caused IT to adjust how much weight it gave to concerns like network latency.
Overall, VMware View showed promise, and would work well for locally based task workers. But that’s a small minority of EMC’s employee population. So IT concluded that version 3.1 would not be suitable for enterprise-wide deployment. Luckily enough, View 4.0 was becoming available by year-end, bringing “PCoverIP,” a protocol that promised to overcome RDP’s latency challenges.
Finding VDI’s Limits
In mid-September, towards the end of the VDI Pilot, IT began testing a beta release of View 4.0 in their lab. IT’s VDI lab contains network test gear that can add latency, drop network packets, and simulate a wide variety of network conditions and pathology. The test team was particularly interested in the (hopefully positive) impact of PCoverIP at higher connection latencies. The results, including “stopwatch tests” that measure wall-clock performance of virtual-desktop tasks, looked promising.
IT launched a View 4.0 based POC in early November, and it’s running as I write this with 500 users (yours truly being one of them). To improve application performance for knowledge workers, the bulk of EMC’s employees, memory allocated to each virtual machine was doubled to 2 GB.
But that didn’t need to mean cutting the number of desktops per server in half. You can see in this slide from an EMC IT deck that Cisco UCS, a key Vblock component, by the way, enabled IT to double the number of virtual desktops per CPU core. How? Custom memory ASICs. You gotta love machines purpose built for virtual infrastructure.
In addition to testing more use cases and larger scale, IT team members are gathering data to help with a tough decision: whether to use a centralized or regional strategy for a worldwide VDI deployment.
A fully centralized approach has obvious advantages from a cost and complexity standpoint. But a regional deployment has performance advantages because it significantly lowers network latency for users in regions far away from EMC’s U.S. headquarters. So a small server infrastructure was added in Cork, Ireland.
Packet loss, on the other hand, quickly renders a PCoverIP VDI session unusable. And packet loss is a greater threat over greater distances.
Bottom line: the jury’s still out on centralized vs. regional deployment.
Do any of you have experience you can share on this debate?
The good news is that deploying virtual machines and updating base system software was indeed, as promised, easier. Templates, linked clones—all cool stuff.
But some things got harder. VDI changes everything. Unspoken assumptions in desktop management can suddenly become invalid. And if we’re not paying attention, VDI’s biggest advantage—consolidation—can become our biggest source of pain.
A really unpleasant example of this started happening every day around 11:30am. Performance would suddenly tank, with seemingly random responsiveness that was so bad it could take minutes for typed characters to be displayed on screen. This typically lasted till 1:30pm. (I have to admit I usually reverted to my local desktop. It was just too hard to get anything done.) Just what the heck was going on?
One big contributor turned out to be anti-virus software. Centrally managed by EMC IT, our company’s desktops have been programmed to automatically update their virus definitions, and then scan local files every day during that two-hour period around lunchtime. Scattered over hundreds of machines across a network, the performance impact is barely noticeable. It’s a classic example of the power of distributed work.
When a couple hundred virtual machines suddenly start scanning all of their “local files” on a single ESX server, the impact is dramatic. Remember, VDI’s big advantage is consolidation. Furthermore, it’s about much more than merely pooling previously distributed resources in a central place. Virtualization’s true power (in desktops, servers, networking and storage) is in oversubscribing, also known as thin provisioning. In other words, providing more virtual resources than physically exist.
For average desktop workloads, with their largely random and sporadic resource demands, oversubscribed resources translate into much greater efficiencies with little to no loss of performance. But when a large majority of a server’s VMs simultaneously start virus scanning, resources are quickly starved. Adding insult to injury, most of each VM’s files are actually physically located in a single place on disk (using “linked clone” technology), meaning hundreds of virus scanners are competing for access to the same files.
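One general way to avoid this kind of pile-up is to stagger scan start times so that only a fraction of a server’s VMs are scanning at any moment. The sketch below is purely illustrative and is not EMC IT’s actual scheduling mechanism. The function name, VM names and window length are all hypothetical. It hashes each VM’s name to pick a stable start offset within a scan window.

```python
import hashlib
from datetime import datetime, timedelta


def scan_start(vm_name: str, window_start: datetime, window_minutes: int) -> datetime:
    """Pick a stable scan start time for a VM somewhere inside the window.

    A SHA-256 hash of the VM name is used (rather than Python's built-in
    hash(), which is randomized per process) so each VM keeps the same
    slot from one day to the next.
    """
    digest = hashlib.sha256(vm_name.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:8], "big") % window_minutes
    return window_start + timedelta(minutes=offset)


if __name__ == "__main__":
    # Hypothetical example: spread 200 desktops over an 8-hour overnight window.
    start = datetime(2010, 1, 11, 22, 0)
    for i in range(1, 6):
        name = f"vdi-desktop-{i:03d}"  # made-up naming scheme
        print(name, scan_start(name, start, window_minutes=480))
```

Hashing gives a roughly even spread without any central coordination, but it doesn’t guarantee that two VMs on the same server never collide. A scheduler that knows about host placement could do better.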
But anti-virus scanning isn’t the whole story. After dispersing scans over a much wider timeframe—and even temporarily eliminating them—the daily VDI lunchtime crunch continues. Apparently, a lot of automated activities were created, scheduled, and forgotten years ago. Fearless IT archeologists have managed to unearth a few, but this particular mystery remains.
This has sparked a debate in EMC IT. When we start deploying Windows 7, should we abandon the modifications and settings built up over the years and start fresh? Or should we pursue this problem and find its cause, no matter how much time and effort is required?
Whether or not you’re planning to use VDI, which route are you taking?
For this POC, IT started using our internal social-network community as well as formal user surveys to gather feedback and suggestions. That’s important, because survey data only answers questions you already know to ask.
For example, EMC IT was considering a bring-your-own-PC program pilot as part of a later VDI deployment stage. But IT wanted to target early adopters in the next stage.
Guess what? We have a few thousand users in EMC who have purchased their own Macs. They’re using VMware Fusion to run corporate virtual desktop images, created by EMC IT, to use business applications available only on Windows. They’re already BYOPC users, and the majority of them are early adopters and largely self-supporting. What better audience for testing a BYOPC program for VDI? Or for testing a completely virtual Windows 7 desktop deployment? Sounds to me like a lot of bang for the buck. And that’s exactly how EMC IT sees it.
In my next post, we’ll finally look at TCO/ROI numbers that helped gain approval for the next phase of EMC IT’s VDI project: a production rollout for 5,000 of EMC’s 40,000 desktop users.
As always, I welcome your thoughtful comments.