Select Page

Horizon and Workspace ONE Architecture for 250k Users Part 1

Disclaimer: This article is based on my own thoughts and experience and may not reflect a real-world design for a Horizon/Workspace ONE architecture of this size. The blog series focuses only on the Horizon or Workspace ONE infrastructure part and does not consider other criteria like CPU/RAM usage, IOPS, amount of applications, use cases and so on. Please contact your partner or VMware’s Professional Services Organization (PSO) for a consulting engagement.

To my knowledge there is no Horizon implementation of this size at the moment of writing. This topic, the architecture and the necessary amount of VMs in the data center, was always important to me since I moved from Citrix Consulting to a VMware pre-sales role. I always asked myself how VMware Horizon scales when there are more than only 10’000 users.

250’000 users are the current maximum for VMware Horizon 7.8 and the goal is to figure out how many Horizon infrastructure servers like Connection Servers, App Volumes Managers (AVM), vCenter servers and Unified Access Gateway (UAG) appliances are needed and how many pods should be configured and federated with the Cloud Pod Architecture (CPA) feature.

I will create my own architecture, meaning that I use the sizing and recommendation guides and design a Horizon 7 environment based on my current knowledge, experience and assumption.

After that I’ll feed the Digital Workspace Designer tool with the necessary information and let this tool create an architecture, which I then compare with my design.

Scenario

This is the scenario I defined and will use for the sizing:  

Users: 250’000
Data Centers: 1 (to keep it simple)
Internal Users: 248’000
Remote Users: 2’000
Concurrency Internal Users: 80% (198’400 users)
Concurrency Remote Users: 50% (1’000 users)

Horizon Sizing Limits & Recommendations

This article is based on the current release of VMware Horizon 7 with the following sizing limits and recommendations:

Horizon version: 7.8
Max. number of active sessions in a Cloud Pod Architecture pod federation: 250’000
Active connections per pod: 10’000 VMs max for VDI (8’000 tested for instant clones)
Max. number of Connection Servers per pod: 7
Active sessions per Connection Server: 2’000
Max. number of VMs per vCenter: 10’000
Max. connections per UAG: 2’000 

The Digital Workspace Designer lists the following Horizon Maximums:

 

Horizon Maximums Digital Workspace Designer

Please read my short article if you are not familiar with the Horizon Block and Pod Architecture.

Note: The App Volumes sizing limits and recommendations have been updated recently and don’t follow this rule of thumb anymore that an App Volumes Manager only can handle 1’000 sessions. The new recommendations are based on “concurrent logins per second” login rate:

New App Volumes Limits Recommendations

 

Architecture Comparison VDI

Please find below my decisions and the one made by the Digital Workspace Designer (DWD) tool:

Horizon ItemMy DecisionDWD ToolNotes
Number of Users (concurrent)199'400199'400
Number of Pods required2020
Number of Desktop Blocks (one per vCenter)100100
Number of Management Blocks (one per pod)2020
Connection Servers required100100
App Volumes Manager Servers802024+1 AVMs for every 2,500 users
vRealize Operations for Horizonn/a22I have no experience with vROps sizing
Unified Access Gateway required22
vCenter servers (to manage clusters)20100Since Horizon 7.7 there is support for spanning vCenters across multiple pods (bound to the limits of vCenter)

Architecture Comparison RDSH

Please find below my decisions* and the one made by the Digital Workspace Designer (DWD) tool:

Horizon ItemMy DecisionDWD ToolNotes
Number of Users (concurrent)199'400199'400
Number of Pods required2020
Number of Desktop Blocks (one per vCenter)20401 block per pod since we are limited by 10k sessions per pod, but only have 333 RDSH per pod
Number of Management Blocks (one per pod)2020
Connection Servers required100100
App Volumes Manager Servers142024+1 AVMs for every 2,500 users/logins (in this case RDSH VMs (6'647 RDSH totally))
vRealize Operations for Horizonn/a22I have no experience with vROps sizing
Unified Access Gateway required22
vCenter servers (to manage resource clusters)440Since Horizon 7.7 there is support for spanning vCenters across multiple pods (bound to the limits of vCenter)

*Max. 30 users per RDSH

Conclusion

VDI

You can see in the table for VDI that I have different numbers for “App Volumes Manager Servers” and “vCenter servers (to manage clusters)”. For the amount of AVM servers I have used the new recommendations which you already saw above. Before Horizon 7.7 the block and pod architecture consisted of one vCenter server per block:

Horizon Pod vCenter tradtitional

That’s why, I assume, the DWD recommends 100 vCenter servers for the resource cluster. In my case I would only use 20 vCenter servers (yes, it increases the failure domain), because Horizon 7.7 and above allows to span one vCenter across multiple pods while respecting the limit of 10’000 VMs per vCenter. So, my assumption is here, even the image below is not showing it, that it should be possible and supported to use one vCenter server per pod:

Horizon Pod Single vCenter

RDSH

If you consult the reference architecture and the recommendation for VMware Horizon you could think that one important information is missing:

The details for a correct sizing and the required architecture for RDSH!

We know that each Horizon pod could handle 10’000 sessions which are 10’000 VDI desktops (VMs) if you use VDI. But for RDSH we need less VMs – in this case only 6’647.

So, the number of pods is not changing because of the limitation “sessions per pod”. But there is no official limitation when it comes to resource blocks per pod and having one connection server for every 2’000 VMs or sessions for VDI, to minimize the impact of a resource block failure. This is not needed here I think. Otherwise you would bloat up the needed Horizon infrastructure servers and this increases operational and maintenance efforts, which obviously also increases the costs.

But, where are the 40 resource blocks of the DWD tool coming from? Is it because the recommendation is to have at least two blocks per pod to minimize the impact of a resource block failure? If yes, then it would make sense, because in my calculation you would have 9’971 RDSH users sessions per pod/block and with the DWD calculation only 4’986 (half) per resource block.

*Update 28/07/2019*
I have been informed by Graeme Gordon from technical marketing that the 40 resources blocks and vCenters are coming from here:

App Volumes vCenters per Pod

I didn’t see that because I expect that we can go higher if it’s a RDSH-only implementation.

App Volumes and RDSH

The biggest difference when we compare the needed architecture for VDI and RDSH is the number of recommended App Volumes Manager servers. Because “concurrent logins at a one per second login rate” for the AVM sizing was not clear to me I asked our technical marketing for clarification and received the following answer:

With RDSH we assign AppStacks to the computer objects rather than to the user. This means the AppStack attachment and filter drive virtualization process happends when the VM is booted. There is still a bit of activity when a user authenticates to the RDS host (assignment validation), but it’s considerably less than the attachment process for a typical VDI user assignment.

Because of this difference, the 1/second/AVM doesn’t really apply for RDSH only implementations.

With this background I’m doing the math with 6’647 logins and neglect the assignment validation activity and this brings me to a number of 4 AVMs only to serve the 6’647 RDS hosts.

Disclaimer

Please be reminded again that these are only calculations to get an idea how many servers/VMs of each Horizon component are needed for a 250k user (~200k CCU) installation. I didn’t consider any disaster recovery requirements and this means that the calculation I have made recommend the least amount of servers required for a VDI- or RDSH-based Horizon implementation.

vSAN Basics for a Virtual Desktop Infrastructure with VMware Horizon

As an EUC architect you need fundamental knowledge about VMware’s SDDC stack and this time I would like to share some more basics about VMware vSAN for VMware Horizon.

In part 5 of my VCAP7-DTM Design exam series I already posted some YouTube videos about vSAN in case you prefer videos instead of reading. To further proof my vSAN knowledge I decided to take the vSAN Specialist exam which focuses on the version 6.6.

To extend my vSAN skills and to prep myself for this certification I have bought the VMware vSAN 6.7 U1 Deep Dive book which is available on Amazon.

vSAN 6.7 U1 Deep Dive

vSAN Basics – Facts and Requirements

Out in the field not every EUC guy has enough sic knowledge about vSAN and I want to provide some facts about this technology here. This is no article about all the background information and detailed stuff you can do with vSAN, but it should help you to get a basic understanding. If you need more details about vSAN I highly recommend the vSAN 6.7 U1 Deep Dive book and the content available on storagehub.vmware.com.

  • The vSAN cluster requires at least one flash device and capacity device (magnetic or flash)
  • A minimum of three hosts is required except you go for a two-node configuration (requires a witness appliance)
  • Each host participating in the vSAN cluster requires a vSAN enabled VMkernel port
  • Hybrid configurations require a minimum of one 1GbE NIC, 10GbE is recommended by VMware
  • All-Flash configurations require a minimum of one 10GbE NIC
  • vSAN can use RAID-1 (mirroring) and RAID5-/6 (erasure coding) for the VM storage policies
  • RAID-1 is used for performance reasons, erasure coding is used for capacity reasons
  • Disk groups require one flash device for the cache tier and one or more flash/magnetic device for the capacity tier
  • There can be only one cache device per disk group
  • Hybrid configuration – The SSD cache is used for read and write (70/30)
  • All-Flash configuration – The SSD cache is used 100% as a write cache
  • Since version 6.6 there is no multicast requirement anymore
  • vSAN supports IPv4 and IPv6
  • vSphere HA needs to be disabled before vSAN can be enabled and configured
  • The raw capacity of a vSAN datastore is calculated by the number of capacity devices multiplied by the number of ESXi hosts (e.g. 5 x 2TB x 6 hosts = 60 TB raw)
  • Deduplication and compression are only available in all-flash configurations
  • vSAN stores VM data in objects (VM home, swap, VMDK, snapshots)
  • The witness does not store any VM specific data, only metadata
  • vSAN provides data at rest encryption which is a cluster-wide feature
  • vSAN integrates with CBRC (host memory read cache) which is mostly used for VMware Horizon
  • By default, the default VM storage policy is assigned to a VM
  • Each stretched cluster must have its own witness host (no additional vSAN license needed)
  • Fault domains are mostly described with the term “rack awareness”

vSAN for VMware Horizon

The following information can be found in the VMware Docs for Horizon:

When you use vSAN, Horizon 7 defines virtual machine storage requirements, such as capacity, performance, and availability, in the form of default storage policy profiles, which you can modify. Storage is provisioned and automatically configured according to the assigned policies. The default policies that are created during desktop pool creation depend on the type of pool you create.

This means that Horizon will create storage policies when a desktop pool get created. To get more information I will provision a floating Windows 10 instant clone desktop pool. Before I’m doing that, let’s have a look first at the policies which will appear in vCenter depending on the pool type:

Since I’m going to create a floating instant clone desktop pool I assume that I should see some the storage policies marked in yellow. 

Instant Clones

First of all we need to take a quick look again at instant clones. I only cover instant clones since it’s the recommended provisioning method by VMware. As we can learn from this VMware blog post, you can maissvely reduce the time for a desktop to be provisioned (compared to View Composer Linked Clones).

VMware Instant Clones

The big advantage of the instant clone technology (vmFork) is the in-memory cloning technique of a running parent VM.

The following table summarizes the types of VMs used or created during the instant-cloning process:

Instant Cloning VMs
Source: VMWARE HORIZON 7 INSTANT-CLONE DESKTOPS AND RDSH SERVERS 

Horizon Default Storage Policies

To add a desktop pool I have created my master image first and took a snapshot of it. In my case the VM is called “dummyVM_blog” and has the “vSAN Default Storage Policy” assigned.

How does it go from here when I create the floating Windows 10 instant clone desktop pool?

Instant Clone Technology

The second step in the process is where the instant-clone engine uses the master VM snapshot to create one template VM. This template VM is linked to the master VM. My template VM automatically got the following storage policy assigned:

The third step is where the replica VM gets created with the usage of the template VM. The replica VM is a thinprovisioned full clone of the internal template VM. The replica VM shares a read disk with the instantclone VMs after they are created. I only have the vSAN datastore available and one replica VM is created per datastore. The replica VM automatically got the following storage policy assigned:

The fourth step involves the snapshot of the replica VM which is used to create one running parent VM per ESXi host per datastore. The parent VM automatically got the following storage policies assigned:

After, the running parent VM is used to create the instant clone, but the instant clone will be linked to the replica VM and not the running parent VM. This means a parent VM can be deleted without affecting the instant clone. The instant clone automatically got the following storage policies assigned:

And the complete stack of VMs with the two-node vSAN cluster in my home lab, without any further datastores, looks like this:

vCenter Resource Pool 

Now we know the workflow from a master VM to the instant clone and which default storage policies got created and assigned by VMware Horizon. We only know from the VMware Docs that FTT=1 and one stripe per object is configured and that there isn’t any difference except for the name. I checked all storage policies in the GUI again and indeed they are all exactly the same. Note this:

Once these policies are created for the virtual machines, they will never be changed by Horizon 7

Even I didn’t use linked clones with a persistent disk the storage policy PERSISTENT_DISK_<guid> gets created. With instant clones there is no option for a persistent disk yet (you have to use App Volumes with writable volumes), but I think that this will come in the future for instant clones and then we also don’t need View Composer anymore. 🙂

App Volumes Caveat

Don’t forget this caveat for App Volumes when using a vSAN stretched cluster.

New Supermicro Home Lab

For a few years I ve been using three Intel NUC Skull Canyon (NUC6i7KYK) mini PCs for my home lab. Each NUC is equipped with the following:

  • 6th Gen Intel i7-6770HQ processor with Intel Iris Pro graphics
  • 2x 16GB Kingston Value RAM DDR4-2133
  • 2x 500GB Samsung 960 EVO NVMe M.2
  • 1x Transcend JetFlash 710S USB boot device

These small computers were nice in terms of space, but are limited to 32GB RAM, have only 1 network interface and no separate management interface.

This was enough and acceptable when I worked with XenServer, used local storage and just had to validate XenDesktop/XenApp configurations and designs during my time as Citrix consultant.

When I started to replace XenServer with ESXi and created a 3-node vSAN cluster for my first Horizon 7 environment, all was running fine at the beginning. But after while I had strange issues doing vMotions, OS installations, VCSA or ESXi upgrades.

So, I thought it’s time build a “real” home lab and was looking for ideas. After doing some research and talking to my colleague Erik Bussink, it was clear for me that I have to build my computing nodes based on a Supermicro mainboard. As you may know, the Skull Canyons are not that cheap and therefore I will continue using them for my domain controller VMs, vSAN witness, vCenter Server appliance etc.

Yes, my new home lab is going to to be a 2-node vSAN cluster.

Motherboard

I found two Supermicro X11SPM-TF motherboards for a reduced price, because people ordered and never used them. This was my chance and a “sign” that I have to buy my stuff for the new home lab NOW! Let’s pretend it’s my Christmas gift. 😀

The key features for me?

Chassis

I went for the Fractal Design Node 804 because it offers me space for the hardware and cooling. And I like the square form factor which allows me to stack them.

CPU

I need some number of cores in my system to run tests and have enough performance in general. I will mainly run Workspace ONE and Horizon stuff (multi-site architectures) in my lab, but this will change in the future. So I have chosen the 8-core Intel Xeon Silver 4110 Processor with 2.10 GHz.

Memory

RAM was always a limiting factor with my NUCs. I will reuse two of them and start with two 32GB 2666 MHz Kingston Server Premier modules for each ESXi host (total 64GB per host). If memory prices are reducing and I would need more capacity, I easily can expand my system.

Boot Device

Samsung 860 EVO Basic 250GB which is way too much for ESXi, but the price is low and I could use the disk for something else (e.g. for a new PC) if needed.

Caching Device for vSAN

I will remove one Samsung 960 EVO 500GB M.2 of each NUC and use them for the vSAN caching tier. Both NUCs will have still one 960 EVO 500 left to be used as local storage.

Capacity Device for vSAN

Samsung 860 Evo Basic 1TB.

Network

Currently, my home network only consists of Ubiquiti network devices with 1GbE interfaces.

So I ordered the Ubiquiti 10G 16-port switch which comes with four 1/10 Gigabit RJ45 ports – no SFPs needed for now. Maybe in the future 😀

This is the home lab configuration I ordered and all parts should arrive until end of November 2018.

What do you think about this setup?

Your feedback is very welcome!