Azure VMware Solution


Update May 2020:

On May 4th, Microsoft announced the preview (and the "next evolution") of Azure VMware Solution, which is now a first-party service designed, built and supported by Microsoft and endorsed by VMware. This is an entirely new service delivered and supported by Microsoft, and it does not replace the current AVS solution/service by CloudSimple at this time. It is purely a Microsoft technology offering and also has nothing to do with the Virtustream-branded Azure VMware Solution offering. In short: a much cleaner offering and service, with a contract only between Microsoft and VMware.

— Original Text below —

Since Dell Technologies World 2019 it’s clear: VMware and Microsoft are not frenemies anymore!

Dell Technologies and Microsoft announced an expanded partnership which should help customers and provide them with more choice and flexibility for their future digital workspace projects or cloud integrations.

One result and announcement of this new partnership is the still pretty new offering called “Azure VMware Solution” (AVS). Other people and websites may also call it “Azure VMware Solution by Virtustream” or “Azure VMware Solution by CloudSimple”.

AVS is a Microsoft first-party offering, meaning that it's sold and supported by Microsoft, NOT VMware. This is one very important difference if you compare it with VMC on AWS. The operation, development and delivery are done by a VMware Cloud Verified and Metal-as-a-Service VCPP (VMware Cloud Provider Program) partner: CloudSimple or Virtustream (a subsidiary of Dell Technologies). AVS is fully supported and verified by VMware.

VMware Metal-as-a-Service Authorized partners Virtustream and CloudSimple run the latest VMware software-defined data center technology, ensuring customers enjoy the same benefits of a consistent infrastructure and consistent operations in the cloud as they achieve in their own physical data center, while allowing customers to also access the capabilities of Microsoft Azure.

So, why would someone like Microsoft run VMware's Cloud Foundation (VCF) stack on Azure? The answer is quite simple. VMware has over 500'000 customers and an estimated 70 million VMs, most of which run on-premises. Microsoft doesn't care if virtual machines (VMs) are running on vSphere; they care about Azure and, in the end, the consumption. AVS is just another form of Azure, Microsoft says. I would say it's very unlikely that a customer moves on to Azure native once they are onboarded via Azure VMware Solution.

Microsoft would like to see some of those 70 million VMs running on their platform, even if it's VCF on top of their Azure servers. Customers should get the option to move to the Azure cloud and use Azure native services (e.g. Azure NetApp Files, Azure databases etc.), while keeping the choice and flexibility to use the existing technology stack, ecosystem and tools (e.g. for automation or operations) they are familiar with – the whole VCF stack or parts of it, coupled with products from the vRealize Suite. Plus other VMware 3rd-party integrations they might have for data protection or backup. According to Microsoft, this is one unique specialty: there is no restricted functionality as you may experience in other VMware clouds.

Azure VMware Solutions Components

From VMware's perspective, most of our customers are already Microsoft customers as well. In addition, VMware's vision is to provide freedom of choice and flexibility, just like Microsoft, but with one small difference: to be cloud and infrastructure agnostic. This vision says that VMware doesn't care if you run your workloads on-prem, on AWS, Azure or GCP (or even in a VCPP partner's cloud), as long as they are running on the VCF stack. Cloud is not a choice or a destination anymore; it has become an operating model.

And to keep it an operating model without creating a new silo and a vendor lock-in, it makes total sense to use VMware's VCF on top of AWS, Azure, Google Cloud, Oracle, Alibaba Cloud or any other VCPP partner. This ensures that customers have the choice and flexibility they are looking for, coupled with whichever public cloud may still feel new or special to them. If your vision is also about workload mobility across any cloud, then VMware is the right choice and partner!

Use Cases

What are the reasons to move to Azure and use Azure VMware Solution?

If you don't want to scale up or scale out your own infrastructure and would like to get additional capacity almost instantly, then speed is definitely one reason. Microsoft can spin up a new AVS SDDC in under 60 minutes, which is impressive. How is this possible? With automation! This proves that VMware Cloud Foundation is the data center operating system of the future and that automation is a key design requirement. If you would like to experience nearly the same speed and work with the same principles as public cloud providers do, then VCF is the way to go.

The remaining use cases and reasons are generally the same as for any cloud: if it's not only speed, then agility, (burstable) capacity, expansion into a new geography, DRaaS, or app modernization using cloud-native services.

Microsoft Licenses

What I have learned from this MS Ignite recording is that you can bring your existing Microsoft licenses to AVS and don't have to buy them AGAIN. This is not the case in any other cloud.

This information can be found here as well:

Beginning October 1, 2019, on-premises licenses purchased without Software Assurance and mobility rights cannot be deployed with dedicated hosted cloud services offered by the following public cloud providers: Microsoft, Alibaba, Amazon (including VMware Cloud on AWS), and Google. They will be referred to as “Listed Providers”.

Regions

If you check the Azure documentation, you'll see that AVS is only available in the US East and West Azure regions, but should be available in Western Europe "in the near future". In the YouTube video above, Microsoft showed this slide with their global rollout strategy and the planned availability for Q2 2020:

Azure VMware Solutions Regions 2020

According to the Azure regions website Azure VMware Solution is available at the following locations and countries in Europe:

Azure VMware Solutions by Azure Region

So, North Europe (UK) is expected for H2 2020 and AVS is already available in the West Europe Azure region. Since no information is available about the Swiss regions, even though the slide from the MS Ignite recording may suggest availability by May 2020, it's very unlikely that AVS will be available in Zurich or Geneva before 2021.

Azure VMware Solution Components

You need at least three hosts to get started with the AVS service and you can scale up to 16 hosts per cluster with an SLA of 99.9%. More information about the available node specifications for your region can be found here. At the moment CloudSimple offers the following host types:

  • CS28 node: CPU: 2x 2.2 GHz, total 28 cores, 48 HT. RAM: 256 GB. Storage: 1600 GB NVMe cache, 5760 GB data (All-Flash). Network: 4x 25GbE NIC
  • CS36 node: CPU: 2x 2.3 GHz, total 36 cores, 72 HT. RAM: 512 GB. Storage: 3200 GB NVMe cache, 11520 GB data (All-Flash). Network: 4x 25GbE NIC
  • CS36m node (only option for West Europe): CPU: 2x 2.3 GHz, total 36 cores, 72 HT. RAM: 576 GB. Storage: 3200 GB NVMe cache, 13360 GB data (All-Flash). Network: 4x 25GbE NIC
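
As a quick sanity check, here is a minimal back-of-the-envelope sketch of what the three-host minimum gives you with the CS36m specs listed above (raw numbers only; vSAN FTT overhead, slack space and the management VMs are ignored):

```python
# Rough raw capacity of a minimal 3-node CS36m private cloud,
# based purely on the per-node specs listed above.
nodes = 3                      # minimum number of hosts for the AVS service

cores       = nodes * 36       # 108 physical cores
ram_gb      = nodes * 576      # 1'728 GB RAM
cache_gb    = nodes * 3_200    # 9'600 GB NVMe cache (not usable capacity)
raw_data_gb = nodes * 13_360   # 40'080 GB raw vSAN capacity before FTT overhead

print(cores, ram_gb, cache_gb, raw_data_gb)
```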

I think it's clear that the hypervisor used is vSphere and that it's maintained by Microsoft, not by VMware. There is no host-level access, but Microsoft gives you a special "just in time" privileged access (root access) feature, which allows you to install software bits you might need – for example for 3rd-party software integrations.

The storage infrastructure is based on vSAN with all-flash persistent storage and an NVMe cache tier. More capacity can be made available by adding additional nodes or by using Azure storage offerings, which can be attached to VMs directly.

Networking and security are based on NSX-T, which fully supports micro-segmentation.

To offer choice, Microsoft gives you the option to manage and see your AVS VMware infrastructure via vCenter or Azure Resource Manager (ARM). The ARM integration will allow you to create, start, stop and delete virtual machines and is not meant to replace existing VMware tools.

Microsoft support is your single point of contact and CloudSimple contacts VMware if needed.

Connectivity Options

CloudSimple provides the following connectivity options to connect to your AVS region network:

Depending on the connectivity option you have different ways to bring your VMs to your AVS private cloud:

How do I get started?

You have to contact your Microsoft account manager or business development manager if you would like to know more. But VMware account representatives are also available to support you. If you want to learn more, check https://aka.ms/startavs.

Can I burn my existing Azure Credits?

Yes. Customers with Azure credits can use them through Azure VMware Solution.

Horizon and Workspace ONE Architecture for 250k Users Part 1

Disclaimer: This article is based on my own thoughts and experience and may not reflect a real-world design for a Horizon/Workspace ONE architecture of this size. The blog series focuses only on the Horizon or Workspace ONE infrastructure part and does not consider other criteria like CPU/RAM usage, IOPS, amount of applications, use cases and so on. Please contact your partner or VMware’s Professional Services Organization (PSO) for a consulting engagement.

To my knowledge, there is no Horizon implementation of this size at the moment of writing. This topic – the architecture and the necessary number of VMs in the data center – has always been important to me since I moved from Citrix Consulting to a VMware pre-sales role. I always asked myself how VMware Horizon scales when there are more than 10'000 users.

250'000 users is the current maximum for VMware Horizon 7.8. The goal is to figure out how many Horizon infrastructure servers – Connection Servers, App Volumes Managers (AVM), vCenter servers and Unified Access Gateway (UAG) appliances – are needed, and how many pods should be configured and federated with the Cloud Pod Architecture (CPA) feature.

I will create my own architecture, meaning that I use the sizing and recommendation guides and design a Horizon 7 environment based on my current knowledge, experience and assumptions.

After that I’ll feed the Digital Workspace Designer tool with the necessary information and let this tool create an architecture, which I then compare with my design.

Scenario

This is the scenario I defined and will use for the sizing:  

Users: 250’000
Data Centers: 1 (to keep it simple)
Internal Users: 248’000
Remote Users: 2’000
Concurrency Internal Users: 80% (198’400 users)
Concurrency Remote Users: 50% (1’000 users)
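
The concurrency numbers above are just simple arithmetic on the scenario values:

```python
internal_users = 248_000
remote_users   = 2_000

concurrent_internal = internal_users * 80 // 100   # 198'400
concurrent_remote   = remote_users * 50 // 100     # 1'000
concurrent_total    = concurrent_internal + concurrent_remote   # 199'400

print(concurrent_internal, concurrent_remote, concurrent_total)
```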

Horizon Sizing Limits & Recommendations

This article is based on the current release of VMware Horizon 7 with the following sizing limits and recommendations:

Horizon version: 7.8
Max. number of active sessions in a Cloud Pod Architecture pod federation: 250’000
Active connections per pod: 10’000 VMs max for VDI (8’000 tested for instant clones)
Max. number of Connection Servers per pod: 7
Active sessions per Connection Server: 2’000
Max. number of VMs per vCenter: 10’000
Max. connections per UAG: 2’000 
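
Applied to the 199'400 concurrent users from the scenario, these limits already give a first rough sizing. This is only a sketch of my own calculation: it uses the 10'000-sessions-per-pod limit, assumes one vCenter per pod (the Horizon 7.7+ approach I discuss in the conclusion), ignores N+1 redundancy for Connection Servers, and simply doubles the UAGs for redundancy:

```python
import math

concurrent_sessions = 199_400
remote_sessions     = 1_000

sessions_per_pod = 10_000   # CPA limit used here (8'000 tested for instant clones)
sessions_per_cs  = 2_000    # active sessions per Connection Server
vms_per_vcenter  = 10_000
sessions_per_uag = 2_000

pods               = math.ceil(concurrent_sessions / sessions_per_pod)     # 20
connection_servers = pods * math.ceil(sessions_per_pod / sessions_per_cs)  # 100 (5 per pod)
vcenters           = math.ceil(concurrent_sessions / vms_per_vcenter)      # 20 (one per pod)
uags               = max(2, math.ceil(remote_sessions / sessions_per_uag)) # 2

print(pods, connection_servers, vcenters, uags)
```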

The Digital Workspace Designer lists the following Horizon Maximums:

 

Horizon Maximums Digital Workspace Designer

Please read my short article if you are not familiar with the Horizon Block and Pod Architecture.

Note: The App Volumes sizing limits and recommendations have been updated recently and no longer follow the rule of thumb that an App Volumes Manager can only handle 1'000 sessions. The new recommendations are based on a "concurrent logins per second" login rate:

New App Volumes Limits Recommendations

 

Architecture Comparison VDI

Please find below my decisions and the ones made by the Digital Workspace Designer (DWD) tool:

Horizon Item | My Decision | DWD Tool | Notes
Number of Users (concurrent) | 199'400 | 199'400 |
Number of Pods required | 20 | 20 |
Number of Desktop Blocks (one per vCenter) | 100 | 100 |
Number of Management Blocks (one per pod) | 20 | 20 |
Connection Servers required | 100 | 100 |
App Volumes Manager Servers | 80 | 202 | 4+1 AVMs for every 2,500 users
vRealize Operations for Horizon | n/a | 22 | I have no experience with vROps sizing
Unified Access Gateway required | 2 | 2 |
vCenter servers (to manage clusters) | 20 | 100 | Since Horizon 7.7 there is support for spanning vCenters across multiple pods (bound to the limits of vCenter)

Architecture Comparison RDSH

Please find below my decisions* and the ones made by the Digital Workspace Designer (DWD) tool:

Horizon Item | My Decision | DWD Tool | Notes
Number of Users (concurrent) | 199'400 | 199'400 |
Number of Pods required | 20 | 20 |
Number of Desktop Blocks (one per vCenter) | 20 | 40 | 1 block per pod since we are limited to 10k sessions per pod, but only have 333 RDSH per pod
Number of Management Blocks (one per pod) | 20 | 20 |
Connection Servers required | 100 | 100 |
App Volumes Manager Servers | 14 | 202 | 4+1 AVMs for every 2,500 users/logins (in this case RDSH VMs, 6'647 RDSH in total)
vRealize Operations for Horizon | n/a | 22 | I have no experience with vROps sizing
Unified Access Gateway required | 2 | 2 |
vCenter servers (to manage resource clusters) | 4 | 40 | Since Horizon 7.7 there is support for spanning vCenters across multiple pods (bound to the limits of vCenter)

*Max. 30 users per RDSH

Conclusion

VDI

You can see in the table for VDI that I have different numbers for "App Volumes Manager Servers" and "vCenter servers (to manage clusters)". For the number of AVM servers I have used the new recommendations which you already saw above. Before Horizon 7.7 the block and pod architecture consisted of one vCenter server per block:

Horizon Pod vCenter traditional

That's why, I assume, the DWD recommends 100 vCenter servers for the resource clusters. In my case I would only use 20 vCenter servers (yes, it increases the failure domain), because Horizon 7.7 and above allows you to span one vCenter across multiple pods while respecting the limit of 10'000 VMs per vCenter. So my assumption here, even though the image below doesn't show it, is that it should be possible and supported to use one vCenter server per pod:

Horizon Pod Single vCenter

RDSH

If you consult the reference architecture and the recommendations for VMware Horizon, you could think that one important piece of information is missing:

The details for a correct sizing and the required architecture for RDSH!

We know that each Horizon pod can handle 10'000 sessions, which means 10'000 VDI desktops (VMs) if you use VDI. But for RDSH we need fewer VMs – in this case only 6'647.

So, the number of pods doesn't change because of the "sessions per pod" limit. But there is no official limitation when it comes to resource blocks per pod, or to having one Connection Server for every 2'000 VMs or sessions for VDI to minimize the impact of a resource block failure. In my opinion this is not needed here; otherwise you would bloat the required Horizon infrastructure servers, which increases operational and maintenance effort and obviously also the costs.

But where are the 40 resource blocks of the DWD tool coming from? Is it because the recommendation is to have at least two blocks per pod to minimize the impact of a resource block failure? If yes, then it would make sense, because in my calculation you would have 9'971 RDSH user sessions per pod/block, and with the DWD calculation only 4'986 (half) per resource block.
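
Here is the math behind those numbers as a small sketch (assuming max. 30 users per RDSH, as noted under the table, and 20 pods):

```python
import math

concurrent_users = 199_400
users_per_rdsh   = 30    # assumption from the table footnote
pods             = 20

rdsh_hosts         = math.ceil(concurrent_users / users_per_rdsh)           # 6'647 RDSH VMs
rdsh_per_pod       = math.ceil(rdsh_hosts / pods)                           # 333 RDSH per pod
sessions_per_pod   = math.ceil(rdsh_hosts * users_per_rdsh / pods)          # ~9'971 sessions per pod/block
sessions_per_block = math.ceil(rdsh_hosts * users_per_rdsh / (pods * 2))    # ~4'986 with two blocks per pod (DWD)

print(rdsh_hosts, rdsh_per_pod, sessions_per_pod, sessions_per_block)
```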

*Update 28/07/2019*
I have been informed by Graeme Gordon from technical marketing that the 40 resource blocks and vCenters are coming from here:

App Volumes vCenters per Pod

I didn't see that, because I expected that we can go higher if it's an RDSH-only implementation.

App Volumes and RDSH

The biggest difference when we compare the needed architecture for VDI and RDSH is the number of recommended App Volumes Manager servers. Because "concurrent logins at a one per second login rate" for the AVM sizing was not clear to me, I asked our technical marketing for clarification and received the following answer:

With RDSH we assign AppStacks to the computer objects rather than to the user. This means the AppStack attachment and filter driver virtualization process happens when the VM is booted. There is still a bit of activity when a user authenticates to the RDS host (assignment validation), but it's considerably less than the attachment process for a typical VDI user assignment.

Because of this difference, the 1/second/AVM doesn't really apply to RDSH-only implementations.

With this background I'm doing the math with 6'647 logins, neglecting the assignment validation activity, which brings me to a number of only 4 AVMs to serve the 6'647 RDS hosts.
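
As a small sketch of that math (assuming roughly 2'500 logins per App Volumes Manager, as in the sizing note of the table above, plus one manager for redundancy):

```python
import math

rdsh_logins    = 6_647   # computer-object logins (RDSH VMs), not user logins
logins_per_avm = 2_500   # assumption, based on the sizing note in the table above

avms = math.ceil(rdsh_logins / logins_per_avm) + 1   # 3 + 1 for redundancy = 4 AVMs
print(avms)
```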

Disclaimer

Please be reminded again that these are only calculations to get an idea of how many servers/VMs of each Horizon component are needed for a 250k user (~200k CCU) installation. I didn't consider any disaster recovery requirements, which means the calculations I have made recommend the minimum number of servers required for a VDI- or RDSH-based Horizon implementation.

vSAN Basics for a Virtual Desktop Infrastructure with VMware Horizon

As an EUC architect you need fundamental knowledge about VMware’s SDDC stack and this time I would like to share some more basics about VMware vSAN for VMware Horizon.

In part 5 of my VCAP7-DTM Design exam series I already posted some YouTube videos about vSAN in case you prefer videos instead of reading. To further prove my vSAN knowledge I decided to take the vSAN Specialist exam, which focuses on version 6.6.

To extend my vSAN skills and to prep myself for this certification I have bought the VMware vSAN 6.7 U1 Deep Dive book which is available on Amazon.

vSAN 6.7 U1 Deep Dive

vSAN Basics – Facts and Requirements

Out in the field not every EUC guy has enough basic knowledge about vSAN, so I want to provide some facts about this technology here. This is not an article about all the background information and detailed stuff you can do with vSAN, but it should help you get a basic understanding. If you need more details about vSAN I highly recommend the vSAN 6.7 U1 Deep Dive book and the content available on storagehub.vmware.com.

  • The vSAN cluster requires at least one flash device and one capacity device (magnetic or flash)
  • A minimum of three hosts is required, unless you go for a two-node configuration (requires a witness appliance)
  • Each host participating in the vSAN cluster requires a vSAN-enabled VMkernel port
  • Hybrid configurations require a minimum of one 1GbE NIC; 10GbE is recommended by VMware
  • All-Flash configurations require a minimum of one 10GbE NIC
  • vSAN can use RAID-1 (mirroring) and RAID-5/6 (erasure coding) for the VM storage policies
  • RAID-1 is used for performance reasons, erasure coding is used for capacity reasons
  • Disk groups require one flash device for the cache tier and one or more flash/magnetic devices for the capacity tier
  • There can be only one cache device per disk group
  • Hybrid configuration – The SSD cache is used for read and write (70/30)
  • All-Flash configuration – The SSD cache is used 100% as a write cache
  • Since version 6.6 there is no multicast requirement anymore
  • vSAN supports IPv4 and IPv6
  • vSphere HA needs to be disabled before vSAN can be enabled and configured
  • The raw capacity of a vSAN datastore is calculated by multiplying the number of capacity devices by their size and the number of ESXi hosts (e.g. 5 x 2TB x 6 hosts = 60 TB raw) – see the sketch after this list
  • Deduplication and compression are only available in all-flash configurations
  • vSAN stores VM data in objects (VM home, swap, VMDK, snapshots)
  • The witness does not store any VM specific data, only metadata
  • vSAN provides data at rest encryption which is a cluster-wide feature
  • vSAN integrates with CBRC (host memory read cache) which is mostly used for VMware Horizon
  • By default, the default VM storage policy is assigned to a VM
  • Each stretched cluster must have its own witness host (no additional vSAN license needed)
  • Fault domains are mostly described with the term “rack awareness”
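
Here is the sketch for the raw-capacity example from the list above, extended with the generic vSAN protection overheads (2x for FTT=1 with RAID-1 mirroring, 1.33x for FTT=1 with RAID-5 erasure coding); slack space, deduplication and compression are ignored:

```python
capacity_devices_per_host = 5
device_size_tb            = 2
hosts                     = 6

raw_tb          = capacity_devices_per_host * device_size_tb * hosts   # 60 TB raw
usable_raid1_tb = raw_tb / 2      # FTT=1, RAID-1 mirroring -> ~30 TB
usable_raid5_tb = raw_tb / 1.33   # FTT=1, RAID-5 erasure coding -> ~45 TB

print(raw_tb, usable_raid1_tb, round(usable_raid5_tb, 1))
```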

vSAN for VMware Horizon

The following information can be found in the VMware Docs for Horizon:

When you use vSAN, Horizon 7 defines virtual machine storage requirements, such as capacity, performance, and availability, in the form of default storage policy profiles, which you can modify. Storage is provisioned and automatically configured according to the assigned policies. The default policies that are created during desktop pool creation depend on the type of pool you create.

This means that Horizon will create storage policies when a desktop pool gets created. To get more information I will provision a floating Windows 10 instant clone desktop pool. But before I do that, let's first have a look at the policies which will appear in vCenter depending on the pool type:

Since I'm going to create a floating instant clone desktop pool, I assume that I should see some of the storage policies marked in yellow.

Instant Clones

First of all, we need to take a quick look again at instant clones. I only cover instant clones since it's the provisioning method recommended by VMware. As you can learn from this VMware blog post, you can massively reduce the time for a desktop to be provisioned (compared to View Composer linked clones).

VMware Instant Clones

The big advantage of the instant clone technology (vmFork) is the in-memory cloning technique of a running parent VM.

The following table summarizes the types of VMs used or created during the instant-cloning process:

Instant Cloning VMs
Source: VMWARE HORIZON 7 INSTANT-CLONE DESKTOPS AND RDSH SERVERS 

Horizon Default Storage Policies

To add a desktop pool, I first created my master image and took a snapshot of it. In my case the VM is called "dummyVM_blog" and has the "vSAN Default Storage Policy" assigned.

How does it go from here when I create the floating Windows 10 instant clone desktop pool?

Instant Clone Technology

The second step in the process is where the instant-clone engine uses the master VM snapshot to create one template VM. This template VM is linked to the master VM. My template VM automatically got the following storage policy assigned:

The third step is where the replica VM gets created with the use of the template VM. The replica VM is a thin-provisioned full clone of the internal template VM. The replica VM shares a read disk with the instant-clone VMs after they are created. I only have the vSAN datastore available, and one replica VM is created per datastore. The replica VM automatically got the following storage policy assigned:

The fourth step involves the snapshot of the replica VM which is used to create one running parent VM per ESXi host per datastore. The parent VM automatically got the following storage policies assigned:

After that, the running parent VM is used to create the instant clones, but an instant clone is linked to the replica VM and not to the running parent VM. This means a parent VM can be deleted without affecting the instant clone. The instant clone automatically got the following storage policies assigned:

And the complete stack of VMs with the two-node vSAN cluster in my home lab, without any further datastores, looks like this:

vCenter Resource Pool 

Now we know the workflow from a master VM to the instant clone and which default storage policies get created and assigned by VMware Horizon. We only know from the VMware Docs that FTT=1 and one stripe per object are configured and that there isn't any difference except for the name. I checked all storage policies in the GUI again and indeed they are all exactly the same. Note this:

Once these policies are created for the virtual machines, they will never be changed by Horizon 7

Even though I didn't use linked clones with a persistent disk, the storage policy PERSISTENT_DISK_<guid> gets created. With instant clones there is no option for a persistent disk yet (you have to use App Volumes with writable volumes), but I think this will come in the future for instant clones, and then we also won't need View Composer anymore. 🙂

App Volumes Caveat

Don’t forget this caveat for App Volumes when using a vSAN stretched cluster.

New Supermicro Home Lab

For a few years I've been using three Intel NUC Skull Canyon (NUC6i7KYK) mini PCs for my home lab. Each NUC is equipped with the following:

  • 6th Gen Intel i7-6770HQ processor with Intel Iris Pro graphics
  • 2x 16GB Kingston Value RAM DDR4-2133
  • 2x 500GB Samsung 960 EVO NVMe M.2
  • 1x Transcend JetFlash 710S USB boot device

These small computers were nice in terms of space, but they are limited to 32GB RAM, have only one network interface and no separate management interface.

This was enough and acceptable when I worked with XenServer, used local storage and just had to validate XenDesktop/XenApp configurations and designs during my time as a Citrix consultant.

When I started to replace XenServer with ESXi and created a 3-node vSAN cluster for my first Horizon 7 environment, everything ran fine at the beginning. But after a while I had strange issues with vMotions, OS installations, and VCSA or ESXi upgrades.

So, I thought it's time to build a "real" home lab and was looking for ideas. After doing some research and talking to my colleague Erik Bussink, it was clear to me that I had to build my compute nodes based on a Supermicro mainboard. As you may know, the Skull Canyons are not that cheap, and therefore I will continue using them for my domain controller VMs, vSAN witness, vCenter Server appliance etc.

Yes, my new home lab is going to be a 2-node vSAN cluster.

Motherboard

I found two Supermicro X11SPM-TF motherboards for a reduced price, because someone had ordered them and never used them. This was my chance and a "sign" that I had to buy the parts for my new home lab NOW! Let's pretend it's my Christmas gift. 😀

The key features for me?

Chassis

I went for the Fractal Design Node 804 because it offers me space for the hardware and cooling. And I like the square form factor which allows me to stack them.

CPU

I need a decent number of cores in my system to run tests and have enough performance in general. I will mainly run Workspace ONE and Horizon stuff (multi-site architectures) in my lab, but this will change in the future. So I have chosen the 8-core Intel Xeon Silver 4110 processor with 2.10 GHz.

Memory

RAM has always been a limiting factor with my NUCs. I will reuse two of them and start with two 32GB 2666 MHz Kingston Server Premier modules for each ESXi host (64GB per host in total). If memory prices drop and I need more capacity, I can easily expand the system.

Boot Device

Samsung 860 EVO Basic 250GB, which is way too much for ESXi, but the price is low and I could use the disk for something else (e.g. for a new PC) if needed.

Caching Device for vSAN

I will remove one Samsung 960 EVO 500GB M.2 from each NUC and use them for the vSAN caching tier. Both NUCs will still have one 960 EVO 500GB left to be used as local storage.

Capacity Device for vSAN

Samsung 860 Evo Basic 1TB.

Network

Currently, my home network only consists of Ubiquiti network devices with 1GbE interfaces.

So I ordered the Ubiquiti 10G 16-port switch which comes with four 1/10 Gigabit RJ45 ports – no SFPs needed for now. Maybe in the future 😀

This is the home lab configuration I ordered, and all parts should arrive by the end of November 2018.

What do you think about this setup?

Your feedback is very welcome!