Multi-Cloud Load Balancing and Autoscaling with NSX Advanced Load Balancer (formerly Avi Networks)

Multi-Cloud Load Balancing and Autoscaling with NSX Advanced Load Balancer (formerly Avi Networks)

Do you want to build your private cloud like a hyperscaler is doing it? You know that VMware Cloud Foundation is becoming the new vSphere, but still wonder how you can implement software-defined load balancing (LB) or application services and features like autoscaling or predictive scaling? Then this article about multi-cloud load balancing and autoscaling with NSX Advanced Load Balancer aka Avi Networks is for you!

My Experience with a Legacy ADC

A few years ago, I was working on the customer side for an insurance company in Switzerland as a Citrix System Engineer. My daily tasks included the maintenance and operation of the Citrix environment, which included physical and virtual Citrix NetScaler Application Delivery Controller (ADC) appliances. The networking team owned a few hardware-based appliances (NetScaler SDX) with integrated virtualization capability (XenServer as hypervisor) to host multiple virtual NetScaler (VPX) instances.

The networking team had their dedicated NetScaler VPX instances (for LDAP and HTTP load balancing mostly) and deployed my appliances after I filed a change request. Today, you would call this multi-tenancy. For a Citrix architecture is was best practices to have one high availability (HA) pair for the internal and one HA pair for the external (DMZ) network access. A HA pair was running in a active/passive mode and I had to maintain the same setup for the test environment as well.

Since my virtual VPX appliances were hosted on the physical SDX appliance, I always relied on the network engineers, if I needed more resources (CPU, RAM, SSL, throughput) chips allocated to my virtual instances. Before I could upgrade to a specific firmware version, I also had to wait until they upgraded the physical NetScaler appliances and approved my change request. This meant, we had to plan changes and maintenance windows together and had to cross fingers, that their upgrade went well, that we could upgrade all our appliances after.

NetScaler SDX

It was also possible to download a VPX appliance, which could run on top of VMware vSphere. To be more independent, I decided to install four new VPX appliances (for the production and test environment) on vSphere and migrated the configuration from the appliances running on the physical SDX appliance.

Another experience I had with load balancers was when I started to work for Citrix as a consultant in Central Europe and had to perform a migration of physical NetScaler MPX appliances, which had no integrated virtualization capability. I believe I had two sites with each two of these powerful MPX appliances for tens of thousands of users. Beside the regular load balancing configuration for some of the Citrix components, I also had to configure Global Server Load Balancing (GSLB) in active/passive mode for the two sites.

NetScaler GSLB Active Passive

There were so many more features available (e.g. Web Application Firewall, Content Switching, Caching, Intrusion Detection), but I never used anything else than the NetScaler Universal Gateway for the remote access to the virtual desktop infrastructure (VDI), load balancing, HTTP to HTTPS redirections and GSLB. In all scenarios I had a HA pair where one instance was idling and doing nothing. And the active unit was in average not utilized more than 15-20%. It was common to install/buy too large or powerful instances/licenses, because you wanted to be on the safe side and have enough capacity to terminate all your SSL sessions and so on.

It (load balancing) was about distributing network traffic across multiple servers by spreading the requests and work evenly, and do add some intelligence (health monitoring) in case an application server or a service would fail or be unavailable for any reason. If one more application server was needed, I ordered a new Windows Server, installed and configured the Citrix components and added the necessary load balancing configuration on the NetScaler. These were all manual tasks. The same work has been done by the network engineers when the application team requested a new application server, which then had to be added to the load balancing configuration on their NetScaler appliances.

This was my personal experience from 2017. Since then applications became more complex and distributed. The analysts and market are focusing on containerized and portable apps running and more and more in multiple clouds. The prediction is also that the future is multi-cloud.

Multiple Clouds vs. Multi-Cloud

There are different definitions and understandings out there what multi-cloud means. In my understanding, using a private cloud, AWS, Microsoft Office 365 and Azure are a typical setup with multiple clouds. There are simple scenarios where you migrate workloads from the private to the public cloud (e.g. Azure) or having applications with services lying on the private on the public cloud. The latter would be an example of a hybrid cloud architecture.

The reasons for which services and resources are needed or distributed on multiple clouds (on-prem, Azure, AWS, GCP etc.) are various:

  • Avoid dependence on only one cloud provider
  • Consume different specific services that are not provided elsewhere
  • Optimize costs for different workloads and services
  • React to price changes by the providers

That is why we are seeing also the trend to break up big legacy applications (monoliths) in smaller pieces (segments), which is a best practice and design principle today. The goal is to move to a loosely coupled and more service-oriented architecture. This provides greater agility, more flexibility and easier scalability, because of less inter-dependencies.

And, if we take the second example from the list before, a segmented application is much easier to run in different clouds (portability). Running one application over multiple clouds is in my understanding the right definition of multi-cloud.

Multiple Clouds versus Multi-Cloud

Let’s assume that most probably all the four reasons above apply to larger enterprises. If we take another angle, we can define some business and technical requirements for multi-cloud:

  • Application or services need to be cloud vendor-agnostic
  • Provide or abstract control and management interface of multiple clouds
  • Support application portability/relocation between clouds
  • Combine IaaS and services from different clouds
  • Possibility to deploy components of applications in multiple clouds
  • (Cloud) Broker service needed
  • Policy and governance over multiple clouds
  • Network connectivity for migration scenarios with partially modernized applications
  • Automated procedures for deployments
  • Application monitoring over different clouds
  • Costs management
  • Lifecycle management of deployed applications in multiple clouds
  • Self-adaption and auto scaling features
  • Large team with various expertises needed

How can you deliver and manage the different applications services like load balancing, web application firewalling, analytics, automation and security over multiple clouds?

Another important question would be, how you want to manage the deployment on the various clouds. But cloud management or a cloud management platform is something for another article. 🙂

The requirements for the developers, operations and the business are very complex and it’s a long list (see above).

It is important, that you understand, based on the requirements for multi-cloud, that it is mandatory to implement a modern solution for your modernized application architectures. Enterprises have become more application-centric and everyone is talking about continious integration, continuous delivery and DevOps practices to automate operation and deployment tasks. A modern solution implicits a software-defined approach. Otherwise you won’t be able to be agile, adapt to changes and meet future requirements.

My past experience with Citrix’ NetScaler is a typical example that “virtualized” and “software-defined” are not the same thing. And this is very important if we want to have a future-ready solution. If we look at VMware’s software-defined data center (SDDC), beside the virtual compute, also includes software-defined storage and networking. Part of the software-defined networking portfolio is “NSX Advanced Load Balancer“, the software-defined application services platform, which was also known as “Avi Networks” before VMware bought them in June 2019.

Unlike a virtualized load-balancing appliance, a software-defined
application services platform separates the data and control planes
to deliver application services beyond load balancing, real-time
application analytics, security and monitoring, predictive autoscaling,
and end-to-end automation for Transport (Layer 4) to
Application (Layer 7) layer services. The platform supports multicloud
environments and provides software-defined application
services with infrastructure-agnostic deployments on bare metal
servers, virtual machines (VMs), and containers, in on-premises
data centers and private/public clouds.

Autoscaling became famous with AWS as it monitors your applications and automatically adjusts capacity to maintain availability and performance at the lowest possible costs. It automatically adds or removes application servers (e.g. EC2 instances), load balancers, applies the right network configuration and so on.

Can you achieve the same for your on-premises infrastructure with VMware? Yes.

Is there even a solution which can serve both worlds – on-prem and cloud? Yes.

And what about predictive scaling with real-time insights? Yes.

NSX Advanced Load Balancer (NSX ALB)

Why did VMware buy Avi? Because it follows the same architecture principles like NSX: A distributed platform with a separate control and data plane built on software-defined principles for any cloud.

Avi High Level Architecture

Traditional ADCs or load balancers are mostly configured in active/standby pairs, no matter if physical or virtual. Typically you would see around 15% utilization on the active node where the secondary standby node is just idling and doing nothing. Each pair is its own island of static capacity which shares the management, control and data plane.

You have to decide where to place the virtual IP (VIP) and how much you want to overprovision the physical or virtual appliances, because there is no capacity pooling available. This leads to operational complexity, especially when you have hundreds of such HA pairs running in different clouds. Therefore, legacy and virtualized ADCs are not the ideal choice for a multi-cloud architecture. Let’s check NSX ALB’s architecture:

Control Plane – This is the brain (single point of management) of the whole platform that can be spun up in your on-prem environment or in the cloud (also available as a managed SaaS offering), typically as a three-node cluster. Within this cluster, all configuration is done, this also where the policies reside and the decisions are made. It is the controller’s duty to place virtual services on SEs to load balance new applications or increase the capacity of running applications.

The control plane comprises the three pillars that deliver the key capabilities of the Avi platform:

  • Multi-Cloud – Consistent experience for any cloud, no lock-in
  • Intelligence – The machine learning based analytics engine enables application performance monitoring, troubleshooting, and operational insights (gathered by the SEs)
  • Automation – Elastic and predictive auto scaling & self-service without over-provisioning through a complete set of REST APIs

Data Plane –  The Service Engines (SEs) handle all data plane operations by receiving and executing instructions from the controller. The SEs perform load balancing and all client- and server-facing network interactions. It collects real-time application telemetry from application traffic flows. 

As already mentioned, NSX ALB can be deployed in multiple cloud environments like VMware vCenter, Amazon Web Services, Microsoft Azure, Google Cloud Platform, Oracle Cloud, IBM Cloud, VMC on AWS, Nutanix, OpenStack or bare-metal.

Use Cases

Most customers deploy Avi because of:

  • Load Balancer refresh
  • Multi-Cloud initiatives
  • Security including WAF, DDoS attack mitigation, achieve compliance (GDPR, PCI, HIPAA)
  • Container ingress (integrates via REST APIs with K8s ecosystems like GKE, PKS (TKGI), OpenShift, EKS, AKS, TKG)

Advanced Kubernetes Ingress Controller Avi Networks

  • Virtual Desktop Infrastructure (Citrix, VMware Horizon)

Consistent Application Services Platform (Features)

Avi/NSX ALB is an enterpise-grade solution. So, everything you would expect from a traditional ADC (e.g. F5), layer 4 to layer 7 services, SSL, DDoS, WAF etc. is built-in without the need for a special license edition. There is also no NSX license requirement even the product name would suggest it. It can be deployed as a standalone load balancer or as an integrated solution with other VMware products (e.g. VCF, vRA/vRO, Horizon, Tanzu etc.).

Avi Networks Features

Below is a list with the core features:

  • Enterprise-class load balancing – SSL termination, default gateway, GSLB, DNS, and other L4-L7 services
  • Multi-cloud load balancing – Intelligent traffic routing across multiple sites and across private or public clouds
  • Application performance monitoring – Monitor performance and record and replay network events like a Network DVR
  • Predictive autoscaling Application and load balancer scaling based on real-time traffic patterns
  • Self-service – For app developers with REST APIs to build services into applications
  • Cloud connectors – VMware Cloud on AWS, SDN/NFV controllers, OpenStack, AWS, GCP, Azure, Linux Server Cloud, OpenShift/Kubernetes
  • Distributed application security fabric – Granular app insights from distributed service proxies to secure web apps in real time
  • SSO / Client Authentication – SAML 2.0 authentication for back-end HTTP applications
  • Automation and programmability – REST API based solution for accelerated application delivery; extending automation from networking to developers
  • Application Analytics – Real-time telemetry from a distributed load balancing fabric that delivers millions of data points in real time

Load Balancing for VMware Horizon

NSX ALB can be configured for load balancing in VMware Horizon deployments, where you place SEs in front of Unified Access Gateways (UAG) or Connection Servers (CS) as required.

Avi Horizon High Level Architecture

For a multi-site architecture you can also configure GSLB if needed. With GSLB, access to resources is controlled with DNS queries and health checking.

Note: If you are using the Horizon Universal Broker, the cloud-based brokering service, there is no need for GSLB, because the Universal Broker can orchestrate connections from a higher level based on different policies.

Automation

With NSX Advanced Load Balancer there are two parts when we talk about automation. One part is about infrastructure automation, where the controller talks to the ecosystem like a vCenter, AWS or Azure to orchestrate the Service Engine. So, when you configure a new VIP, the controller would talk to vCenter to spin up a VM, put it in the right portgroup, connect the front and the back-end, download the policy and service engine, and starts receiveing traffic.

The second piece of automation focuses more on the operational automation which is through the REST API (the UI and CLI don’t offer all the configuration, 100% can be done via REST API). But, on top of that you can also run Ansible playbooks, Terraform templates, Go and Python SDKs, have integrations with Splunk or other tools like vRealize Automation. This is the built-in automation in the product.

Avi Networks Automation

VMworld 2020 Sessions

This year VMworld is going to be for free and virtual. Take this chance and register yourself and learn more about Avi aka NSX ALB:

  1. Making Your Private Cloud Network Run Like a Public Cloud – Part 2 [VCNC2918]
  2. Modern Apps and Containers: Networking and Security [VCNC2920]
  3. Prepare for the New Normal of Work from Anywhere [VCNC2919]

Expectations and Current Approaches

There is the general understanding and need for hybrid or multi-cloud architectures. Different people will tell different stories and give different advices. The result are different architectures and different approaches. Some people will tell you, that you can use a cloud serially, so moving from one cloud to another. Or, simultaneously, when using different services from different clouds.

My last article focused on hybrid cloud, the architecture with some services lying on the private infrastructure, while other services are hosted on a public cloud. A public cloud providers tells you, that you can buy all services from them and tries to give you a better discount than the competition (to avoid multiple clouds). Enterprises see the need for multiple (public) clouds to avoid a vendor lock-in instead of going all-in with just one of them.

VMware is about multi-cloud and workload mobility, with the vision, that their VCF stack is running everywhere in the future. Now, some people would now say that this is also a vendor lock-in. Depending on your strategy and technology choices and preferences (e.g. databases, AI/ML services, virtual desktops), you have to decide somewhen which (cloud) vendor, approach and operation model is the right one for you.

It may not true for every large environment, but if you go for multiple clouds, multiple technologies, management and security consoles, architecture and so on, you’ll spend a lot of time and money on engineering and keeping your environment “integrated” and functional.

VMware offers you choice. The choice to run your workloads today and tomorrow wherever you want.

If you have the same vision and strategy like VMware, then you are looking for solutions which run in or on top of every cloud. Because of that it’s very important to understand the different between multiple clouds and multi-cloud.

In this case, NSX ALB brings you multi-cloud load balancing and auto scaling features for any cloud and for multi-cloud enabled applications and services.

Don’t forget: Some people are also saying,  that multi-cloud is not needed and doesn’t exist in reality. Nobody is saying multi-cloud is a piece of cake, but VMware can definitely help you to abstract this complexity. And part of this abstraction can be handled with vRealize Automation for example, which can act as a cloud broker to deploy your application and services.

 

VMware Multi-Cloud and Hyperscale Computing

VMware Multi-Cloud and Hyperscale Computing

In my previous article Cross-Cloud Mobility with VMware HCX I already very briefly touched VMware’s hybrid and multi-cloud vision and strategy. I mentioned, that VMware is coming from the on-premises world if you compare them with AWS, Azure or Google, but have the same “consistent infrastructure with consistent operations” messaging. And that the difference would be, that VMware is not only hardware-agnostic, but even cloud-agnostic. To abstract the technology format and infrastructure in the public cloud, their idea is to run VMware Cloud Foundation (VCF) everywhere (e.g. Azure VMware Solution), on-premises on top of any hardware and in the cloud on any global infrastructure from any hyperscaler like AWS, Azure, Google, Oracle, IBM, Alibaba. Or you can run your workloads in a VMware cloud provider’s cloud based on VCF. That’s the VMware multi-cloud.

The goal of this article is not compare any features from different vendors and products, but to give you a better idea why multi-cloud is becoming a strategic priority for most enterprises and why VMware could be right partner for your journey to the cloud.

To get started, let’s get an understanding what the three big hyperscalers are doing is when it comes to a hybrid or multi-cloud.

Microsoft

To bring Azure services to your data center and to benefit from a hybrid cloud approach, you would probably go for Azure Stack to run virtualized applications on-premises. Their goal is to build consistent experiences in the cloud and at the edge, even for scenarios where you have no internet connection. This would be by VMware’s definition a typical hybrid cloud architecture.

Multi-cloud refers to the use of multiple public cloud service providers in a multi-cloud architecture, whereas hybrid cloud describes the use of public cloud in conjunction with private cloud. In a hybrid cloud environment, specific applications leverage both the private and public clouds to operate. In a multi-cloud environment, two or more public cloud vendors provide a variety of cloud-based services to a business.

With the announcement of Azure Arc at MS Ignite 2019, Microsoft introduced a new product, which “simplifies complex and distributed environments across on-premises, edge and multi-cloud“. Beside the fact that you can run Azure data services anywhere, it gives you the possibility to govern and secure your Windows servers, Linux servers and Kubernetes (K8s) clusters across different clouds. Arc can also deploy and manage K8s applications consistently (from source control).

Azure Arc InfographicYou could summarize it like this, that Microsoft is bringing Azure infrastructure and services to any infrastructure. It’s not necessary to understand the technical details of Azure Stack and Azure Arc. More important is the messaging and the strategy. It’s about managing and securing Windows/Linux servers, virtual machines and K8s clusters everywhere and this with their Azure Resource Manager (ARM). Arc ensures that the right configurations and policies are in place to fulfill governance requirements across clouds. Run your workloads where you need it and where it makes sense, even it isn’t Azure.

Google Anthos

Google open-sourced their own implementation of containers to the Linux kernel in about 2006 or 2007. It was called cgroups, which stands for control groups. Docker appeared in 2013 and provided some nice tooling for containers. Over the next years, Microservices were used more often to divide monoliths into different pieces and services. Because of the growing numbers of containers, Google saw the need to make this technology easy to manage and orchestrate for everyone. This was six years ago when they released Kubernetes.

By the way, two of the three Kubernetes founders, namely Joe Beda and Craig McLuckie, are working for VMware since their company Heptio has been acquired by VMware in November 2018.

Today, Kubernetes is the standard way to run containers at scale.

We know by now that the future is hybrid or even multi-cloud, and not public cloud only. Also Google realized that years ago. Besides that, a lot of enterprises made the experience that moving to the cloud and re-engineering the whole application at the same time mostly fail. This means, that moving applications from your on-premises data center, refactoring the application at the same time and run it in the public cloud, is not that easy.

Why isn’t it easy? Because you are re-engineering the whole application, have to take care of other application and network dependencies, think about security, governance and have to train your staff to cope with all the new management consoles and processes.

Google’s answer and approach here is to modernize applications on-premises and then move them to the cloud after the modernization happened. They say that you need a platform, that runs in the cloud and in your data center. A platform, that runs consistently across different environments – same technology, same tools and policies everywhere.

This platform is called Google Anthos. Anthos is 100% software-defined and (hardware) vendor-agnostic. To deliver their desired developer experience on-prem as well, they rely on VMware. This is GKE running on-prem on top of vSphere:

Anthos vSphere on-prem

Amazon Web Services

The last solution I would like to mention is AWS Outposts, which is a fully managed service that extends their AWS infrastructure, services and tools to any data center for a “truly consistent hybrid experience”. What are the AWS services running on Outposts?

  • Containers (EKS)
  • Compute (EC2)
  • Storage (EBS)
  • Databases (Amazon RDS)
  • Data Analytics (Amazon EMR)
  • Different tools and APIs

AWS Outposts are delivered as an industry-standard 42U rack. The Outpost rack is 80 inches (203.2cm) tall, 24 inches (60.96cm) wide, and 48 inches (121.92cm) deep. Inside we have hosts, switches, a network patch panel, a power shelf, and blank panels. It has redundant active components including network switches and hot spare hosts.

If you visit the Outposts website, you’ll find the following information:

Coming soon in 2020, a VMware variant of AWS Outposts will be available. VMware Cloud on AWS Outposts delivers a fully managed VMware Software-Defined Data Center (SDDC) running on AWS Outposts infrastructure on premises.

VMC on AWS Outposts is for customers, who want to use the same VMware software conventions and control plane as they have been using for years. It can be seen as an extension from the regular VMC on AWS offering which is now made available on-premises (on top of the AWS Outposts infrastructure) for a hybrid approach.

VMC on AWS Outposts

What do all these options have in common? It is always about consistent infrastructure with consistent operations. To have one platform in the cloud and on-premises in your data center or at the edge. Most of today’s hybrid cloud strategies rely on the facts, that migrations to the cloud are not easy, fail a lot and so it’s clear why we still have 90% of all workloads running on-premises. We are going to have many million containers more in the future, which need to be orchestrated with Kubernetes, but virtual machines are not just disappearing or being replaced tomorrow.

My conclusion here is, that every hyperscaler is seeing cloud-native in our (near) future and wants to provide their services in the cloud and on-prem. That customer can build their new applications with a service-oriented architecture or partially modernize existing monoliths (big legacy applications) on the same technology stack.

Consistent Infrastructure & Consistent Operations

All hyperscalers mention as well, that you have to take care of different management and security consoles, skills set and tools in general. Except Microsoft with Azure Arc, nobody else is having a “real” multi-cloud solution or platform. I want to highlight, that even Azure Arc is only here for some servers, Kubernetes clusters and takes care of governance.

Let’s assume you have a hybrid cloud setup in place. Your current project requirements tell you to develop new applications in the Google Cloud using GKE. That’s fine. Your current on-premises data centers run with VMware vSphere for virtualization. Tomorrow, you have to think about edge computing for specific use cases where AI and ML-based workloads are involved. Then you decide to go for Azure and create a hybrid architecture with Azure Stack and Arc. Now you are using two different public cloud providers, one with their specific hybrid cloud offering and also VMware vSphere on-premises.

What are you going to do now? How do you manage and secure all these different clouds and technologies? Or do you think about migrating all the application workloads from on-prem to GCP and Azure? Or do you start with Anthos now for other use cases and applications? Maybe you decide later to move away from VMware and evacuate the VMware-based private cloud to any hyperscaler? Is it even possible to do that? If yes, how long would this technology change and migration take?

Let’s assume for this exercise, that this would be a feasable option with an acceptable timeframe. How are you going to manage the different servers, applications, dependencies and secure everything at the same time? How can you manage and provision infrastructure in an easy and efficient way? What about cost control? What happens if you don’t see Azure as strategic anymore and want to move to AWS tomorrow? Then you figure out, that cloud is more expensive than you thought and experience yourself why only 10% of all workloads are running in the public cloud today.

Multi-Cloud Reality

I think people can pretty easy handle an infrastructure which runs VMware on-premises and have maximum one public cloud only – a hybrid cloud architecture. If we are talking about a greenfield scenario where you could start from scratch and choose AWS including AWS Outposts, because you think it’s best for you and matches all the requirements, go for it. You know what is right for you.

But I believe, and this is also what I see with larger customers, the current reality is hybrid and the future is multi-cloud.

VMware Multi-Cloud Strategy

And a multi-cloud environment is a totally different game to manage. What is the VMware multi-cloud strategy exactly and why is it different?

Consistent VMware Multi-Cloud

VMware’s approach is always to abstract complexity. This doesn’t mean that everything is getting less complex, but you will get the right platform and tooling to deal with this complexity.

A decade ago, abstracting meant providing a hypervisor (vSphere) for any hardware (being vendor-agnostic). After that we had software-defined storage (vSAN) followed software-defined networking (NSX). Beside these three major software pieces, we also have the vRealize suite, which is mainly known for products like vRealize Automation and vRealize Operations. The technology stack consisting of vSphere, vSAN, NSX, vRealize and some management components from the software-defined data center and is called VMware Cloud FoundationA technology stack that allows you to experience the ease of public cloud in your data center. Again, if wanted and required, you can run this stack on top of any hyperscaler like AWS, Azure, Google Cloud, Alibaba Cloud, Oracle Cloud or IBM.

VMware Cloud Foundation

It’s a platform which can deliver services as you would expect in the public cloud. The vRealize suite can help you to automatically provision virtual machines and containers including the right network and storage (any vSphere-based cloud or cloud-native on AWS, GCP, Azure or Alibaba). Build your own templates or blueprints (Infrastructure as Code) to deliver services IaaS, DBaaS, CaaS, DaaS, FaaS, PaaS, SaaS and DRaaS, which can be ordered and consumed by your users or your IT. Put a price tag behind any service or workload you deploy, and include your public cloud spending as well (e.g. with CloudHealth) in this calculation.

You want to deliver vGPU enabled virtual machines or containers? Also possible with vSphere. Modern AI/ML based applications need compute acceleration to handle large and complex computation. vSphere Bitfusion allows you to access GPUs in a virtualized environment over the network (ethernet). Bitfusion works across any cloud and environment and can be accessed from any workload from any network. This topic gets very interesting if we talk about edge computing for example.

vSphere Bitfusion

Modern applications obviously demand a modern infrastructure. An infrastructure with a hybrid or multi-cloud architecture. With that you are facing the challenge of maintaining control and visibility over a growing number of environments. In such a modern environment, how do you automate configuration and management? What about networking and security policies applied at a cluster level? How you handle identity and access management (IAM)? Any clue about backup and restore? And what would be your approach for cost management in a multi-cloud world?

Modern Applications Challenges

To improve the IT ops and developer experience, VMware announced the Tanzu portfolio including something they call the Tanzu Kubernetes Grid (TKG). The promise of TKG is to provide developers a consistent and on-demand access to infrastructure across clouds and is considered to be the enterprise-ready Kubernetes runtime.

Since vSphere 7, TKG has been embedded into the control plane vSphere 7 with Kubernetes as a service. Finally, as Kubernetes is natively integrated into the hypervisor, we have a converged platform for VMs and containers. IT ops now can see and manage Kubernetes objects (e.g. pods) from the vSphere client and developers use the Kubernetes APIs to access the SDDC infrastructure.

There are different ways to consume TKG beside “vSphere 7 with Kubernetes“. TKG is a consistent and upstream compatible Kubernetes runtime with preintegrated and validated components, that also runs in any public cloud or edge environments.

Tanzu Kubernetes Grid

If you have to run Kubernetes clusters natively on Azure, AWS, Google and on vSphere on-premises, how would you manage IAM, lifecycle, policies, visibility, compliance and security? How would you manage any new or existing clusters?

Tanzu Mission Control

Here, VMware’s solution would be Tanzu Mission Control (TMC). A centralized management platform (operated by VMware as SaaS) for all your clusters in any cloud. TMC allows you to provision TKG workload clusters to your environment of choice and manage the lifecycle of each cluster via TMC. To date, the supported deployments are in vSphere and AWS EC2 accounts. The deployment on Azure is coming very soon.

Existing Kubernetes clusters from any vendor such as EKS, AKS, GKE or OpenShift can be attached to TMC. As long as you are maintaining CNCF conformant clusters, you can attach them to TMC so that you can manage all of them centrally.

The Tanzu portfolio is much bigger and includes more than TKG and TMC, which only address the “where and how to run Kubernetes” and “how to deploy and manage Kubernetes”. Tanzu has other solutions like an application catalog, build service, application service (previously Pivotal Cloud Foundry) and observability (monitoring and metrics) for example.

VMware Tanzu Products

And this Tanzu products can be complemented with cloud-scale networking solutions like an application delivery controller (ADC) or software-defined WAN (SD-WAN). To deliver the “public cloud experience” to developers for any infrastructure, we need to provide agility. From an infrastructure perspective we’ll find VMware Cloud Foundation and from application or developer perspective we learned that Tanzu covers that.

For a distributed application architecture, you also need a software-defined ADC architecture that is fully distributed, auto scalable and provides real-time analytics and security for VMs or containers. VMware’s NSX Advanced Load Balancer (formerly known as Avi Networks) runs on AWS, GCP, Azure, OpenStack and VMware and has a rich feature set:

AVI Networks Features

Hypervisor versus Public Cloud

What I am trying to say here, is, that cloud-native at scale requires much more than containers only. While hypervisors are obviously not disappearing and getting replaced by containers from the public cloud very soon, they will co-exist and therefore it is very important to implement solutions which can be used everywhere. If you can ignore the cost factor for a moment, probably the best solution would be using the exact same technology stack and tools for all the clouds your workloads are running on.

You need to rely on a partner and solution portfolio that could address or solve anything (or almost anything) you are building in your IT landscape. As I already said, VCF and Tanzu are just a few pieces of the big puzzle. Important would be an end-to-end approach from any layer or perspective.

Therefore, I believe, VMware is very relevant and very well-positioned to support your journey to the multi-cloud.

The application you migrate or modernize need to be accessed by your users in a simple and secure way. This would lead us for example to the next topic, where we could start a discussion about the digital workspace or end-user computing (EUC).

Talking about EUC and the future-ready workplace would involve other IT initiatives like hybrid or multi-cloud, application modernization, data center and cloud networking, workspace security, network security and so on. A discussion which would touch all strategic pillars VMware defined and presented since VMworld 2019.

VMware 5 Strategic Pillars

If your goal is also to remove silos, provide a better user and admin experience, and this in a secure way over any cloud, then I would say that VMware’s unique platform approach is the best option you’ll find on the market.

And since VMware can and will co-exist with the hyperscalers, and even run on top of all them, I would consider to talk about the “big four” and not “big three” hyperscalers from now on.

Cross-Cloud Mobility with VMware HCX

Cross-Cloud Mobility with VMware HCX

Update 10th September 2020: vSphere 7.0 (VDS 7.0) and NSX-T 3.0.1+ are supported since the HCX R143 release which has been made available on September 8, 2020

https://docs.vmware.com/en/VMware-HCX/services/rn/VMware-HCX-Release-Notes.html 

Most people think that VMware HCX is a only migration tool that helps you moving workloads to a vSphere based cloud like VMware Cloud on AWS, Azure VMware Solution or Google Cloud VMware Engine. But it can do so much more for you than only application or workload migrations. HCX is also designed for workload rebalancing and business continuity across data centers or VMware clouds. Why I say “across VMware clouds” and not only “clouds”?

A few years ago everyone thought or said that customers will move all their workloads to the public cloud and the majority of them don’t need local data centers anymore. But we all know that this perception has changed and that the future cloud operation is model hybrid.

A hybrid cloud environment leverages both the private and public clouds to operate. A multi-cloud environment includes two or more public cloud vendors which provide cloud-based services to a business that may or may not have a private cloud. A hybrid cloud environment might also be a multi-cloud environment.

We all know that the past perception was an illusion and we didn’t have a clue where the hyperscalers like AWS, Azure or GCP would be in the next 5 or 7 years. And I believe that even the AWS and Microsoft didn’t expect what is going to happen since we observed interesting shifts in the last few years.

Amazon Web Services (AWS) has been launched 14 years ago (2006) to provide web services from the cloud. At AWS re:Invent 2018 the CEO Andy Jassy announced AWS Outposts because their customers have been asking for an AWS option on-premises. In the end, Outpost is just an extension of an AWS region into the own data center, where you can launch EC2 instances or Amazon EBS volumes locally. AWS already had some hybrid services available (like Storage Gateway) but here we talk about infrastructure and making your own data center part of the AWS Global Infrastructure.

Microsoft Azure was released in 2010 and the first technical preview of Azure Stack has been announced in 2016. So, Microsoft also realized that the future cloud model is a hybrid approach “that provides consistency across private, hosted and public clouds”.

Google Cloud Platform (GCP) offers cloud computing services since 2008. Eleven years later (in 2019) Google introduced Anthos that you can “run an app anywhere –  simply, flexibly and securely”.

All the hyperscalers changed their cloud model to provide customers a consistent infrastructure with consistent operations and management as we understand now.

VMware is coming from the other end of this hybrid world and has the same overall goal or vision to make a hybrid or multi-cloud reality. But with one very important difference. VMware helps you to abstract the complexity of a hybrid environment and gives you the choice to run your workloads in any cloud infrastructure with a cloud-agnostic approach.

As organizations try to migrate their workloads to the public, they face multiple challenges and barriers:

  • How can I migrate my workload to the public cloud?
  • How long does it take to migrate?
  • What about application downtime?
  • Which migration options do I have?
  • Which cloud is the best destination for which workloads?
  • Do I need to refactor or develop some applications?
  • Can I do a lift and shift migration and modernize the application later?
  • How can I consistently deploy workloads and services for my multi-cloud?
  • How can I operate and monitor (visibility and observability) all the different clouds?
  • What if tomorrow one the public cloud provider is not strategic anymore? How can I move my workloads away?
  • How can I control costs over all clouds?
  • How can I maintain security?
  • What about the current tools and 3rd party software I am using now?
  • What if I want to migrate VMs back from the public cloud?
  • What if I want to move away/back somewhen from a specific cloud provider?

In summary, the challenges with a hybrid cloud are about costs, complexity, tooling and skills. Each public cloud added to your current on-premises infrastructure is in fact a new silo. If you have the extra money and time and don’t need consistent infrastructures and consistent operations and management, you’ll accept the fact that you have a heterogeneous environment with different technology formats, skill/tool sets, operational inconsistencies and security controls.

If you are interested in a more consistent platform then you should build a more unified hybrid cloud. Unified means that you provide consistent operations with the existing skills and tools (e.g. vCenter, vRealize Automation, vRealize Operations) and the same policies and security configuration over all clouds – your data center, public cloud or at the edge.

To provide such a cloud agnostic platform you need to abstract the technology format and infrastructure in the public cloud. This is why VMware built the VMware Cloud Foundation (VCF) platform that delivers a set of software-defined services for compute, storage, networking, security and cloud management for any cloud.

VMC on AWS, Azure VMware Solution, Google Cloud VMware Engine and all the other hybrid cloud offerings (IBM, Oracle, Alibaba Cloud, VCPP) are based on VMware Cloud Foundation. This is the underlying technology stack you would need if your goal is to be independent and to achieve workload mobility between clouds. With this important basic understanding we can take a closer look at VMware HCX.

VMware HCX Use Cases

HCX provides an any-to-any vSphere workload mobility service without requiring retrofit as we use the same technology stack in any cloud. 

VMware HCX Use Cases

HCX enables you to schedule application migrations of hundreds or thousands of vSphere VMs within your data centers and across public clouds without requiring a reboot or a downtime.

If you would like to change the current platform or have to modernize your current data center, HCX allows you to migrate workloads from vSphere 5.x and non-vSphere (Hyper-V and KVM) environments.

VMware HCX Migration

Workload rebalancing means providing a mobility platform across cloud regions and cloud providers to move applications and workloads at any time for any reason.

Workload mobility is cool and may be the future but is not possible today as the public cloud’s egress costs would be way too high at the moment. Let’s say you pay $0.05 per GB when you move data away from the public cloud to any external destination, this would generate costs of $2.50 for a 50GB virtual machine.

Not that much, right? If you move away 500 VMs, the bill would list $1’250 for egress costs. Evacuating VMs from one public cloud to another one is not so expensive if it happens three or four times a year. But if the rebalancing should happen at a higher cadence, the egress costs would get too high. But we can assume that this fact will change in the future as the public cloud computing prices will come down in the future. 

HCX Components and Services

HCX is available with an Advanced and Enterprise license. The Advanced license services are standard with HCX and are also included in the HCX Enterprise license. The Enterprise license is needed when you migrate non-vSphere workloads into a vSphere environment. This capability is called OS Assisted Migration (OSAM).

HCX Services

The HCX Advanced features are included in a NSX Data Center Enterprise Plus license. With a managed service like VMware Cloud on AWS or Azure VMware Solution HCX Advanced is already be included.

HCX Connector Advanced License

If you want to move workloads from a vSphere environment to a vSphere enabled public cloud, you don’t need the complete VMware Cloud Foundation stack at the source site:

  • On-premises vSphere version 5.5 and above
  • Minimum of 100 Mbps bandwidth between source and destination
  • Virtual switch based on vDS (vSphere Distributed Switch), Cisco Nexus 1000v or vSphere Standard Switch
  • Minimum of virtual machine hardware version 9
  • VMs with hard disks not larger than 2TB

Depending on the HCX license and services you need, you have to deploy some or all of the HCX components. HCX comprises components and appliances at the source and destination sites.

HCX Manager Destination

The HCX Connector services and appliances are deployed at the destination site first before you are going to deploy the virtual appliances at the source site (HCX Interconnect appliance).

HCX Interconnect Appliance Download Link

After you deployed the appliances at the source site, you can create the site pairing.

HCX Site Pairing

As soon as you have installed HCX in both sites, you can manage and configure the services within the vSphere Client.

HCX in vSphere Client

After a successful site pairing, you can start to create the HCX Service Mesh.

The Multi-Site Service mesh is used to create a secure optimized transport fabric between any two sites managed by HCX. When HCX Migration, Disaster recovery, Network Extension, and WAN Optimization services are enabled, HCX deploys Virtual Appliances in the source site and corresponding “peer” virtual appliances on the destination site. The Multi-Site Service Mesh enables the configuration, deployment, and serviceability of these Interconnect virtual appliance pairs.

HCX Service Mesh

In the HCX site-to-site architecture, there is notion of an HCX source and HCX destination environment. Depending on the environment, there is a specific HCX installer:

HCX Connector (previously HCX Enterprise) or HCX Cloud. HCX Connector is always deployed as the source. HCX Cloud is typically deployed as the destination, but it can be used as the source in cloud-to-cloud deployments. In HCX-enabled public clouds, the cloud provider deploys HCX Cloud. The public cloud tenant deploys HCX Connector on-premises.
The source and destination sites are paired together for HCX operations. 

In both the source and destination environments, HCX is deployed to the management zone, next to each site’s vCenter Server, which provides a single plane (HCX Manager) for administering VMware HCX. This HCX Manager provides a framework for deploying HCX service virtual machines across both the source and destination sites. VMware HCX administrators are authenticated, and each task authorized through the existing vSphere SSO identity sources. VMware HCX mobility, extension, protection actions can be initiated from the HCX User Interface or from within the vCenter Server Navigator screen’s context menus.

In the NSX Data Center Enterprise Plus (HCX for Private to Private deployments), the tenant deploys both source and destination HCX Managers.

The HCX-IX service appliance provides replication and vMotion-based migration capabilities over the Internet and private lines to the destination site whereas providing strong encryption, traffic engineering, and virtual machine mobility.

The VMware HCX WAN Optimization service appliance improves performance characteristics of the private lines or Internet paths by applying WAN optimization techniques like the data de-duplication and line conditioning. It makes performance closer to a LAN environment. It accelerates on-boarding to the destination site using Internet/VPN- without waiting for Direct Connect/MPLS circuits.

The VMware HCX Network Extension service appliance provides a late Performance (4–6 Gbps) Layer 2 extension capability. The extension service permits keeping the same IP and MAC addresses during a Virtual Machine migration. Network Extension with Proximity Routing provides the optimal ingress and egress connectivity for virtual machines at the destination site.

 

Using VMware HCX OS Assisted Migration (OSAM), you can migrate guest (non-vSphere) virtual machines from on-premise data centers to the cloud. The OSAM service has several components: the HCX Sentinel software that is installed on each virtual machine to be migrated, a Sentinel Gateway (SGW) appliance for connecting and forwarding guest workloads in the source environment, and a Sentinel Data Receiver (SDR) in the destination environment.

The HCX Sentinel Data Receiver (SDR) appliance works with the HCX Sentinel Gateway appliance to receive, manage, and monitor data replication operations at the destination environment.

HCX Migration Types

VMs can be moved from one HCX-enabled data center using different migration technologies or types.

HCX Migration Types

HCX cold migration uses the VMware NFC (Network File Copy) protocol and is automatically selected when the source VM is powered off.

HCX vMotion

  • This option is designed for moving a single virtual machine at a time
  • There is no service interruption during the HCX vMotion migration
  • Encrypted vMotion between legacy source and SDDC target
  • Bi-directional (Cross-CPU family compatibility without cluster EVC)
  • In-flight Optimization (deduplication/compression)
  • Compatible from vSphere 5.5+ environments (VM HW v9)

HCX Bulk Migration

  • Bulk migration uses the host-based replication (HBR) to move a virtual machine between HCX data centers
  • This option is designed for moving virtual machines in parallel (migration in waves)
  • This migration type can set to complete on a predefined schedule
  • The virtual machine runs at the source site until the failover begins. The service interruption with the bulk migration is equivalent to a reboot
  • Encrypted Replication migration between legacy source and SDDC target
  • Bi-directional (Cross-CPU family compatibility)
  • WAN Optimized (deduplication/compression)
  • VMware Tools and VM Hardware can be upgraded to the latest at the target.

HCX Replication Assisted vMotion

VMware HCX Replication Assisted vMotion (RAV) combines advantages from VMware HCX Bulk Migration (parallel operations, resiliency, and scheduling) with VMware HCX vMotion (zero downtime virtual machine state migration).

HCX OS Assisted Migration

This migration method provides for the bulk migration of guest (non-vSphere) virtual machines using OS Assisted Migration to VMware vSphere on-premise or cloud-based data centers. Enabling this service requires additional HCX licensing.

 

  • Utilizes OS assisted replication to migrate (conceptually similar to vSphere replication)
  • Source VM remains online during replication
  • Quiesce the source VM for final sync before migration
  • Source VM is powered off and the migrated VM is powered on in target site, for low downtime switchover
  • VMware tools is installed on the migrated VM

Cross-Cloud Mobility

Most customers will probably start with one public cloud first, e.g. VMC on AWS, to evaluate the hybridity and mobility HCX delivers. Cross-cloud monility is maybe not a requirement today or tomorrow but gets more important when your company has a multi-cloud strategy which becomes reality very soon.

If you want to be able to move workloads seamlessly between clouds, extend networks and protect workloads the same way across any cloud, then you should consider a VMware platform and use HCX.

HCX Cross-Cloud Mobility

Let’s take network and security as an example. How would you configure and manage all the different network, security, firewall policies etc. in your different clouds with the different infrastructure and security management consoles?

If you abstract the hyperscaler’s global infrastructure and put VMware on top, you could in this case use NSX (software-defined networking) everywhere. And because all the different policies are tied to a virtual machine, it doesn’t matter anymore if you migrate a VM from host to host or from cloud to cloud.

This is what you would call consistent operations and management which is enabled by a consistent infrastructure (across any cloud).

And how would you migrate workloads in a very cost and time efficient way without a layer 2 stretch? You would have to take care of re-IPing workloads and this involves a lot of changes and dependencies. If you have hundreds of applications then the cloud migration would be a never ending project with costs you could not justify.

In the case you need to move workload back to your own on-premises data center, HCX also gives you this advantage.

You have the choice in which cloud you want to run your applications, at any time.

 

HCX and vSphere 7

At the time of writing HCX has no official support for vSphere 7.0 yet. I tested it in my home lab and ran into an error while creating the Service Mesh. At least one other colleague had the same issue with vSphere 7 using NSX-T 3.0 and VDS 7.0.

HCX vSphere 7 Error

I would like to thank Danny Stettler for reviewing and contributing. 🙂 Big kudos to you, Danny! 🙂

I hope the article has helped you to get an overview what HCX and a hybrid cloud model really mean. Drop a comment and share your view and experience when it comes to cloud strategies and migrations.