From Monolithic Data Centers to Modern Private Clouds

Behind every shift from old-school to new-school, there is a bigger story about people, power, and most of all, trust. And nowhere is that clearer than in the move from traditional monolithic data centers to what we now call a modern private cloud infrastructure.

A lot of people still think this evolution is just about better technology, faster hardware, or fancier dashboards. But it is not. If you zoom out, the core driver is not features or functions; it is trust in the executive vision and the willingness to break from the past.

Monolithic data centers stall innovation

But here is the problem: monoliths do not scale in a modern world (or cloud). They slow down innovation, force one-size-fits-all models, and lock organizations into inflexible architectures. And as organizations grew, the burden of managing these environments became more political than practical.

The tipping point was not when better tech appeared. It was when leadership stopped trusting that the monolithic data centers with the monolithic applications could deliver what the business actually needed. That is the key. The failure of monolithic infrastructure was not technical – it was cultural.

Hypervisors are not the platform you think

Let us make that clear: hypervisors are not platforms! They are just silos and one piece of a bigger puzzle.

Yes, they play a role in virtualization. Yes, they helped abstract hardware and brought some flexibility. But let us not overstate it: they do not define modern infrastructure or a private cloud. Hypervisors solve a problem from a decade ago. Modern private infrastructure is not about stacking tools, it is about breaking silos, including the ones created by legacy virtualization models.

Private Cloud – Modern Infrastructure

So, what is a modern private infrastructure? What is a private cloud? It is not just cloud-native behind your firewall. It is not just running Kubernetes on bare metal. It is a mindset.

You do not get to “modern” by chasing features or by replacing one virtualization solution with another vendor’s. You get there by believing in the principles of openness, automation, decentralization, and speed. And that trust has to start from the top. If your CIO or CTO is still building for audit trails and risk reduction as their north star, you will end up with another monolithic data center stack. Just with fancier logos.

But if leadership leans into trust – trust in people, in automation, in feedback loops – you get a system that evolves. Call it modern. Call it next-gen.

It was never about the technology

We moved from monolithic data centers not because the tech got better (though it did), but because people stopped trusting the old system to serve the new mission.

And as we move forward, we should remember: it is not hypervisors or containers or even clouds that shape the future. It is trust in execution, leadership, and direction. That is the real platform everything else stands on. If your architecture still assumes manual control, ticketing systems, and approvals every step of the way, you are not building a modern infrastructure. You are simply replicating bureaucracy in YAML. A modern infra is about building a cloud that does not need micro-management.

Platform Thinking versus Control

A lot of organizations say they want a platform, but what they really want is control. Big difference.

Platform thinking is rooted in enablement. It is about giving teams consistent experiences, reusable services, and the freedom to ship without opening a support ticket every time they need a VM or a namespace.

And platform thinking only works when there is trust as well:

  • Trust in dev teams to deploy responsibly
  • Trust in infrastructure to self-heal and scale
  • Trust in telemetry and observability to show the truth

Trust is a leadership decision. It starts when execs stop treating infrastructure as a cost center and start seeing it as a product. Something that should deliver value, be measured, and evolve.

It is easy to get distracted. A new storage engine, a new control plane, a new AI-driven whatever. Features are tempting because they are measurable. You can point at them in a dashboard or a roadmap.

But features don’t create trust. People do. The most advanced platform in the world is useless if teams do not trust it to be available, understandable, and usable. 

So instead of asking “what tech should we buy?”, the real question is:

“Do we trust ourselves enough to let go of the old way?”

Because that is what building a modern private cloud is really about.

Trust at Scale

In Switzerland, we like things to work. Predictably. Reliably. On time. With the current geopolitical situation in the world, and especially when it comes to public institutions, that expectation is non-negotiable.

The systems behind those services are under more pressure than ever. Demands are rising and talent is shifting. Legacy infrastructure is getting more fragile and expensive. And at the same time, there is this quiet but urgent question being asked in every boardroom and IT strategy meeting:

Can we keep up without giving up control?

Public sector organizations (not only in Switzerland) face a unique set of constraints:

  • Critical infrastructure cannot go down, ever
  • Compliance and data protection are not just guidelines, they are legal obligations
  • Internal IT often has to serve a wide range of users, platforms, and expectations

So, it is no surprise that many of these organizations default to monolithic, traditional data centers. The logic is understandable: “If we can touch it, we can control it.”

But here is the reality: control does not scale. And legacy does not adapt. Staying “safe” with old infrastructure might feel responsible, but it actually increases long-term risk, cost, and technical debt. There is a temptation to approach modernization as a procurement problem: pick a new vendor, install a new platform, run a few migrations, and check the box. Done.

But transformation doesn’t work that way. You can’t buy your way out of a culture that does not trust change.

I understand that this can feel uncomfortable. Many institutions are structured to avoid mistakes. But modern IT success requires a shift from control to resilience, and it is not about perfection: a system is only perfect until you need to adapt it again.

How to start?

By now, it is clear: modern private cloud infrastructure is not about chasing trends or blindly “moving to the cloud.” It’s about designing systems that reflect what your organization values: reliability, control, and trust, while giving teams the tools to evolve. But that still leaves the hardest question of all:

Where do we start?

First, be transparent. Transparency is the first ingredient of trust: you can’t fix what you won’t name.

Second, modernizing safely does not mean boiling the ocean. It means starting with a thin slice of the future.

The goal is to identify a use case where you can:

  • Show real impact in under six months

  • Reduce friction for both IT and internal users

  • Create confidence that change is possible without risk

In short, it is about finding use cases with high impact but low risk.

Third, replace control with guardrails. This is where a lot of transformation efforts stall. Organizations try to modernize the tech but keep the old permission structures. The result? A shinier version of the same bottlenecks. Instead, think less about who can approve what, and more about how the system enforces good behavior by default. For example:

  • Implement policy-as-code: rules embedded into the platform, not buried in documents

  • Automate security scans, RBAC, and drift detection

  • Give teams safe, constrained freedom instead of needing to ask for access
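To make the policy-as-code idea a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical self-service request (team, environment, VM size, public IP, tags) and three made-up rules; the names `DeploymentRequest`, `evaluate`, and the size catalog are purely illustrative and not any product’s API. Real platforms would typically rely on a dedicated policy engine, but the principle is the same: the rules live in code, and a request that passes all of them needs no human approval.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DeploymentRequest:
    """Hypothetical self-service request for a VM or a namespace."""
    team: str
    environment: str                      # e.g. "dev" or "prod"
    vm_size: str
    public_ip: bool = False
    tags: dict[str, str] = field(default_factory=dict)


# A rule returns an error message, or None if the request complies.
Rule = Callable[[DeploymentRequest], Optional[str]]

ALLOWED_SIZES = {"small", "medium", "large"}  # hypothetical service catalog


def require_owner_tag(req: DeploymentRequest) -> Optional[str]:
    return None if "owner" in req.tags else "missing 'owner' tag"


def allowed_size(req: DeploymentRequest) -> Optional[str]:
    if req.vm_size in ALLOWED_SIZES:
        return None
    return f"size '{req.vm_size}' not in catalog"


def no_public_ip_in_prod(req: DeploymentRequest) -> Optional[str]:
    if req.environment == "prod" and req.public_ip:
        return "public IPs are not allowed in prod"
    return None


POLICIES: list[Rule] = [require_owner_tag, allowed_size, no_public_ip_in_prod]


def evaluate(req: DeploymentRequest, policies: list[Rule] = POLICIES) -> list[str]:
    """Run every rule; an empty result means the request is approved automatically."""
    results = (rule(req) for rule in policies)
    return [msg for msg in results if msg is not None]


if __name__ == "__main__":
    ok = DeploymentRequest("payments", "dev", "small", tags={"owner": "alice"})
    bad = DeploymentRequest("payments", "prod", "xxl", public_ip=True)
    print(evaluate(ok))   # []
    print(evaluate(bad))  # one violation per rule, three in total
```

The design choice is that every rule is a plain function, so adding a guardrail means adding one function to `POLICIES` instead of adding an approver to a ticket workflow.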

Guardrails enable trust without giving up safety. That’s the core of a modern infrastructure (private or public cloud).
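Drift detection, one of the guardrails listed above, boils down to comparing the desired state (what is declared in code) with the observed state (what is actually running). A minimal sketch, assuming both states are available as flat dictionaries; the setting names in the example are hypothetical:

```python
MISSING = "<missing>"  # marker for a setting that exists on only one side


def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {setting: (desired, actual)} for every setting that differs."""
    drift = {}
    # Check the union of keys so that both removed and unexpected settings show up.
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    desired = {"tls_min_version": "1.2", "public_access": False, "replicas": 3}
    actual = {"tls_min_version": "1.0", "public_access": False, "replicas": 3, "debug_port": 8080}
    print(detect_drift(desired, actual))
    # {'debug_port': ('<missing>', 8080), 'tls_min_version': ('1.2', '1.0')}
```

Run on a schedule, an empty result is a small but repeatable signal that the platform still matches what was agreed on.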

And lastly, make trust measurable. Not just with uptime numbers or dashboards but with real signals:

  • Are teams delivering faster?

  • Are incidents down?

  • etc.

Make this measurable, visible, and repeatable. Success builds trust. Trust creates momentum.
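One way to keep these signals repeatable is to compute them the same way every period. The sketch below assumes hypothetical inputs: a list of lead times in hours (for example, from commit to production) and an incident count per period. The numbers are made up for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Period:
    """Hypothetical measurements for one reporting period (e.g. a quarter)."""
    lead_times_hours: list[float]  # time from change to production, per change
    incidents: int


def relative_change(before: float, after: float) -> float:
    """Negative values mean the metric went down."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


def trust_signals(before: Period, after: Period) -> dict:
    lead_time_change = relative_change(median(before.lead_times_hours),
                                       median(after.lead_times_hours))
    incident_change = relative_change(before.incidents, after.incidents)
    return {
        "delivering_faster": lead_time_change < 0,   # "Are teams delivering faster?"
        "incidents_down": incident_change < 0,       # "Are incidents down?"
        "lead_time_change": lead_time_change,
        "incident_change": incident_change,
    }


if __name__ == "__main__":
    q1 = Period(lead_times_hours=[48, 72, 96], incidents=10)
    q2 = Period(lead_times_hours=[24, 36, 48], incidents=6)
    print(trust_signals(q1, q2))  # median lead time -50%, incidents -40%
```

The median is used instead of the average so that a single stuck change does not dominate the picture.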

Final Thoughts

IT organizations do not need moonshots. They need measured, meaningful modernization. The kind that builds belief internally, earns trust externally, and makes infrastructure feel like an asset again.

The technology matters, but how you introduce it matters even more. 

Private Cloud Autarky – You Are Safe Until The World Moves On

I believe it was in 2023 that the term “autarky” came up in conversations with several customers who maintained their own data centers and private clouds. Interestingly, the word popped up again recently at work. I had only known it from photovoltaic systems, and it kept my mind busy for several weeks.

What is autarky?

An analogy from the photovoltaic (solar power) world offers a clear parallel for understanding autarky in IT and its implications for private clouds. Just as autarky in IT means a private cloud that is fully self-sufficient, autarky in photovoltaics refers to an “off-grid” solar setup that powers a home or facility without relying on the external electrical grid or outside suppliers.

Imagine a homeowner aiming for total energy independence – an autarkic photovoltaic system. Here is what it looks like:

  • Solar Panels: The homeowner installs panels to capture sunlight and generate electricity.
  • Battery: Excess power is stored in batteries (e.g., lithium-ion) for use at night or on cloudy days.
  • Inverter: A device converts solar DC power to usable AC power for appliances.
  • Self-Maintenance: The homeowner repairs panels, replaces batteries, and manages the system without calling a utility company or buying parts. 

This setup cuts ties with the power grid – no monthly bills, no reliance on power plants. It is a self-contained energy ecosystem, much like an autarkic private cloud aims to be a self-contained digital ecosystem.

Question: Which partner (installation company) has enough spare parts and how many homeowners can repair the whole system by themselves?

Let’s align this with autarky in IT:

  • Solar Panels = Servers and Hardware: Just as panels generate power, servers (compute, storage, networking) generate the cloud’s processing capability. Theoretically, an autarkic private cloud requires the organization to build its own servers, similar to crafting custom solar panels instead of buying from any vendor.
  • Battery = Spares and Redundancy: Batteries store energy for later; spare hardware (e.g., extra servers, drives, networking equipment) keeps the cloud running when parts fail. 
  • Inverter = Software Stack: The inverter transforms raw power into usable energy, like how a software stack (OS, hypervisor) turns hardware into a functional cloud.
  • Self-Maintenance = Internal Operations: Fixing a solar system solo parallels maintaining a cloud without vendor support – both need in-house expertise to troubleshoot and repair everything.

Let me repeat it: both need in-house expertise to troubleshoot and repair everything. Everything.

The goal is self-sufficiency and independence. So, what are companies doing?

An autarkic private cloud might stockpile Dell servers or Nvidia GPUs upfront, but that first purchase ties you to external vendors. True autarky would mean mining silicon and forging chips yourself – impractical, just like growing your own silicon crystals for panels.

The problem

In practice, autarky for private clouds sounds like an extreme goal. It promises maximum control, which makes it attractive for scenarios like military secrecy, regulatory isolation, or distrust of global supply chains, but it clashes with the realities of modern IT:

  • Once the last spare dies, you are done. No new tech without breaking autarky.
  • Autarky trades resilience for stagnation. Your cloud stays alive but grows irrelevant.
  • Autarky’s price tag limits it to tiny, niche clouds – not hyperscale rivals.
  • Future workloads are a guessing game. Stockpile too few servers, and you can’t expand. Too many, and you have wasted millions. A 2027 AI boom or quantum shift could make your equipment useless.
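The spare-parts problem is simple arithmetic. A rough sketch, assuming a constant annual failure rate, a fleet that does not grow, and failed parts that are replaced from the stockpile and never repaired (all hypothetical simplifications):

```python
import math


def years_until_spares_run_out(fleet: int, annual_failure_rate: float, spares: int) -> float:
    """Expected years until the stockpile is empty (no repairs, no fleet growth)."""
    failures_per_year = fleet * annual_failure_rate
    return spares / failures_per_year


def spares_needed(fleet: int, annual_failure_rate: float, years: int) -> int:
    """Spares required to cover the expected failures over a planning horizon."""
    return math.ceil(fleet * annual_failure_rate * years)


if __name__ == "__main__":
    # Hypothetical numbers: 200 servers, 5% annual failure rate, 30 spares.
    print(years_until_spares_run_out(200, 0.05, 30))  # 3.0
    print(spares_needed(200, 0.05, 5))                # 50
```

With these assumed numbers, the stockpile lasts about three years, and covering five years would take 50 spares. These are expected values only: failures are random, no growth or new workload type is included, and every additional year of autarky makes the stockpile bigger and the hardware older.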

But where does this idea of self-sufficiency or sovereign operations come from nowadays? Geopolitical resilience.

The promise is that sanctions or trade wars will not starve your cloud: a private (hyperscale) cloud that answers to no one, free from external risks or influence. That is the whole idea.

What is the probability of such sanctions? Who knows… but this is a number that has to be defined for each case depending on the location/country, internal and external customers, and requirements.

If it happens, is it foreseeable, and what does it force you to do? Does it trigger a cloud-exit scenario?

What I do know is that if there are sanctions, any hyperscaler in your country has the same problems, no matter whether it is a public or a dedicated region. That is the blast radius. It is not only about you and your infrastructure anymore.

What about private disconnected hyperscale clouds?

When hosting workloads in the public clouds, organizations care more about data residency, regulations, and the US CLOUD Act than about autarky.

Hyperscale clouds like Microsoft Azure and Oracle Cloud Infrastructure (OCI) are built to deliver massive scale, flexibility, and performance, but they rely on complex ecosystems that make full autarky impossible. Oracle offers solutions like OCI Dedicated Region and Oracle Alloy to address sovereignty needs, giving customers more control over their data and operations. However, even these solutions fall short of true autarky and absolute sovereign operations due to practical, technical, and economic realities.

A short explanation from Microsoft gives us a hint why that is the case:

Additionally, some operational sovereignty requirements, like Autarky (for example, being able to run independently of external networks and systems) are infeasible in hyperscale cloud-computing platforms like Azure, which rely on regular platform updates to keep systems in an optimal state.

So, what are customers asking for when they are interested in hosting their own dedicated cloud region in their data centers? Disconnected hyperscale clouds.

But hosting an OCI Dedicated Region in your data center does not change the underlying architecture of Oracle Cloud Infrastructure (OCI). Nor does it change the upgrade or patching process, or the whole operating model.

Hyperscale clouds do not exist in a vacuum. They lean on a web of external and internal dependencies to work:

  • Hardware Suppliers. For example, most public clouds use Nvidia’s GPUs for AI workloads. Without these vendors, hyperscalers could not keep up with the demand.
  • Global Internet Infrastructure. Hyperscalers need massive bandwidth to connect users worldwide. They rely on telecom giants and undersea cables for internet backbone, plus partnerships with content delivery networks (CDNs) like Akamai to speed things up.
  • Software Ecosystems. Open-source tools like Linux and Kubernetes are part of the backbone of hyperscale operations.
  • Operations. Think about telemetry data and external health monitoring.

Innovation depends on ecosystems

The tech world moves fast. Open-source software and industry standards let hyperscalers innovate without reinventing the wheel. OCI’s adoption of Linux or Azure’s use of Kubernetes shows they thrive by tapping into shared knowledge, not isolating themselves. Going it alone would skyrocket costs. Designing custom chips, giving away or sharing operational control or skipping partnerships would drain billions – money better spent on new features, services or lower prices.

Hyperscale clouds are global by nature, and this includes OCI Dedicated Region and Oracle Alloy. In return, you get:

  • Innovation
  • Scalability
  • Cybersecurity
  • Agility
  • Reliability
  • Integration and Partnerships

Again, by nature and design, hyperscale clouds – even those hosted in your data center as private clouds (OCI Dedicated Region and Alloy) – are still tied to a hyperscaler’s software repositories, third-party hardware, operations personnel, and global infrastructure.

Sovereignty is real, autarky is a dream

Autarky sounds appealing: a hyperscale cloud that answers to no one, free from external risks or influence. Imagine OCI Dedicated Region or Oracle Alloy as self-contained kingdoms, untouchable by global chaos.

Autarky sacrifices expertise for control, and the result would be a weaker, slower and probably less secure cloud. Self-sufficiency is not cheap. Hyperscalers spend billions of dollars yearly on infrastructure, leaning on economies of scale and vendor deals. Tech moves at lightning speed. New GPUs drop yearly, software patches roll out daily (think about 1’000 updates/patches a month). Autarky means falling behind. It would turn your hyperscale cloud into a relic.

Please note, there are other solutions like air-gapped isolated cloud regions, but those are for a specific industry and set of customers.

From Cloud-First to Cloud-Smart to Repatriation

VMware Explore 2024 happened this week in Las Vegas. I think many people were curious about what Hock Tan, CEO of Broadcom, had to say during the general session. He delivered interesting statements and let everyone in the audience know that “the future of enterprise is private – private cloud, private AI, fueled by your own private data“. On social media, the following slide about “repatriation” made quite some noise:

VMware Explore 2024 Keynote Repatriation

The information on this slide came from the Barclays CIO Survey of April 2024, and it says that 8 out of 10 CIOs today are planning to move workloads from the public cloud back to their on-premises data centers. It is interesting, and in some cases even funny, that other vendors in the hardware and virtualization business are chasing this ambulance now. “Cloud migrations are dead, let us do reverse cloud migrations now.” “Hybrid cloud is dead, let us do hybrid multi-clouds now and provide workload mobility.” My social media walls are full of such postings now. It seems Hock Tan presented the Holy Grail to the world.

Where does this change of mind come from? Why did only 43% plan a reverse cloud migration during COVID-19, and now “suddenly” more than 80%?

I could tell you the story now about cloud-first not being cool anymore, that organizations started to follow a smarter cloud approach and then concluded that cloud migrations are still not meeting their expectations (e.g., costs and complexity), and that it is time now to bring workloads back on-premises. It is not that simple.

I looked at the Barclays CIO Survey and the chart (figure 20 in the survey) that served as the source for Hock Tan’s slide:

Barclays CIO Survey April 2024 Cloud Repatriation

We must be very careful with our interpretation of the results. Just because someone is “planning” a reverse cloud migration, does that mean they are executing it? And if they execute such an exercise, is this going to be correctly reflected in a future survey?

And which workloads and services are being brought back to an enterprise’s data center? Are we talking about complete applications? Or is it more about load balancers, security appliances, databases and storage, and specific virtual machines? And if we understand the workloads, what are the real reasons to bring them back? Figure 22 of the survey shows “Workloads that Respondents Intend to Move Back to Private Cloud / On-Premise from Public Cloud”:

Barclays CIO Survey April 2024 Workload to migrate

Okay, we have a little more context now. If some workloads are potentially migrated back to private clouds, what does that mean for public cloud vs. private cloud spend? Question #11 of the survey, “What percentage of your workloads and what percentage of your total IT spend are going towards the public cloud, and how have those evolved over time?”, focuses on this matter.

Barclays CIO Survey April 2024 Percentage of Workloads and Spend

My interpretation? Just because one slide or illustration talks about repatriation does not mean that the entire world is just doing reverse migrations now. Cloud migrations and reverse cloud migrations can happen at the same time. You could bring one application or some databases back on-premises but decide to move all your virtual desktops to the public cloud in parallel. We could still bring workloads back to our data center and increase public cloud spend.

Sounds like cloud-smart again, doesn’t it? Maybe I am an organization that realized that the applications A, B, C, and D shouldn’t run in Azure, AWS, Google, and Oracle anymore, but the applications W, X, Y, and Z are better suited for these hyperscalers.

What else?

I am writing about my views and my opinions here. There is more to share. During the pandemic, everything had to happen very quickly, and everyone suddenly had money to speed up migrations and application modernization projects. After that difficult and exhausting phase, I think it was only natural that everything slowed down a bit.

Some of the IT teams are probably still documenting all their changes and new deployments on an internal wiki, and their bosses started to hire FinOps specialists to analyze their cloud spend. It is no surprise to me that some of the financial goals haven’t been met and have resulted in a reverse cloud migration a few years later.

But that is not all. Try to think about the past years. What else happened?

Yes, we almost forgot about Artificial Intelligence (AI) and Sovereign Clouds.

Before 2020, not many of us were thinking about sovereign clouds, data privacy, and AI.

Most enterprises are still hosting their data on-premises behind their own firewall. And some of this data is used to train or fine-tune models. We see (internal) chatbots popping up using Retrieval-Augmented Generation (RAG), which delivers answers based on actual data and proprietary information.

Okay. What else? 

Yep, there is more. There are new technologies and offerings available that were not here before. We just covered AI and ML (machine learning) workloads that became a potential cost or compliance concern.

The concept of sovereign clouds has gained traction due to increasing concerns about data sovereignty and compliance with local regulations.

The adoption of hybrid and hybrid multi-cloud strategies has been a significant trend from 2020 to 2024. Think about VMware’s Cloud Foundation approach with Azure, Google, Oracle etc., AWS Outposts, Azure Stack, Oracle’s DRCC, or Nutanix’s.

Enterprises started to upskill and train their people to deliver their own Kubernetes platforms.

Edge computing has emerged as a crucial technology, particularly for industries like manufacturing, telecommunications, and healthcare, where real-time data processing is critical.

Conclusion

Reverse cloud migrations are happening for many different reasons like cost management, performance optimization, data security and compliance, automation and operations, or because of lock-in concerns.

Yes, (cloud) repatriation became prominent, but I think this is just a reflection of the maturing cloud market – and not an ambulance.

And no, it is not a better moment to position your hybrid multi-cloud solutions, unless you understand the services and workloads that need to be migrated from one cloud to another. Just because some CIOs plan to bring some workloads back on-premises, does that mean they will do it? What about the sunk cost fallacy?

Perhaps IT leaders are going to be more careful in the future and are trying to find other ways for potential cost savings and strategic benefits to achieve their business outcomes – and keep their workloads in the cloud versus repatriating them.

Businesses are adopting a more nuanced workload-centric strategy.

What’s your opinion?

Distributed Hybrid Infrastructure Offerings Are The New Multi-Cloud

Since VMware became part of Broadcom, there has been less focus and messaging on multi-cloud or supercloud architectures. Broadcom has drastically changed the available offerings, and VMware Cloud Foundation is becoming the new vSphere. Additionally, we have seen big changes regarding the partnerships with hyperscalers (the Azures and AWSes of this world) and the VMware Cloud partners and providers. So, what happened to multi-cloud, and how come nobody (at Broadcom) talks about it anymore?

What is going on?

I do not know if it’s only me, but I do not see the term “multi-cloud” that often anymore. Do you? My LinkedIn feed is full of news about artificial intelligence (AI) and how Nvidia employees got rich. So, I have to admit that I lost track of hybrid clouds, multi-clouds, or hybrid multi-cloud architectures. 

Cloud-Inspired and Cloud-Native Private Clouds

It seems to me that the initial idea of multi-cloud has changed in the meantime and that private clouds are becoming platforms with features. Let me explain.

Organizations have built monolithic private clouds in their data centers for a long time. In software engineering, the word “monolithic” describes an application that consists of multiple components, which form something larger. To build data centers, we followed the same approach by using different components like compute, storage, and networking. And over time, IT teams started to think about automation and security, and the integration of different solutions from different vendors.

The VMware messaging was always pointing in the right direction: They want to provide a cloud operating system for any hardware and any cloud (by using VMware Cloud Foundation). On top of that, build abstraction layers and leverage a unified control plane (aka consistent automation and operations).

And I have been telling all my customers since 2020 that they need to think like a cloud service provider, get rid of silos, implement new processes, and define a new operating model. That is VMware by Broadcom’s messaging today, and this is where they and other vendors are headed: a platform with features that provide cloud services.

In other words, and this is my opinion, VMware Cloud Foundation is today a platform with different components like vSphere, vSAN, NSX, Aria, and so on. Tomorrow, it is still called VMware Cloud Foundation, a platform that includes compute, storage, networking, automation, operations, and other features. No more separate product names, just capabilities and services like IaaS, CaaS, DRaaS, or DBaaS. You just choose the specs of the underlying hardware and networking, deploy your private clouds, and then start to build and consume your services.

Replace the name “VMware Cloud Foundation” in the last paragraph with AWS Outposts or Azure Stack. Do you see it now? Distributed unmanaged and managed hybrid cloud offerings with a (service) consumption interface on top.

That is the shift from monolithic data centers to cloud-native private clouds.

From Intercloud to Multi-Cloud

This is not the first time I have written about interclouds, a concept that not many of us know. In 2012, there was this idea that different clouds and vendors need to be interoperable and agree on certain standards and protocols. Think about interconnected private and public clouds, which allow you to provide VM mobility or application portability. Can you see the picture in front of you? What is the difference today in 2024?

In 2023, I truly believed that VMware had figured it out when they announced VMware Cloud on Equinix Metal (VMC-E). To me, VMC-E was different and special because of Equinix, which is capable of interconnecting different clouds and, at the same time, could provide a bare-metal-as-a-service (BMaaS) offering.

Workload Mobility and Application Portability

Almost two years ago, I started to write a book about this topic because I wanted to figure out whether workload mobility and application portability are things that enterprises are really looking for. I interviewed many CIOs, CTOs, chief architects, and engineers around the globe, and it became VERY clear: it seems nobody was changing anything to make app portability a design requirement.

Almost all of the people I spoke to told me that a lot of things would have to happen to trigger a cloud exit, and therefore they see mobility as a nice-to-have capability that helps them move virtual machines or applications faster from one cloud to another.

VMware Workload Mobility

And I have also been told that a lift & shift approach is not providing any value to almost all of them.

But when I talked to developers and operations teams, the answers changed. Most of them did not know that a vendor could provide mobility or portability. Anyway, what has changed now?

Interconnected Multi-Clouds and Distributed Hybrid Clouds

I mentioned it before: some vendors have realized that they need to deliver a unified and integrated programmable platform with a control plane. Ideally, this control plane can be used on-premises, as a SaaS solution, or both. And according to Gartner, these are the leaders in this area (Magic Quadrant for Distributed Hybrid Infrastructure):

Gartner Magic Quadrant for Distributed Hybrid Infrastructure

In my opinion, VMware and Nutanix are providing a hybrid multi-cloud approach.

AWS and Microsoft are providing hybrid cloud solutions. In Microsoft’s case, we see Azure Stack HCI, Azure Kubernetes Service (AKS incl. Hybrid AKS) and Azure Arc extending Microsoft’s Azure services to on-premises data centers and edge locations.

The only vendor that currently offers true multi-cloud capabilities is Oracle. Oracle has Dedicated Region Cloud@Customer (DRCC) and Roving Edge, but also partnerships with Microsoft and Google that allow customers to host Oracle databases in Azure and Google Cloud data centers. Both partnerships come with a cross-cloud interconnection.

That is one of the big differences and changes for me at the moment. Multi-cloud has become less about mobility or portability, a single global control plane, or the same Kubernetes distribution in all the clouds, but more about bringing different services from different cloud providers closer together.

This is the image I created for the VMC-E blog. Replace the words “AWS” and “Equinix” with “Oracle” and suddenly you have something that was not there before, an interconnected multi-cloud.

What’s Next?

Based on the conversations with my customers, it does not feel like public cloud migrations are happening faster than in 2020 or 2022, and we still see between 70 and 80% of the workloads hosted on-premises. While we see customers who are interested in a cloud-first approach, we see many following a hybrid multi-cloud and/or multi-cloud approach. It is still about putting the right applications in the right cloud based on the right decisions. This has not changed.

But the narrative of such conversations has changed. We will see more conversations about data residency, privacy, security, gravity, proximity, and regulatory requirements. Then there are sovereign clouds.

Lastly, enterprises are going to deploy new platforms for AI-based workloads. But that could still take a while.

Final Thoughts

As enterprises continue to navigate the above-mentioned complexities, the need for flexible, scalable, and secure infrastructure solutions will only grow. There are a few compelling solutions that bridge the gap between traditional on-premises systems and modern cloud environments.

And since most enterprises are still hosting their workloads on-premises, they have to decide if they want to stretch the private cloud to the public cloud, or the other way around. Both options can co-exist, but would make it too big and too complex. What’s your conclusion?

AZ-104 Study Guide – Azure Storage

If you are looking for the full AZ-104 study guide: https://www.cloud13.ch/2023/10/31/az-104-study-guide-microsoft-azure-administrator/ 

It is clear to me that networking is probably the most complex topic in Azure. The concepts are very different from the on-premises world, there are many options, and there are a lot of topics to understand. Let us move on to Azure Storage as the next topic. As always, I will follow John Savill’s guidance and look up the documentation online.

Storage Accounts

An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, and tables. The storage account provides a unique namespace for your Azure Storage data that’s accessible from anywhere in the world over HTTP or HTTPS. Data in your storage account is durable and highly available, secure, and massively scalable.

When naming your storage account, keep these rules in mind:

  • Storage account names must be between 3 and 24 characters in length and may contain numbers and lowercase letters only.
  • Your storage account name must be unique within Azure. No two storage accounts can have the same name.
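The two naming rules are easy to check before you even open the portal. Here is a minimal sketch in Python; the helper name is my own and hypothetical. Only the format rule can be checked locally: global uniqueness can only be confirmed by Azure itself, for example with `az storage account check-name`.

```python
import re

# Format rule from the text above: 3-24 characters, lowercase letters and digits only.
_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")

def is_valid_storage_account_name(name: str) -> bool:
    """Check the documented format rule for a storage account name.

    Uniqueness within Azure is NOT checked here; that needs a call to Azure.
    """
    return _NAME_PATTERN.fullmatch(name) is not None

for candidate in ["cloud13data", "Cloud13Data", "ab", "my-storage", "a" * 24, "a" * 25]:
    print(f"{candidate!r}: {is_valid_storage_account_name(candidate)}")
```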

Azure Storage Redundancy

Data in an Azure Storage account is always replicated three times in the primary region. Azure Storage offers two options for how your data is replicated in the primary region:

  • Locally redundant storage (LRS) copies your data synchronously three times within a single physical location in the primary region. LRS is the least expensive replication option, but isn’t recommended for applications requiring high availability or durability.

    Diagram showing how data is replicated in a single data center with LRS

    • Zone-redundant storage (ZRS) copies your data synchronously across three Azure availability zones in the primary region. For applications requiring high availability, Microsoft recommends using ZRS in the primary region, and also replicating to a secondary region.

    Diagram showing how data is replicated in the primary region with ZRS

    Redundancy in a secondary region

    For applications requiring high durability, you can choose to additionally copy the data in your storage account to a secondary region that is hundreds of miles away from the primary region. If your storage account is copied to a secondary region, then your data is durable even in the case of a complete regional outage or a disaster in which the primary region isn’t recoverable.

    Azure Storage offers two options for copying your data to a secondary region:

    • Geo-redundant storage (GRS) copies your data synchronously three times within a single physical location in the primary region using LRS. It then copies your data asynchronously to a single physical location in the secondary region. Within the secondary region, your data is copied synchronously three times using LRS.

    Diagram showing how data is replicated with GRS or RA-GRS

    • Geo-zone-redundant storage (GZRS) copies your data synchronously across three Azure availability zones in the primary region using ZRS. It then copies your data asynchronously to a single physical location in the secondary region. Within the secondary region, your data is copied synchronously three times using LRS.

    Diagram showing how data is replicated with GZRS or RA-GZRS

    Geo-redundant storage (with GRS or GZRS) replicates your data to another physical location in the secondary region to protect against regional outages. With an account configured for GRS or GZRS, data in the secondary region is not directly accessible to users or applications, unless a failover occurs. The failover process updates the DNS entry provided by Azure Storage so that the secondary endpoint becomes the new primary endpoint for your storage account. During the failover process, your data is inaccessible. After the failover is complete, you can read and write data to the new primary region.

    The following list summarizes key parameters for each redundancy option:

    • LRS: durability of at least 99.999999999% (11 9’s) of objects over a given year; availability of at least 99.9% for read and write requests (99% for cool or archive access tiers); three copies within a single region.
    • ZRS: durability of at least 99.9999999999% (12 9’s); availability of at least 99.9% for read and write requests (99% for cool access tier); three copies across separate availability zones within a single region.
    • GRS/RA-GRS: durability of at least 99.99999999999999% (16 9’s); read availability of at least 99.9% (99% for cool or archive access tiers) for GRS and at least 99.99% (99.9% for cool or archive access tiers) for RA-GRS; write availability of at least 99.9% (99% for cool or archive access tiers); six copies total, including three in the primary region and three in the secondary region.
    • GZRS/RA-GZRS: durability of at least 99.99999999999999% (16 9’s); read availability of at least 99.9% (99% for cool access tier) for GZRS and at least 99.99% (99.9% for cool access tier) for RA-GZRS; write availability of at least 99.9% (99% for cool access tier); six copies total, including three across separate availability zones in the primary region and three locally redundant copies in the secondary region.
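To make the trade-offs concrete, here is a small Python sketch that picks the first redundancy option meeting a set of requirements. The SKU names (Standard_LRS, Standard_ZRS, Standard_GRS, Standard_RAGRS, Standard_GZRS, Standard_RAGZRS) are the ones used when creating a storage account; the ordering from simplest to most comprehensive is my assumption and only roughly tracks cost, so treat it as an illustration, not a price list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RedundancyOption:
    sku: str                      # storage account SKU name
    copies: int                   # total copies, per the list above
    survives_zone_outage: bool    # copies spread across availability zones (ZRS/GZRS)
    survives_region_outage: bool  # copies in a secondary region (GRS/GZRS)
    secondary_readable: bool      # read access to the secondary without failover (RA-*)

# Ordered from simplest to most comprehensive (assumed, roughly follows cost).
OPTIONS = [
    RedundancyOption("Standard_LRS", 3, False, False, False),
    RedundancyOption("Standard_ZRS", 3, True, False, False),
    RedundancyOption("Standard_GRS", 6, False, True, False),
    RedundancyOption("Standard_RAGRS", 6, False, True, True),
    RedundancyOption("Standard_GZRS", 6, True, True, False),
    RedundancyOption("Standard_RAGZRS", 6, True, True, True),
]

def pick_redundancy(zone: bool, region: bool, read_secondary: bool) -> str:
    """Return the first option in OPTIONS that satisfies every requirement."""
    for opt in OPTIONS:
        if ((not zone or opt.survives_zone_outage)
                and (not region or opt.survives_region_outage)
                and (not read_secondary or opt.secondary_readable)):
            return opt.sku
    raise ValueError("no option satisfies these requirements")

print(pick_redundancy(zone=True, region=False, read_secondary=False))  # Standard_ZRS
print(pick_redundancy(zone=True, region=True, read_secondary=True))    # Standard_RAGZRS
```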

    Azure Blobs

    Azure Storage offers three types of blob storage:

    • Block Blobs. Block blobs are composed of blocks and are ideal for storing text or binary files, and for uploading large files efficiently.
    • Append Blobs. Append blobs are also made up of blocks, but they are optimized for append operations, making them ideal for logging scenarios.
    • Page Blobs. Page blobs are made up of 512-byte pages up to 8 TiB in total size and are designed for frequent random read/write operations. Page blobs are the foundation of Azure IaaS disks.

    Overview of Azure page blobs

    Page blobs are a collection of 512-byte pages, which provide the ability to read and write arbitrary ranges of bytes. This makes page blobs ideal for storing index-based and sparse data structures, such as OS and data disks for virtual machines and databases. For example, Azure SQL DB uses page blobs as the underlying persistent storage for its databases. Page blobs are also often used for files with range-based updates.

    Key features of Azure page blobs are their REST interface, the durability of the underlying storage, and seamless migration to Azure. Azure page blobs are currently supported on two types of storage: Premium Storage and Standard Storage. Premium Storage is designed specifically for workloads requiring consistent high performance and low latency, making premium page blobs ideal for high-performance storage scenarios. Standard storage accounts are more cost-effective for running latency-insensitive workloads.

    Azure page blobs are the backbone of the virtual disk platform for Azure IaaS. Both Azure OS and data disks are implemented as virtual disks whose data is durably persisted in the Azure Storage platform and then delivered to the virtual machines for maximum performance. Azure Disks are persisted in Hyper-V VHD format and stored as page blobs in Azure Storage. Besides virtual disks for Azure IaaS VMs, page blobs also enable PaaS and DBaaS scenarios such as the Azure SQL DB service, which currently uses page blobs for storing SQL data, enabling fast random read-write operations for the database. Another example is a PaaS service for shared media access in collaborative video editing applications: page blobs enable fast access to random locations in the media, as well as fast and efficient editing and merging of the same media by multiple users.
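One practical consequence of the 512-byte page structure: write operations on a page blob must start and end on page boundaries, so even a small update inside a page means writing the whole page. Here is a minimal Python sketch of that alignment arithmetic; `align_to_pages` is a hypothetical helper, not an Azure SDK function.

```python
PAGE_SIZE = 512  # page blobs are made up of 512-byte pages

def align_to_pages(offset: int, length: int) -> tuple:
    """Expand the byte range [offset, offset + length) to whole 512-byte pages.

    Returns (aligned_offset, aligned_length).
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    start = (offset // PAGE_SIZE) * PAGE_SIZE                   # round down
    end = -(-(offset + length) // PAGE_SIZE) * PAGE_SIZE         # round up (ceiling)
    return start, end - start

print(align_to_pages(1000, 100))  # (512, 1024): 100 bytes touch two pages
print(align_to_pages(0, 512))     # (0, 512): already aligned
```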

    The following visual illustrates guidelines for choosing among the Azure data transfer tools, depending on the network bandwidth available for the transfer, the size of the data to transfer, and the frequency of the transfer.

    Azure data transfer tools

    Premium block blob storage accounts

    Premium block blob storage accounts make data available via high-performance hardware. Data is stored on solid-state drives (SSDs), which are optimized for low latency and provide higher throughput than traditional hard drives. File transfer is much faster because data is stored on instantly accessible memory chips, and all parts of an SSD are accessible at once. By contrast, the performance of a hard disk drive (HDD) depends on the proximity of data to the read/write heads.

    Access tiers for blob data

    Data stored in the cloud grows at an exponential pace. To manage costs for your expanding storage needs, it can be helpful to organize your data based on how frequently it will be accessed and how long it will be retained. Azure storage offers different access tiers so that you can store your blob data in the most cost-effective manner based on how it’s being used. Azure Storage access tiers include:

    • Hot tier – An online tier optimized for storing data that is accessed or modified frequently. The hot tier has the highest storage costs, but the lowest access costs.
    • Cool tier – An online tier optimized for storing data that is infrequently accessed or modified. Data in the cool tier should be stored for a minimum of 30 days. The cool tier has lower storage costs and higher access costs compared to the hot tier.
    • Cold tier – An online tier optimized for storing data that is rarely accessed or modified, but still requires fast retrieval. Data in the cold tier should be stored for a minimum of 90 days. The cold tier has lower storage costs and higher access costs compared to the cool tier.
    • Archive tier – An offline tier optimized for storing data that is rarely accessed, and that has flexible latency requirements, on the order of hours. Data in the archive tier should be stored for a minimum of 180 days.
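The minimum storage periods matter because moving or deleting a blob earlier triggers an early deletion charge, prorated over the remaining days of the minimum period. A simplified Python sketch of that day counting follows; prices are deliberately not modeled, and the helper name is hypothetical.

```python
# Minimum storage durations in days, from the tier list above (hot has none).
MIN_DAYS = {"hot": 0, "cool": 30, "cold": 90, "archive": 180}

def early_deletion_days(tier: str, days_stored: int) -> int:
    """Days of the minimum period still billed if a blob leaves `tier` now.

    Simplified model: returns only the prorated day count, not a cost.
    """
    return max(0, MIN_DAYS[tier] - days_stored)

print(early_deletion_days("cool", 21))     # 9
print(early_deletion_days("archive", 45))  # 135
print(early_deletion_days("hot", 1))       # 0
```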

    Object replication for block blobs

    Object replication asynchronously copies block blobs between a source storage account and a destination account. Some scenarios supported by object replication include:

    • Minimizing latency. Object replication can reduce latency for read requests by enabling clients to consume data from a region that is in closer physical proximity.
    • Increasing efficiency for compute workloads. With object replication, compute workloads can process the same sets of block blobs in different regions.
    • Optimizing data distribution. You can process or analyze data in a single location and then replicate just the results to additional regions.
    • Optimizing costs. After your data has been replicated, you can reduce costs by moving it to the archive tier using lifecycle management policies.
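For the cost scenario, a lifecycle management policy is a JSON document with rules, filters, and actions. The Python sketch below builds such a policy, moving block blobs under a prefix to the archive tier some days after their last modification. The rule name and the "replicated/results" prefix are hypothetical placeholders; the resulting file could then be applied with something like `az storage account management-policy create --policy @policy.json`, but check the current schema in the documentation before relying on it.

```python
import json

def archive_after_policy(prefix: str, days: int) -> dict:
    """Build a lifecycle policy that tiers matching block blobs to archive.

    `prefix` has the form "<container>/<blob-name-prefix>".
    """
    return {
        "rules": [
            {
                "enabled": True,
                "name": "archive-replicated-data",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "actions": {
                        "baseBlob": {
                            "tierToArchive": {"daysAfterModificationGreaterThan": days}
                        }
                    },
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                },
            }
        ]
    }

# Write the policy to a file so it can be passed to the CLI.
with open("policy.json", "w") as f:
    json.dump(archive_after_policy("replicated/results", 30), f, indent=2)
```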

    Diagram showing how object replication works

    Append Blobs

    An append blob is composed of blocks and is optimized for append operations. When you modify an append blob, blocks are added to the end of the blob only, via the Append Block operation. Updating or deleting of existing blocks is not supported. Unlike a block blob, an append blob does not expose its block IDs.

    Each block in an append blob can be a different size, up to a maximum of 4 MiB, and an append blob can include up to 50,000 blocks. The maximum size of an append blob is therefore slightly more than 195 GiB (4 MiB X 50,000 blocks).
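The size limit is simple multiplication. Here it is in Python, with the block size and block count from the paragraph above as parameters rather than constants, in case your service version documents different limits.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def max_append_blob_bytes(max_block_mib: int = 4, max_blocks: int = 50_000) -> int:
    """Upper bound on append blob size: largest block size times block count."""
    return max_block_mib * MIB * max_blocks

size = max_append_blob_bytes()
print(f"{size / GIB:.4f} GiB")  # 195.3125 GiB
```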

    Azure Files

    Azure Files offers fully managed file shares in the cloud that are accessible via the industry-standard Server Message Block (SMB) protocol, the Network File System (NFS) protocol, and the Azure Files REST API. Azure file shares can be mounted concurrently by cloud or on-premises deployments.

    SMB Azure file shares are accessible from Windows, Linux, and macOS clients. NFS Azure file shares are accessible from Linux clients. Additionally, SMB Azure file shares can be cached on Windows servers with Azure File Sync for fast access near where the data is being used.
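The two protocols address the same share in different ways. The Python sketch below builds the SMB UNC path and the NFS mount source for a share, assuming the public Azure cloud endpoint suffix `file.core.windows.net`; the account and share names are hypothetical.

```python
def smb_unc_path(account: str, share: str) -> str:
    """UNC path for an SMB Azure file share (public cloud endpoint assumed)."""
    return f"\\\\{account}.file.core.windows.net\\{share}"

def nfs_mount_source(account: str, share: str) -> str:
    """NFS mount source for an Azure file share: <host>:/<account>/<share>."""
    return f"{account}.file.core.windows.net:/{account}/{share}"

print(smb_unc_path("cloud13files", "projects"))      # \\cloud13files.file.core.windows.net\projects
print(nfs_mount_source("cloud13files", "projects"))  # cloud13files.file.core.windows.net:/cloud13files/projects
```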

    Active Directory as Authentication Source

    Active Directory Domain Services (AD DS) provides the methods for storing directory data while making it available to network users and administrators. Security is integrated with AD DS through logon authentication and access control to objects in the directory. With a single network logon, administrators can manage directory data and organization throughout their network, and authorized network users can access resources anywhere on the network. AD DS is commonly adopted by enterprises in on-premises environments or on cloud-hosted VMs. When you integrate on-premises AD DS with Azure Files, AD DS credentials are used for access control to your file shares.

    Files AD workflow diagram

    Azure File Sync

    Azure File Sync enables centralizing your organization’s file shares in Azure Files, while keeping the flexibility, performance, and compatibility of a Windows file server. While some users may opt to keep a full copy of their data locally, Azure File Sync additionally can transform Windows Server into a quick cache of your Azure file share. You can use any protocol that’s available on Windows Server to access your data locally, including SMB, NFS, and FTPS. You can have as many caches as you need across the world.

    An Azure hybrid file services topology diagram.

    Azure Queue Storage

    Azure Queue Storage is a service for storing large numbers of messages. You access messages from anywhere in the world via authenticated calls using HTTP or HTTPS. A queue message can be up to 64 KB in size. A queue may contain millions of messages, up to the total capacity limit of a storage account.
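The 64 KB limit applies to the message as stored, so encoding matters. If your client Base64-encodes message bodies, the encoded size grows by roughly 4/3, which leaves about 48 KiB for the raw payload. A minimal Python check, assuming "64 KB" means 64 × 1024 bytes; the helper name is hypothetical.

```python
import base64

MAX_MESSAGE_BYTES = 64 * 1024  # assumed: 64 KB = 65,536 bytes

def fits_in_queue_message(payload: bytes, base64_encode: bool = False) -> bool:
    """Check a payload against the queue message size limit.

    With base64_encode=True, the check is done on the encoded body.
    """
    body = base64.b64encode(payload) if base64_encode else payload
    return len(body) <= MAX_MESSAGE_BYTES

print(fits_in_queue_message(b"x" * 49_152, base64_encode=True))  # True
print(fits_in_queue_message(b"x" * 49_153, base64_encode=True))  # False
```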

    Azure Table Storage

    Azure Table storage stores large amounts of structured data. The service is a NoSQL datastore which accepts authenticated calls from inside and outside the Azure cloud. Azure tables are ideal for storing structured, non-relational data. Common uses of Table storage include:

    • Storing TBs of structured data capable of serving web scale applications
    • Storing datasets that don’t require complex joins, foreign keys, or stored procedures and can be denormalized for fast access
    • Quickly querying data using a clustered index
    • Accessing data using the OData protocol and LINQ queries with WCF Data Service .NET Libraries

    You can use Table storage to store and query huge sets of structured, non-relational data, and your tables will scale as demand increases.
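Every entity in Table storage carries a PartitionKey and a RowKey, and together they form the clustered index mentioned above. The toy in-memory model below (hypothetical, not the Azure SDK) shows why a point query with both keys is the fast path, and why a denormalized design that puts related rows in one partition makes range reads cheap.

```python
from collections import defaultdict
from typing import Dict, List, Optional

class InMemoryTable:
    """Toy model of Table storage addressing by (PartitionKey, RowKey)."""

    def __init__(self) -> None:
        self._partitions: Dict[str, Dict[str, dict]] = defaultdict(dict)

    def upsert(self, entity: dict) -> None:
        self._partitions[entity["PartitionKey"]][entity["RowKey"]] = entity

    def get(self, partition_key: str, row_key: str) -> Optional[dict]:
        # Point query: both keys known, a direct lookup.
        return self._partitions.get(partition_key, {}).get(row_key)

    def partition_scan(self, partition_key: str) -> List[dict]:
        # All entities of one partition, ordered by RowKey.
        rows = self._partitions.get(partition_key, {})
        return [rows[k] for k in sorted(rows)]

table = InMemoryTable()
table.upsert({"PartitionKey": "customer-42", "RowKey": "order-002", "total": 30})
table.upsert({"PartitionKey": "customer-42", "RowKey": "order-001", "total": 12})
print(table.get("customer-42", "order-001")["total"])               # 12
print([e["RowKey"] for e in table.partition_scan("customer-42")])  # ['order-001', 'order-002']
```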

    Tables storage component diagram

    Azure Managed Disks

    Azure managed disks are block-level storage volumes that are managed by Azure and used with Azure Virtual Machines. Managed disks are like a physical disk in an on-premises server, but virtualized. With managed disks, all you have to do is specify the disk size and the disk type, and provision the disk. Once you provision the disk, Azure handles the rest.

    The available types of disks are ultra disks, Premium SSD v2, premium solid-state drives (SSD), standard SSDs, and standard hard disk drives (HDD). For information about each individual disk type, see Select a disk type for IaaS VMs.

    Disk type comparison

    The following comparison of the five disk types can help you decide which to use.

    • Ultra disk (SSD): IO-intensive workloads such as SAP HANA, top tier databases (for example, SQL, Oracle), and other transaction-heavy workloads. Max disk size 65,536 GiB, max throughput 4,000 MB/s, max IOPS 160,000. Not usable as an OS disk.
    • Premium SSD v2 (SSD): production and performance-sensitive workloads that consistently require low latency and high IOPS and throughput. Max disk size 65,536 GiB, max throughput 1,200 MB/s, max IOPS 80,000. Not usable as an OS disk.
    • Premium SSD (SSD): production and performance-sensitive workloads. Max disk size 32,767 GiB, max throughput 900 MB/s, max IOPS 20,000. Usable as an OS disk.
    • Standard SSD (SSD): web servers, lightly used enterprise applications, and dev/test. Max disk size 32,767 GiB, max throughput 750 MB/s, max IOPS 6,000. Usable as an OS disk.
    • Standard HDD (HDD): backup, non-critical, infrequent access. Max disk size 32,767 GiB, max throughput 500 MB/s, max IOPS 2,000 (3,000*). Usable as an OS disk.

    * Only applies to disks with performance plus (preview) enabled.

    Note: You can adjust ultra disk IOPS and throughput performance at runtime without detaching the disk from the virtual machine. After a performance resize operation has been issued on a disk, it can take up to an hour for the change to take effect. Up to four performance resize operations are permitted during a 24-hour window.
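A quick way to use the disk comparison is to start from the least capable disk type and stop at the first one whose published maximums cover your requirements. The Python sketch below does exactly that with the values above (Standard HDD without performance plus). Keep in mind these are per-disk maximums: in practice, performance also depends on the provisioned disk size and on the VM's own limits, which this sketch does not model.

```python
from typing import NamedTuple, Optional

class DiskType(NamedTuple):
    name: str
    max_size_gib: int
    max_throughput_mbps: int
    max_iops: int
    os_disk: bool

# Values from the comparison above, ordered from most to least capable.
DISK_TYPES = [
    DiskType("Ultra disk", 65_536, 4_000, 160_000, False),
    DiskType("Premium SSD v2", 65_536, 1_200, 80_000, False),
    DiskType("Premium SSD", 32_767, 900, 20_000, True),
    DiskType("Standard SSD", 32_767, 750, 6_000, True),
    DiskType("Standard HDD", 32_767, 500, 2_000, True),
]

def smallest_fit(size_gib: int, mbps: int, iops: int, os_disk: bool = False) -> Optional[str]:
    """Return the least capable disk type whose maximums cover the requirements."""
    for disk in reversed(DISK_TYPES):
        if (disk.max_size_gib >= size_gib and disk.max_throughput_mbps >= mbps
                and disk.max_iops >= iops and (disk.os_disk or not os_disk)):
            return disk.name
    return None

print(smallest_fit(1_024, 200, 5_000))                  # Standard SSD
print(smallest_fit(1_024, 1_000, 50_000))               # Premium SSD v2
print(smallest_fit(512, 500, 30_000, os_disk=True))     # None
```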

    AZ-104 Study Guide – Azure Networking Part 2

    If you are looking for the full AZ-104 study guide: https://www.cloud13.ch/2023/10/31/az-104-study-guide-microsoft-azure-administrator/ 

    In part 1, I covered vNets, public IPs, vNet peering, NSGs, and Azure Firewall. Part 2 is about DNS services, S2S VPN, ExpressRoute, vWAN, endpoints, Azure Load Balancer, and Azure App Gateway.

    Azure DNS Services

    Azure DNS is a hosting service for DNS domains that provides name resolution by using Microsoft Azure infrastructure. By hosting your domains in Azure, you can manage your DNS records by using the same credentials, APIs, tools, and billing as your other Azure services.

    Azure Private DNS provides a reliable, secure DNS service to manage and resolve domain names in a virtual network without the need to add a custom DNS solution. By using private DNS zones, you can use your own custom domain names rather than the Azure-provided names available today.

    DNS overview

    The records contained in a private DNS zone aren’t resolvable from the Internet. DNS resolution against a private DNS zone works only from virtual networks that are linked to it.

    When you create a private DNS zone, Azure stores the zone data as a global resource. This means that the private zone is not dependent on a single VNet or region. You can link the same private zone to multiple VNets in different regions. If service is interrupted in one VNet, your private zone is still available. 

    The Azure DNS private zones auto registration feature manages DNS records for virtual machines deployed in a virtual network. When you link a virtual network with a private DNS zone with this setting enabled, a DNS record gets created for each virtual machine deployed in the virtual network.

    Screenshot of enabling auto registration on the Add virtual network link page.

    Note: Auto registration works only for virtual machines. For all other resources, like internal load balancers, you can create DNS records manually in the private DNS zone linked to the virtual network.
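The private zone rules above (resolution only from linked VNets, one zone linked to several VNets, auto registration for VMs only, manual records for everything else) can be captured in a tiny model. This is a hypothetical Python simulation of the behavior, not an Azure SDK type, and the zone and VNet names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class PrivateZone:
    """Toy model of a private DNS zone and its virtual network links."""
    name: str
    records: Dict[str, str] = field(default_factory=dict)  # host label -> private IP
    links: Dict[str, bool] = field(default_factory=dict)   # VNet -> auto registration on?

    def link(self, vnet: str, vms: Dict[str, str], auto_register: bool = False) -> None:
        """Link a VNet; with auto registration, add one record per VM in it."""
        self.links[vnet] = auto_register
        if auto_register:
            self.records.update(vms)  # only VMs are auto-registered

    def resolve(self, fqdn: str, from_vnet: str) -> Optional[str]:
        """Queries from VNets that are not linked to the zone get no answer."""
        suffix = "." + self.name
        if from_vnet not in self.links or not fqdn.endswith(suffix):
            return None
        return self.records.get(fqdn[: -len(suffix)])

zone = PrivateZone("contoso.internal")
zone.link("vnet-weu", {"web01": "10.1.0.4"}, auto_register=True)
zone.link("vnet-neu", {})            # a second VNet, e.g. in another region
zone.records["ilb01"] = "10.1.0.10"  # manual record, e.g. an internal load balancer
print(zone.resolve("web01.contoso.internal", "vnet-neu"))    # 10.1.0.4
print(zone.resolve("web01.contoso.internal", "vnet-other"))  # None
```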

    You can also configure Azure DNS to resolve host names in your public domain. For example, if you purchased the contoso.xyz domain name from a domain name registrar, you can configure Azure DNS to host the contoso.xyz domain and resolve www.contoso.xyz to the IP address of your web server or web app.

    Diagram of DNS deployment environment using the Azure portal.


    Site-to-Site VPN

    First of all, to create a site-to-site (S2S) VPN, you need a VPN Gateway.

    A VPN gateway is a type of virtual network gateway. A VPN gateway sends encrypted traffic between your virtual network and your on-premises location across a public connection. You can also use a VPN gateway to send traffic between virtual networks. When you create a VPN gateway, you use the -GatewayType value ‘Vpn’.

    A site-to-site (S2S) VPN gateway connection is a connection over an IPsec/IKE (IKEv1 or IKEv2) VPN tunnel. S2S connections can be used for cross-premises and hybrid configurations. An S2S connection requires a VPN device located on-premises that has a public IP address assigned to it. For information about selecting a VPN device, see the VPN Gateway FAQ – VPN devices.

    Diagram of site-to-site VPN Gateway cross-premises connections.

    VPN Gateway can be configured in active-standby mode using one public IP, or in active-active mode using two public IPs. In active-standby mode, one IPsec tunnel is active and the other tunnel is in standby: traffic flows through the active tunnel, and if an issue occurs with that tunnel, traffic switches over to the standby tunnel. The recommended setup is active-active mode, in which both IPsec tunnels are simultaneously active and data flows through both tunnels at the same time. An additional advantage of active-active mode is that customers experience higher throughput.
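The difference between the two modes comes down to which healthy tunnels carry traffic. Here is a deliberately simplified Python model of the behavior described above, with two IPsec tunnels where tunnel 0 is the active one in active-standby mode; it is an illustration, not how the gateway is implemented.

```python
from typing import List

def tunnels_in_use(mode: str, healthy: List[bool]) -> List[int]:
    """Indexes of the tunnels carrying traffic, given each tunnel's health.

    active-standby: only the first healthy tunnel is used (failover to standby).
    active-active: every healthy tunnel is used at the same time.
    """
    up = [i for i, ok in enumerate(healthy) if ok]
    if mode == "active-standby":
        return up[:1]
    if mode == "active-active":
        return up
    raise ValueError(f"unknown mode: {mode}")

print(tunnels_in_use("active-standby", [True, True]))   # [0]
print(tunnels_in_use("active-standby", [False, True]))  # [1] after failover
print(tunnels_in_use("active-active", [True, True]))    # [0, 1]
```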

    You can create more than one VPN connection from your virtual network gateway, typically connecting to multiple on-premises sites. When working with multiple connections, you must use a RouteBased VPN type (known as a dynamic gateway when working with classic VNets). Because each virtual network can only have one VPN gateway, all connections through the gateway share the available bandwidth. This type of connection is sometimes referred to as a “multi-site” connection.

    Diagram of site-to-site VPN Gateway cross-premises connections with multiple sites.


    Diagram of a site-to-site VPN connection coexisting with an ExpressRoute connection for two different sites.

    ExpressRoute

    ExpressRoute lets you extend your on-premises networks into the Microsoft cloud over a private connection with the help of a connectivity provider. With ExpressRoute, you can establish connections to Microsoft cloud services, such as Microsoft Azure and Microsoft 365.

    Connectivity can be from an any-to-any (IP VPN) network, a point-to-point Ethernet network, or a virtual cross-connection through a connectivity provider at a colocation facility. ExpressRoute connections don’t go over the public Internet. This allows ExpressRoute connections to offer more reliability, faster speeds, consistent latencies, and higher security than typical connections over the Internet. For information on how to connect your network to Microsoft using ExpressRoute, see ExpressRoute connectivity models.

    ExpressRoute connection overview

    Across on-premises connectivity with ExpressRoute Global Reach

    You can enable ExpressRoute Global Reach to exchange data across your on-premises sites by connecting your ExpressRoute circuits. For example, suppose you have a private data center in California connected to an ExpressRoute circuit in Silicon Valley, and another private data center in Texas connected to an ExpressRoute circuit in Dallas.

    With ExpressRoute Global Reach, you can connect your private data centers together through these two ExpressRoute circuits. Your cross-data-center traffic will traverse the Microsoft network.

    Diagram that shows a use case for Express Route Global Reach.

    For more information, see ExpressRoute Global Reach.

    Virtual WAN

    Azure Virtual WAN is a networking service that brings many networking, security, and routing functionalities together to provide a single operational interface. Some of the main features include:

    • Branch connectivity (via connectivity automation from Virtual WAN Partner devices such as SD-WAN or VPN CPE)
    • Site-to-site VPN connectivity
    • Remote user VPN connectivity (point-to-site)
    • Private connectivity (ExpressRoute)
    • Intra-cloud connectivity (transitive connectivity for virtual networks)
    • VPN ExpressRoute inter-connectivity
    • Routing, Azure Firewall, and encryption for private connectivity

    The Virtual WAN architecture is a hub and spoke architecture with scale and performance built in for branches (VPN/SD-WAN devices), users (Azure VPN/OpenVPN/IKEv2 clients), ExpressRoute circuits, and virtual networks. It enables a global transit network architecture, where the cloud hosted network ‘hub’ enables transitive connectivity between endpoints that may be distributed across different types of ‘spokes’.
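The key word here is transitive. With plain VNet peering, two VNets that are each peered with a third VNet cannot talk to each other through it by default; in the Virtual WAN model, the hubs forward traffic between all attached spokes. The Python sketch below models that as graph reachability over a hypothetical topology of two interconnected hubs.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

def reachable(edges: List[Tuple[str, str]], start: str) -> Set[str]:
    """Nodes reachable from `start` when every node forwards traffic transitively."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical topology: two hubs, each with spokes (VNets, a branch, an ExpressRoute circuit).
topology = [("hub-weu", "hub-neu"), ("hub-weu", "vnet-app"), ("hub-weu", "branch-zurich"),
            ("hub-neu", "vnet-data"), ("hub-neu", "er-circuit")]
print(sorted(reachable(topology, "branch-zurich") - {"hub-weu", "hub-neu"}))
# ['branch-zurich', 'er-circuit', 'vnet-app', 'vnet-data']
```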

    Virtual WAN diagram.

    Learning Module: Introduction to Azure Virtual WAN

    Service Endpoints

    Service endpoints provide the following benefits:

    • Improved security for your Azure service resources: VNet private address spaces can overlap. You can’t use overlapping spaces to uniquely identify traffic that originates from your VNet. Service endpoints enable securing of Azure service resources to your virtual network by extending VNet identity to the service. Once you enable service endpoints in your virtual network, you can add a virtual network rule to secure the Azure service resources to your virtual network. The rule addition provides improved security by fully removing public internet access to resources and allowing traffic only from your virtual network.
    • Optimal routing for Azure service traffic from your virtual network: Today, any routes in your virtual network that force internet traffic to your on-premises and/or virtual appliances also force Azure service traffic to take the same route as the internet traffic. Service endpoints provide optimal routing for Azure traffic. Endpoints always take service traffic directly from your virtual network to the service on the Microsoft Azure backbone network. Keeping traffic on the Azure backbone network allows you to continue auditing and monitoring outbound Internet traffic from your virtual networks, through forced-tunneling, without impacting service traffic. For more information about user-defined routes and forced-tunneling, see Azure virtual network traffic routing.
    • Simple to set up with less management overhead: You no longer need reserved, public IP addresses in your virtual networks to secure Azure resources through IP firewall. There are no Network Address Translation (NAT) or gateway devices required to set up the service endpoints. You can configure service endpoints through a single selection on a subnet. There’s no extra overhead to maintaining the endpoints.
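The first benefit is easy to see with a few lines of Python using the standard `ipaddress` module: when two VNets reuse the same private range, a source IP alone cannot tell a service which VNet a request came from. The VNet names and ranges below are hypothetical.

```python
import ipaddress
from typing import List

# Hypothetical VNets: vnet-a and vnet-b reuse the same private address space.
VNETS = {
    "vnet-a": ipaddress.ip_network("10.0.0.0/16"),
    "vnet-b": ipaddress.ip_network("10.0.0.0/16"),
    "vnet-c": ipaddress.ip_network("10.1.0.0/16"),
}

def candidate_vnets(source_ip: str) -> List[str]:
    """VNets whose address space contains the given source IP."""
    ip = ipaddress.ip_address(source_ip)
    return [name for name, net in VNETS.items() if ip in net]

print(candidate_vnets("10.0.1.4"))  # ['vnet-a', 'vnet-b'] -> ambiguous
print(candidate_vnets("10.1.2.3"))  # ['vnet-c']
```

This ambiguity is why service endpoints extend the VNet identity to the service instead of relying on source addresses.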

    Private Endpoints

    A private endpoint is a network interface that uses a private IP address from your virtual network. This network interface connects you privately and securely to a service that’s powered by Azure Private Link. By enabling a private endpoint, you’re bringing the service into your virtual network.

    The service can be an Azure service, such as Azure Storage or Azure SQL Database, or your own service exposed through Azure Private Link.

    Azure Load Balancer (Layer 4)

    With Azure Load Balancer, you can scale your applications and create highly available services. Load balancer supports both inbound and outbound scenarios. Load balancer provides low latency and high throughput, and scales up to millions of flows for all TCP and UDP applications.

    Diagram depicts public and internal load balancers directing traffic to web and business tiers.

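    Key scenarios that you can accomplish using Azure Standard Load Balancer include load balancing internal and external traffic to Azure virtual machines, increasing availability by distributing resources within and across availability zones, configuring outbound connectivity for Azure virtual machines, and using health probes to monitor load-balanced resources.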
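By default, Azure Load Balancer distributes flows using a hash of the five-tuple (source IP, source port, destination IP, destination port, protocol), so all packets of one flow land on the same backend. The Python sketch below illustrates the idea with SHA-256 purely for determinism; Azure's actual hash function is not modeled, and the IPs and VM names are hypothetical.

```python
import hashlib
from typing import List

def pick_backend(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 protocol: str, backends: List[str]) -> str:
    """Map a flow's five-tuple to one backend; the same flow always maps the same way."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[digest % len(backends)]

pool = ["vm-web-1", "vm-web-2", "vm-web-3"]
first = pick_backend("203.0.113.7", 50123, "198.51.100.10", 443, "TCP", pool)
again = pick_backend("203.0.113.7", 50123, "198.51.100.10", 443, "TCP", pool)
print(first == again)  # True: packets of one flow stick to one VM
```

A new connection from the same client usually uses a different source port, so it can land on a different VM.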

    Azure App Gateway (Layer 7)

    Azure Application Gateway is a web traffic (OSI layer 7) load balancer that enables you to manage traffic to your web applications. Traditional load balancers operate at the transport layer (OSI layer 4 – TCP and UDP) and route traffic based on source IP address and port, to a destination IP address and port.

    Application Gateway can make routing decisions based on additional attributes of an HTTP request, for example URI path or host headers. For example, you can route traffic based on the incoming URL. So if /images is in the incoming URL, you can route traffic to a specific set of servers (known as a pool) configured for images. If /video is in the URL, that traffic is routed to another pool that’s optimized for videos.
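Path-based routing boils down to matching the request path against an ordered set of rules and falling back to a default pool. In Application Gateway this is configured as a URL path map (with patterns such as /images/*); the Python sketch below only mimics that matching logic, and the pool names are hypothetical.

```python
# Hypothetical path rules: URL path prefix -> backend pool name.
PATH_RULES = [
    ("/images/", "images-pool"),
    ("/video/", "video-pool"),
]
DEFAULT_POOL = "default-pool"

def route(path: str) -> str:
    """Return the backend pool for a request path (first matching prefix wins)."""
    for prefix, pool in PATH_RULES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL

print(route("/images/logo.png"))  # images-pool
print(route("/video/intro.mp4"))  # video-pool
print(route("/index.html"))       # default-pool
```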


    The App Gateway features can be found here: https://learn.microsoft.com/en-us/azure/application-gateway/features

    Here is a service comparison of all the load-balancing options:

    Azure Load Balancing Comparison

    Global Load Balancing

    Azure load-balancing services can be categorized along two dimensions: global versus regional and HTTP(S) versus non-HTTP(S).

    Global vs. regional

    • Global: These load-balancing services distribute traffic across regional back-ends, clouds, or hybrid on-premises services. These services route end-user traffic to the closest available back-end. They also react to changes in service reliability or performance to maximize availability and performance. You can think of them as systems that load balance between application stamps, endpoints, or scale-units hosted across different regions/geographies.
    • Regional: These load-balancing services distribute traffic within virtual networks across virtual machines (VMs) or zonal and zone-redundant service endpoints within a region. You can think of them as systems that load balance between VMs, containers, or clusters within a region in a virtual network.

    The following list summarizes the Azure load-balancing services (scope and recommended traffic):

    • Azure Front Door: global, HTTP(S)
    • Azure Traffic Manager: global, non-HTTP(S)
    • Azure Application Gateway: regional, HTTP(S)
    • Azure Load Balancer: regional or global, non-HTTP(S)
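The two dimensions (global vs. regional, HTTP(S) vs. non-HTTP(S)) make a simple lookup. Here is a Python sketch that encodes the summary above and returns the recommended services for a given combination; it is a starting point for the decision, not a substitute for the full comparison.

```python
from typing import List

# The four services summarized above: (service, scopes, recommended traffic).
SERVICES = [
    ("Azure Front Door", {"global"}, "http"),
    ("Azure Traffic Manager", {"global"}, "non-http"),
    ("Azure Application Gateway", {"regional"}, "http"),
    ("Azure Load Balancer", {"regional", "global"}, "non-http"),
]

def candidates(scope: str, http: bool) -> List[str]:
    """Services recommended for a scope ('global' or 'regional') and traffic type."""
    traffic = "http" if http else "non-http"
    return [name for name, scopes, kind in SERVICES if scope in scopes and kind == traffic]

print(candidates("global", http=True))     # ['Azure Front Door']
print(candidates("global", http=False))    # ['Azure Traffic Manager', 'Azure Load Balancer']
print(candidates("regional", http=False))  # ['Azure Load Balancer']
```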

    Azure load-balancing services

    Here are the main load-balancing services currently available in Azure:

    • Azure Front Door is an application delivery network that provides global load balancing and site acceleration service for web applications. It offers Layer 7 capabilities for your application like SSL offload, path-based routing, fast failover, and caching to improve performance and high availability of your applications. 
    • Traffic Manager is a DNS-based traffic load balancer that enables you to distribute traffic optimally to services across global Azure regions, while providing high availability and responsiveness. Because Traffic Manager is a DNS-based load-balancing service, it load balances only at the domain level. For that reason, it can’t fail over as quickly as Azure Front Door, because of common challenges around DNS caching and systems not honoring DNS TTLs.
    • Application Gateway provides application delivery controller as a service, offering various Layer 7 load-balancing capabilities. Use it to optimize web farm productivity by offloading CPU-intensive SSL termination to the gateway.
    • Load Balancer is a high-performance, ultra-low-latency Layer 4 load-balancing service (inbound and outbound) for all UDP and TCP protocols. It’s built to handle millions of requests per second while ensuring your solution is highly available. Load Balancer is zone redundant, ensuring high availability across availability zones. It supports both a regional deployment topology and a cross-region topology.