This post was co-authored by Mark Russinovich, CTO and Technical Fellow, Azure, and Bryan Kelly, Partner Architect, Azure Hardware Systems and Infrastructure.
When it comes to building the Microsoft Cloud, our work to standardize designs for systems, boards, racks, and other parts of our datacenter infrastructure is paramount to facilitating forward progress and innovation across the computing industry. Microsoft has made a number of contributions to and collaborated with various members of the Open Compute Project (OCP) community, the leading industry group dedicated to open source hardware innovation. This year, we are excited to showcase some of our newest projects at the OCP Global Summit and share our learnings on the path of building a more reliable, trusted, and sustainable cloud. One of the key areas where we’ve seen continued focus and opportunity is driving industrywide standards around platform security. To dive deeper into our contributions in this area, I’ve invited Mark Russinovich, CTO and Technical Fellow, Azure, and Bryan Kelly, Partner Architect, Azure Hardware Systems and Infrastructure, to share more about Microsoft’s newest security contributions to OCP that standardize the foundations of trust, integrity, and reliability in computing.
Securing customer workloads from the cloud to the edge
Microsoft Azure is a leader in cloud security and privacy offering a broad range of confidential computing services to help organizations run workloads that keep business and customer data private with advanced levels of security. As the demand for confidential computing grows from cloud to edge, so do the requirements for consistency and transparency of the security mechanisms that protect workloads. With the rise of edge computing, the resultant growth in the exposed attack surface also presents a need for stronger physical security solutions. In this context, there is an increased need for greater transparency in the infrastructure that underpins these technologies and upholds hardware security promises.
Caliptra: Integrating trust into every chip
At the Open Compute Project (OCP) Summit, we are jointly announcing Caliptra, an open source root of trust (RoT) that produces cryptographic proofs about the hardware protections in place for confidential workloads. Designed with security experts and industry leaders in confidential computing across AMD, Google, Microsoft, and NVIDIA, Caliptra is a forward-looking approach casting transparency into hardware security. As a reusable open source, silicon-level block for integration into systems on a chip (SoCs)—such as CPUs, GPUs, and accelerators—Caliptra provides trustworthy and easily verifiable attestation.
At its core, Caliptra provides foundational security properties that underpin the integrity of higher-level security protection for confidential workloads. The Caliptra RoT has the following essential security properties:
-
Identity: A unique device manufacturer’s cryptographic identity for attestation endorsement. The identity is consistent with TCG DICE and includes intrinsic attestation of the Caliptra firmware.
-
Compartmentalization: Hardware protection barriers that isolate Caliptra’s security assets.
-
Measurement: Cryptographic digests that represent the SoC security configuration in a concise, cryptographically verifiable manner.
The initial Caliptra 0.5 contribution release to OCP contains a series of specifications describing architecture, integration, and implementation. An open sourced register-transfer level (RTL) code implementation of Caliptra that can be synthesized into current SoC designs will be made available, along with the cloud-designed firmware written entirely in Rust. With this trusted foundation designed for confidential cloud devices, Caliptra supports the consistent scaling of confidential workloads across distributed systems.
With deep ecosystem collaboration at the heart of Microsoft’s open source philosophy, we look forward to continuing working closely with our partners and engaging the industry to advance Caliptra. Caliptra RTL and firmware project collaboration will be done under the auspices of the CHIPS Alliance.
Hydra: A new secure Baseboard Management Controller (BMC)
We are also introducing Hydra, a new secure BMC in partnership with Nuvoton. A BMC is typically designed into every server system and expansion chassis—for example, JBOD or GPU. As a diagnostic and recovery controller, the BMC has special privileged hardware interfaces for acquiring debug data and telemetry from CPUs. These interfaces present security concerns, as they are targets for attacks that bypass conventional security defenses.
Azure uses Cerberus, a contribution we made to OCP in 2017 for hardware security, to improve BMC security by enforcing firmware integrity and preventing the persistence of malware in the BMC. However, as threat models evolve to restrict admins with physical access to hardware, the BMC needs security properties to establish secure links to an external RoT.
Microsoft collaborated with Nuvoton to design a new security-focused BMC, with enhanced hardware security throughout the BMC SoC. The silicon-integrated root of trust supports TCG DICE identity flows with hardware engines for fast cryptographic operations and hardware-managed keys. The RoT has a one-way bridge for activity monitoring and controlling the BMC security configuration, including which internal security peripherals the BMC can assess. This unique feature allows fine-grained BMC interface authorization, enabling scenarios whereby temporary access to a debug interface can be granted to the BMC only after it attests its trustworthiness.
Kirkland: A secure Trusted Platform Module (TPM)
While Microsoft provides multilayered security across our datacenters, infrastructure, and operations, we believe in defense-in-depth and that all interconnects should be cryptographically secured from interposer-based attack vectors. In partnership with Google, Infineon, and Intel, we are announcing Project Kirkland at OCP. Project Kirkland demonstrates how, using firmware-only updates to the TPM stack and CPU RoT, the interconnect between the TPM and CPU can be secured in a way that prevents substitution attacks, interposing, and eavesdropping. We are open sourcing this methodology and plan to work with the Trusted Computing Group on standardizing this approach while working with other TPM manufacturers to adopt the same methodology, so these techniques become available to all.
A discrete TPM is a chip typically used to protect secrets for the software running on the CPU and conditionally released based on the CPU’s boot measurements. Historically, the bus between the CPU and the TPM is susceptible to attack from physical adversaries wishing to falsify attested measurements or obtain TPM-bound secrets. The standards-based firmware techniques used in Project Kirkland defend against such attacks by using cryptography to authenticate the caller and protect the transmission of secrets over the bus.
Open hardware innovation at cloud scale
A community-driven approach to infrastructure innovation is vital—not just for continued advancements in trust, efficiency, and scalability, but in service of a larger vision of empowering the ecosystem towards building the for computing needs of tomorrow.
We are also contributing several new hardware designs such as a new modular chassis (Mt. Shasta), a converged architecture that brings form factor, power, and management interface into a modular design—optimized for advanced workloads like high-performance computing, artificial intelligence, and video codecs. In partnership with Quanta and Molex, Mt. Shasta is designed to be fully compatible with Open Rack V3, with flexibility in changing module-module connectivity. Earlier this year, we also collaborated with Intel and contributed the Scalable I/O Virtualization (SIOV) specification to OCP. SIOV enables device and platform manufacturers to an industry standard for hyperscale virtualization of PCI Express and Compute Express Link devices in cloud servers, enabling more scalable, efficient, and cost-effective hardware designs for datacenters.
As the demand for cloud-scale computing and digital services continues to grow, Microsoft is committing to deep ecosystem collaboration with OCP and industry partners to deliver the systems and infrastructure that maximize performance, trust, and resiliency for cloud customers.