Microsoft Graph’s journey to HTTP/2

Feb 19, 2024

For years, Microsoft Graph has been serving billions of requests a day as one of the main REST APIs of Microsoft services. The service has become key infrastructure for Microsoft’s own applications, line-of-business solutions, and third-party commercial solutions.

Any critical piece of infrastructure comes along with rigorous responsibilities: maintain, support, and secure. And sometimes this infrastructure needs to be “upgraded” to the latest standards which improve the value provided to its users through better safety, better interoperability, and better performance.

On the security front, the service team has deployed support for TLS 1.2/1.3 and disabled support for TLS 1.0/1.1. On the interoperability front, IPv6 was enabled for Microsoft Graph a few months earlier. Both upgrades also helped lower latency.

Likewise, HTTP/2 (h2) helps lower latency and bandwidth usage (binary protocol, header compression…) and enables better parallelism of multiple requests when compared with HTTP/1.1.

This is the story of how h2 was finally enabled for Microsoft Graph, a story of careful planning and collaboration.

Inception

Back when I was a Microsoft MVP, I was already urging the then-smaller product team to add h2 support. In fact, I believe that being vocal about this request is one of the reasons I’m on the team now, in a “stop complaining about this and come help us with it instead” kind of way. Of course, h2 always had a lower priority than shipping additional features.

Fast forward to February 2022, I’m now working in the developer experience team (SDKs, tooling, etc.) and the service team is busy upgrading Microsoft Graph from dotnet “classic” to core. During the deployment of this new version, customers started to complain about older versions of the Java SDK breaking. This seemed odd as internal upgrades should not have any impact on the API surface itself.

It turns out that:

  • h2 is enabled by default in ASP.NET Core (it is disabled in classic).
  • h2 requires header names to be lowercase (lower-kebab-case), whereas HTTP/1.1 header names are case-insensitive and conventionally sent in Upper-Kebab-Case.
  • These older Java SDK versions were using case-sensitive comparisons on header names to drive their logic.
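To make the failure mode concrete, here is a minimal Python sketch of a case-sensitive header lookup that works over HTTP/1.1 but silently misses the header once names arrive in lowercase, alongside a case-insensitive alternative. This is not the Java SDK’s actual code: the helper names and the `Retry-After` example header are assumptions chosen purely for illustration.

```python
from typing import Mapping, Optional


def get_header_case_sensitive(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Fragile lookup: only matches when the casing is identical."""
    return headers.get(name)


def get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Robust lookup: compares header names case-insensitively."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


# Hypothetical response headers as they might appear over each protocol version.
http1_headers = {"Retry-After": "5", "Content-Type": "application/json"}
h2_headers = {"retry-after": "5", "content-type": "application/json"}
```

With the fragile helper, `get_header_case_sensitive(h2_headers, "Retry-After")` returns `None` even though the header is present, so no request fails outright and the calling logic simply takes the wrong branch.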

Planning

After that initial experience with h2, the service team set up a validation platform with h2 enabled so we could test all SDKs’ readiness for h2 and patch the problematic SDKs.

This was even more worrying because we had no way to properly measure the issue: the requests and responses themselves were not failing, so any failure would occur on the customers’ own infrastructure.

The service team then communicated the upcoming protocol change and gave customers over a year to prepare for it. Customers could either:

  • Disable h2 via code or configuration. (less desired)
  • Check they were using “tested for h2” versions of the SDKs.
  • Check their code where it’s reading headers.

Shipping

We had initially planned to start deploying h2 in September 2023, but had to delay until February 2024 because of the Rapid Reset vulnerability (CVE-2023-44487) discovered in some h2 server implementations.

After the gradual deployment of h2, the service team noticed that some service instances were running hot while others sat idle. This was caused by partition affinity at the load balancer level not accounting for the maximum number of parallel streams per connection, an issue that was promptly fixed.
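The intuition behind that imbalance can be shown with a toy Python model. It is only a sketch under stated assumptions, not a description of Microsoft Graph’s load balancer or of the fix the service team applied: the two placement functions and the workload numbers are hypothetical. The point is that with h2, one connection can carry many concurrent streams, so balancing by connection count alone can look even while the actual work is not.

```python
from typing import List


def assign_by_connection_count(stream_counts: List[int], instances: int) -> List[int]:
    """Place each connection on the instance holding the fewest connections.

    Returns the number of concurrent streams each instance ends up serving.
    """
    connections = [0] * instances
    streams = [0] * instances
    for count in stream_counts:
        target = connections.index(min(connections))
        connections[target] += 1
        streams[target] += count
    return streams


def assign_by_stream_count(stream_counts: List[int], instances: int) -> List[int]:
    """Place each connection on the instance currently serving the fewest streams."""
    streams = [0] * instances
    for count in stream_counts:
        target = streams.index(min(streams))
        streams[target] += count
    return streams


# Hypothetical workload: busy multiplexed connections alternate with quiet ones.
workload = [100, 1, 100, 1]
```

For this workload and two instances, counting connections gives each instance two connections but leaves one serving 200 streams and the other 2, while accounting for streams yields 101 each.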

H2 is finally fully deployed, and the only thing you should have noticed is that “things feel a bit faster”. These protocol upgrades (h2, IPv6, TLS 1.2+) keep our infrastructure working like clockwork.

We’re already looking at deploying HTTP/3, which should yield even lower latency thanks to QUIC, its new transport layer!

I hope this kind of behind-the-scenes article was interesting to you. Let me know what you thought of it in the comments!


Last edited Apr 15, 2024 by Vincent Biret

