Bringing HTTP/2 to GOV.UK

A presentation at London Web Performance Meetup in April 2020 in London, UK by Matt Hobbs

Slide 1

Matt Hobbs, Head of Frontend, Lead Developer, Government Digital Service (@TheRealNooshu)

Hello everyone! Thanks to Andy, Andrew, and Simon for organising and allowing me to speak. I’m Matt Hobbs…

Slide 2

I work at the Government Digital Service

I’m the Head of Frontend at the Government Digital Service (GDS).

Slide 3

Bringing HTTP/2 to GOV.UK

Slide 4

Who are GDS?

GDS: we are a central government department that has created, and maintains, a number of government services.

Slide 5

GOV.UK

Building and maintaining GOV.UK. GOV.UK is the website for the UK government. It’s the best place to find policy, announcements, information about the government, and guidance for citizens. Since 2012 it has replaced 1,884 government websites with just one, to become the home of all central government’s online content and services. And it’s what the rest of this talk is about. All this work was conducted way before the current coronavirus outbreak.

Slide 6

What is HTTP/2?

HTTP/2 is the latest stable version of the HTTP protocol. It brings a number of improvements over HTTP/1.1…

Slide 7

● HPACK header compression
● Multiplexing streams
● Prioritisation
● Server push†

†: May or may not be an improvement, but it’s in the specifications

● Minimise protocol overhead via the compression of headers using HPACK.
● Reduce network latency with the use of request and response multiplexing streams over a single TCP connection.
● Much more control over the prioritisation of assets and the order in which they are downloaded.
● Ability to use server push (e.g. push an asset to the browser without it having to request it).

Server push is a controversial topic and, depending on who you speak to, it may or may not offer any perf improvements. It’s a whole talk in itself. I’m including it as it is in the H2 spec.

Slide 8

Why enable it?

So where did it all begin? What prompted us to enable it in the first place? Well, there are many articles on the web that talk about the performance improvements HTTP/2 can bring to a website: all you need to do is “enable it”. A magic bullet to solve all performance problems, maybe…

Slide 9

And if you happen to use Google Lighthouse (v5) for auditing your site’s performance…

Slide 10

…you may have seen something similar to this under ‘Best Practices’. 14 passed audits get you a score of 93%. To add the missing 7% and reach the magic 100% score, just enable HTTP/2.

Slide 11

You can see where the best practice scores come from by looking at the ‘Lighthouse v5 Score Weighting’ spreadsheet. Here’s the missing 7%.

Slide 12

10 page report on HTTP/2

To make the case for enabling HTTP/2 on GOV.UK, I wrote a 10 page report on what it is, its advantages, and how we could enable it and roll it out.

Slide 13

“On examining all the evidence I cannot see any downsides to enabling this protocol on our Fastly CDN layer.” Matt Hobbs - 8th October 2018

In the very last sentence of the report I wrote this: “On examining all the evidence I cannot see any downsides to enabling this protocol on our Fastly CDN layer.” Ignorance is bliss.

Slide 14

Initial trial

Given the positivity of the report, and everything I’d been reading about HTTP/2, I contacted Fastly support to enable it and eagerly awaited the results. Tools used: WebPageTest, SpeedCurve, Sitespeed.io, and Lighthouse.

Slide 15

● 5 page types selected, different content / templates
● Tested on:
  ○ Chrome Desktop - Native (Sitespeed.io)
  ○ Chrome Mobile - 3G & 3G slow (Sitespeed.io)
  ○ Firefox Desktop - Native (Sitespeed.io)
  ○ Firefox Mobile - 3G & 3G slow (Sitespeed.io)
  ○ Nexus 5 - Chrome - 3G (WebPageTest)
  ○ iPhone 5C - 4G (WebPageTest)
  ○ Nexus 5X - 3G Fast (Lighthouse)

Selected 5 pages to test, each with slightly different content and templates. Tested on both simulated devices and real devices via WebPageTest. Connection speeds ranged from Native down to 3G Slow.

Slide 16

The results weren’t at all what I expected. Explanation of the graph (Nexus 5, 3G):
● It shows the difference between the HTTP/1.1 results and the HTTP/2 results
● Any bar above the x-axis is worse

Example: on the ‘homepage’ the first visual change was 149 ms slower on HTTP/2 than on HTTP/1.1.

Slide 17

The pattern repeated itself across different setups and different tools. Here’s an example where HTTP/2 was actually quicker for one of the pages (bars below the x-axis).

Slide 18

Tried comparing on a ‘warm cache’ (moving to the page via the homepage). Many of the results were the same:
● H2 starts well,
● then stagnates against HTTP/1.1 for 2 seconds (3G connection).

Slide 19

HTTP/2 - Initial trial

The summary page made for disappointing reading. In many test cases, HTTP/1.1 actually performed better than HTTP/2 under the synthetic tests I’d chosen. Only for users with an iPhone 5C on a 4G connection did performance actually improve across all pages.

Slide 20

Investigation

What exactly was happening? We decided to leave HTTP/2 enabled for 1 month so we could investigate the issue.

Slide 21

We could see H2 was enabled and that the browser was using it:
● reduced number of connection IDs
● fewer TCP connections were being opened.

Slide 22

The HAR files show the multiplexing of files over the single TCP connection. HTTP/1.1 (left): files requested one after the other create a stepped slope down the graph. HTTP/2 (right): a vertical line shows all files being requested at the same time.

Slide 23

HPACK header compression was working as expected. The compression is the figure shown after ‘space savings’: 0% for HTTP/1.1, 68% for HTTP/2.
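A ‘space savings’ figure like this can be read as the share of header bytes that never had to cross the wire. Here’s a minimal sketch of that arithmetic, assuming the figure is defined as one minus the ratio of compressed to uncompressed header size. The byte counts are hypothetical; the talk only gives the percentages.

```python
def space_savings(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Return the percentage of header bytes saved by compression."""
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed_bytes must be positive")
    return (1 - compressed_bytes / uncompressed_bytes) * 100


# Hypothetical figures: 10,000 bytes of headers sent as-is over HTTP/1.1,
# versus the same headers HPACK-compressed down to 3,200 bytes over HTTP/2.
print(round(space_savings(10_000, 10_000)))  # HTTP/1.1: nothing saved
print(round(space_savings(10_000, 3_200)))   # HTTP/2: roughly two thirds saved
```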

Slide 24

Issue

I had a theory about what the issue was.

Slide 25

Domain Sharding
● ‘www.gov.uk’
  ○ Used for HTML only
● ‘assets.publishing.service.gov.uk’
  ○ Used for all other assets

We had (and still have) a shard domain. This is a throwback to a ‘best practice’ for improving performance with HTTP/1.1. Only HTML is served from the origin. All other assets (CSS, JavaScript, images, fonts etc) are loaded from the assets domain (shard).

Slide 26

In the WebPageTest waterfall graph you can see the second TCP connection being established (highlighted in red). All CSS, JS and images are waiting on this connection to be established. Was this the issue? How could we reduce the time taken for the 2nd connection to establish?

Slide 27

Possible solutions

There were 3 possible solutions I could see to fix the issue:

Slide 28

The ‘preconnect’ hint header. The browser can connect to the assets domain earlier, so it won’t be waiting as long when assets are needed, since the connection has already been negotiated.
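As a rough illustration of what such a hint looks like on the wire, here’s a minimal sketch that builds the value of a `Link` response header with `rel=preconnect`. The helper name is hypothetical, and this isn’t the actual GOV.UK or Fastly configuration, just the general shape of the header.

```python
def preconnect_link_header(origin: str, anonymous: bool = False) -> str:
    """Build the value of a `Link` response header carrying a preconnect hint."""
    value = f"<{origin}>; rel=preconnect"
    if anonymous:
        # `crossorigin` asks for a connection without credentials, the kind
        # used by anonymous CORS fetches such as web fonts.
        value += "; crossorigin"
    return value


# One hint for the credentialed connection, one for the anonymous connection.
assets = "https://assets.publishing.service.gov.uk"
print("Link: " + preconnect_link_header(assets))
print("Link: " + preconnect_link_header(assets, anonymous=True))
```

The two variants matter because a browser keeps credentialed and anonymous connections apart, so a hint for one doesn’t warm up the other.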

Slide 29

HTTP/2 connection coalescing

HTTP/2 connection coalescing allows a browser to use the same TCP connection to transfer data from multiple domains (provided they share similar properties, like IP address and SSL certificate). If it’s working properly there’d be no need for the 2nd TCP connection at all. I read that post by Daniel Stenberg so many times…
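The two conditions mentioned here (same IP address, certificate valid for both names) can be sketched as a simple check. This is a simplified model, not any browser’s real implementation: the function names are hypothetical, the IP addresses come from a documentation range, and real browsers differ in exactly how strictly they match DNS results.

```python
from typing import Iterable, Set


def cert_covers(hostname: str, san_names: Iterable[str]) -> bool:
    """Return True if one of the certificate's names is valid for hostname."""
    for pattern in san_names:
        if pattern == hostname:
            return True
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".gov.uk"
            prefix = hostname[: -len(suffix)]
            # A wildcard only covers exactly one left-most label.
            if hostname.endswith(suffix) and prefix and "." not in prefix:
                return True
    return False


def can_coalesce(new_host: str, new_host_ips: Set[str],
                 connection_ip: str, san_names: Iterable[str]) -> bool:
    """Can a request for new_host reuse an existing HTTP/2 connection?"""
    return connection_ip in new_host_ips and cert_covers(new_host, san_names)


# Hypothetical setup: both domains resolve to the same address and share a certificate.
names = ["www.gov.uk", "assets.publishing.service.gov.uk"]
print(can_coalesce("assets.publishing.service.gov.uk", {"203.0.113.10"}, "203.0.113.10", names))
print(can_coalesce("assets.publishing.service.gov.uk", {"203.0.113.99"}, "203.0.113.10", names))
```

The second call fails only because the DNS answer differs, which is one way coalescing can turn out to be unreliable in practice.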

Slide 30

Domain Sharding
● ‘www.gov.uk’
  ○ Used for HTML, CSS, JavaScript, and images
● ‘assets.publishing.service.gov.uk’
  ○ Used for all other assets

Lastly: remove the need for the assets domain for static assets. Serve everything from the origin, so there’s no wait time as the connection is already established.

Slide 31

I even asked Pat Meenan at LondonWebPerf in December 2018, as you can see in this screenshot from the video.

Slide 32

HTTP/2 → HTTP/1.1

After trying to fix it for a number of weeks with no success, we decided to disable HTTP/2. Knowing it was negatively impacting many users, especially those on slow mobile connections, it was the correct thing to do.

Slide 33

We left it at that for a while. There were a few things happening in government for a couple of years, so H2 wasn’t a top priority.

Slide 34

The rogue image

On December 12th (my birthday), I received a question about my ‘How to read a WebPageTest’ blog post. Yew-li-a Lacoban (who may actually be watching) asked a question about an image download happening in one of the waterfall charts.

Slide 35

It was this image. Fairly unremarkable at first sight.

Slide 36

It’s request number 3 that really stands out: a single image that looks to be out of place compared to the other images.

Slide 37

In the full waterfall with the other assets it gets even weirder. The image (labelled 1) is actually downloading from the ‘assets’ domain before the connection to the ‘assets’ domain has been negotiated (labelled 2). How is that possible?

Slide 38

Answer: HTTP/2 connection coalescing. I was certain it wasn’t happening in the initial trial. It turns out it was. In the connection view, on connection number 1 you may just be able to make out 2 URLs: www.gov.uk and the assets domain. This signifies that the two domains have coalesced over a single TCP connection. Once the connection is established, the browser downloads a single image from the assets domain.

Slide 39

Another pattern the connection view shows:
● Only HTML and images are downloaded on connection 1
● CSS, JavaScript and fonts are only downloaded on connection 2

That’s unusual. So what was happening here?

Slide 40

Subresource Integrity (SRI)

We were using Subresource Integrity on both our JavaScript and our CSS.

Slide 41

SRI is a security feature that stops third-party code that has been modified from executing on your site. It uses an integrity attribute containing a file hash (as seen in the code). If the hash in the attribute and the file hash of the downloaded asset don’t match, the file won’t execute.
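To make the hash comparison concrete, here’s a minimal sketch of how an integrity value can be computed and checked, assuming the common `sha384` form (algorithm name, a hyphen, then the base64-encoded digest). The function names and the file contents are hypothetical, and a real browser does this check internally.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def integrity_matches(content: bytes, integrity: str) -> bool:
    """Mimic the browser's check: recompute the hash and compare."""
    algorithm, _, _ = integrity.partition("-")
    return sri_hash(content, algorithm) == integrity


original = b"console.log('hello');"            # hypothetical asset contents
integrity = sri_hash(original)                 # value for the integrity attribute
print(integrity_matches(original, integrity))  # True: file is unchanged
print(integrity_matches(original + b"// tampered", integrity))  # False: file was modified
```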

Slide 42

A requirement of SRI is that the crossorigin attribute must be used (as seen in the code). This attribute provides support for Cross-Origin Resource Sharing (CORS). Setting this attribute to anonymous was forcing both the CSS and the JavaScript to be downloaded on the second TCP connection (as we saw in the WPT connection view). An anonymous connection means that there will be no exchange of user credentials, unless on the same origin, via:
● cookies,
● client-side SSL certificates or
● HTTP authentication

Slide 43

The second, anonymous connection needs to be established before anything can be downloaded. All our CSS (which is render blocking) is waiting on this connection to be established. Example: if the CSS and JS were allowed to use a credentialed connection (the one to the origin), it would bring their download forward by 750 ms (in this example).
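The split between the two connections can be modelled in a few lines. This is a simplified sketch, assuming the browser keeps credentialed and anonymous requests on separate connections even when the hosts coalesce; the `Request` type, the `assign_connections` helper and the asset names are all hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class Request(NamedTuple):
    name: str
    host: str
    credentialed: bool  # False models crossorigin="anonymous" (and font) fetches


def assign_connections(
    requests: List[Request], coalesced_hosts: FrozenSet[str]
) -> Dict[Tuple[str, bool], List[str]]:
    """Group requests by (connection group, credentials mode)."""
    connections: Dict[Tuple[str, bool], List[str]] = {}
    for request in requests:
        group = "coalesced" if request.host in coalesced_hosts else request.host
        connections.setdefault((group, request.credentialed), []).append(request.name)
    return connections


WWW = "www.gov.uk"
ASSETS = "assets.publishing.service.gov.uk"
BOTH = frozenset({WWW, ASSETS})

with_sri = [
    Request("html", WWW, True),
    Request("image", ASSETS, True),
    Request("css", ASSETS, False),   # crossorigin="anonymous" because of SRI
    Request("js", ASSETS, False),    # crossorigin="anonymous" because of SRI
    Request("font", ASSETS, False),  # fonts are fetched anonymously
]
without_sri = [
    r._replace(credentialed=True) if r.name in ("css", "js") else r for r in with_sri
]

for label, requests in (("with SRI", with_sri), ("without SRI", without_sri)):
    print(label, assign_connections(requests, BOTH))
```

With SRI, the render-blocking CSS lands on the anonymous connection alongside the fonts; without it, the CSS and JS ride on the connection that is already open for the HTML.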

Slide 44

Change anonymous to use-credentials?

Rather than removing SRI completely, is there an alternative to anonymous? Looking at the MDN documentation, there is: use-credentials, which allows the requests for the asset to include credentialed information.

Slide 45

RFC-114

Following our RFC process for changes to GOV.UK, I wrote an RFC, fed back on a few comments, and proceeded with the change. Note: all RFCs are publicly available and can be found on GitHub.

Slide 46

We tested this on a single application on our integration server. All the CSS / JS on the page failed to load. The console showed a CORS issue.

Slide 47

When it comes to CORS, it always pays to read the fine print. Take a closer look at the Fetch specification under ‘CORS protocol and credentials’. Row 5 states that if credentials mode is set to “include” (or ‘use-credentials’), then Access-Control-Allow-Origin cannot be *.
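That rule can be sketched as a small function. This is a simplified reading of the CORS check in the Fetch specification, not a full implementation; the function name is hypothetical. For reference, crossorigin="anonymous" corresponds to the credentials mode "same-origin" and crossorigin="use-credentials" to "include".

```python
from typing import Optional


def cors_check(request_origin: str, credentials_mode: str,
               allow_origin: Optional[str],
               allow_credentials: Optional[str] = None) -> bool:
    """Return True if the response headers let the cross-origin fetch succeed."""
    if allow_origin is None:
        return False
    if credentials_mode != "include":
        # Anonymous fetches accept the wildcard or an exact origin match.
        return allow_origin in ("*", request_origin)
    # Credentialed fetches: the wildcard is not allowed, the origin must match
    # exactly, and the server must opt in to credentials.
    return allow_origin == request_origin and allow_credentials == "true"


origin = "https://www.gov.uk"
print(cors_check(origin, "same-origin", "*"))       # anonymous + wildcard: passes
print(cors_check(origin, "include", "*"))           # use-credentials + wildcard: fails
print(cors_check(origin, "include", origin, "true"))  # explicit origin + opt-in: passes
```

The second case is the one that broke the CSS and JS in the integration test.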

Slide 48

Access-Control-Allow-Origin and web fonts

The Access-Control-Allow-Origin header is used to tell a browser where a cross-origin resource being requested can be used. If an asset is requested cross-origin from a domain where this header isn’t set to “*” and the requesting domain isn’t listed, you will get a CORS error. In our case, the “Access-Control-Allow-Origin” header was added to allow our web fonts to be viewed correctly in all browsers when served from the (cross-origin) assets domain. But the header isn’t only served with fonts, it’s served for all assets (issue now logged to fix).

Slide 49

● Access-Control-Allow-Origin: *
● crossorigin="use-credentials"

You can see why it is written in the spec that way:
● Access-Control-Allow-Origin “*” allows an asset to be fetched cross-origin and executed from any domain
● crossorigin="use-credentials" says: allow this fetch to happen on a connection that can transfer credentialed information about the domain

That doesn’t sound very secure…

Slide 50

Subresource Integrity (SRI)

The next and easiest step would be to remove SRI from our CSS and JS. We weren’t using it in the way it was intended (for scripts hosted on a third-party domain outside our control), so it was also a safe, low impact change.

Slide 51

RFC-115

As this was now a different proposal, the previous RFC was closed and a new one created, explaining all the details and learnings. We then waited a week for comments and feedback.

Slide 52

Nine small PRs

There were no blockers, so 9 small PRs were raised to remove SRI from the relevant GOV.UK applications.

Slide 53

Results

So let’s take a look at a few results.

Slide 54

HTTP/1.1 (SRI) to HTTP/1.1 (no-SRI)

I was interested to see the difference between the two setups (SRI to no-SRI) in terms of performance. SpeedCurve was a fantastic tool to visualise this.

Slide 55

Homepage - slow mobile (Samsung S3, 2G)

Graph of the homepage on a slow mobile (Samsung S3, 2G connection). Visually complete dropped from almost 28 seconds to 18 seconds, a 36% improvement.

Slide 56

Answers page - medium mobile (Samsung S4, 3G)

In some instances we actually saw an increase in visually complete when removing SRI. Here’s an answers page on a Samsung S4 on a 3G connection: an increase of just under 1 second.

Slide 57

HTTP/1.1 with SRI

Examining this, it’s due to late-loading fonts and the impact this has on the visually complete metric. With SRI, the browser opens 6 anonymous TCP connections. Fonts need to be downloaded via an anonymous TCP connection, so the fonts have 6 connections to be downloaded on.

Slide 58

HTTP/1.1 without SRI

With SRI removed, the browser has no need to establish all the anonymous TCP connections, as all other assets can download via a credentialed connection. When a font is downloaded, only one anonymous connection is established. The browser has to open another one very late to download the other font, extending the visually complete metric.

Slide 59

HTTP/1.1 with SRI

Where we really start to see improvements is in the WebPageTest connection view graphs. In the example with SRI we have 13 connections:
● 5 anonymous connections
● 6 credentialed connections
● 1 third-party connection (GA)

Note the big orange space after the font loading. This is an inefficient use of domain sharding: the extra connections opened by the browser aren’t being fully utilised.

Slide 60

HTTP/1.1 without SRI

Compare it to SRI disabled. Here we have 9 connections:
● 2 anonymous connections for the fonts
● 7 credentialed connections for all other assets

Note that a much smaller gap is visible within the connections, showing that the open connections are being used more efficiently by the browser.

Slide 61

HTTP/1.1 (no-SRI) to HTTP/2

It’s looking better, but it can still be improved. So what about finally switching on HTTP/2?

Slide 62

Homepage - slow mobile (Samsung S3, 2G)

Graph of the homepage on a slow mobile (Samsung S3, 2G connection). There’s an initial drop from the SRI change (first line), then an additional drop due to enabling HTTP/2 (second line). Visually complete: 28 seconds at the start of January, down to 14 seconds now. A 50% improvement.

Slide 63

Answers page - medium mobile (Samsung S4, 3G)

We see the 1 second uplift we noticed from the SRI switch on the answers page fall right back down. Visually complete drops from the peak of 5.7 seconds down to around 4.4 seconds, a 23% improvement.

Slide 64

Start page - Chrome - Cable

We’ve seen this dip all over our SpeedCurve graphs, even on fast devices in modern browsers. Page load and fully loaded time drop by 100 ms, even on a very simple page like a start page. That may not sound like much, but the page is loading in around 1 second anyway, so it’s a 10% improvement on an already quick page!
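The percentages quoted across these results slides are simple before-and-after ratios. Here’s a minimal sketch of the arithmetic using the rounded timings from the slides; the function name is hypothetical, and since the timings are approximate the percentages are too.

```python
def improvement_percent(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after` (same units)."""
    return (before - after) / before * 100


# (label, before in seconds, after in seconds), taken from the slides above
results = [
    ("Homepage, 2G: removing SRI", 28, 18),
    ("Homepage, 2G: SRI removed and HTTP/2 enabled", 28, 14),
    ("Answers page, 3G: enabling HTTP/2", 5.7, 4.4),
    ("Start page, cable: enabling HTTP/2", 1.0, 0.9),
]

for label, before, after in results:
    print(f"{label}: {round(improvement_percent(before, after))}%")
```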

Slide 65

HTTP/2

My favourite graph is the connection view from a WebPageTest. We’ve gone from 13 TCP connections down to 2:
● HTTP/2 coalescing can be seen on connection 1
● anonymous TCP connection for the fonts on connection 2

Note that there’s hardly any empty space on the first connection, meaning it is being fully utilised. The observant among you will notice the impact of the preconnect header on the 2nd connection: the connection is negotiated way before it is required by the fonts.

Slide 66

HTTP/1.1 with SRI enabled

Lastly, let’s take another look at our summary table. This is the one from earlier (initial trial): unhealthy looking, with many tests where HTTP/1.1 performed better than HTTP/2.

Slide 67

HTTP/1.1 with SRI enabled

Updated table: much healthier looking. There are a few instances and page setups where H1 performs better in some metrics, so I judged them to be performing slightly better. Overall it is much improved. I couldn’t repeat the tests on the iPhone 5C and Nexus 5, as there were a few WPT issues at the time I compiled this table.

Slide 68

What’s next for GOV.UK?

So what’s next for performance on GOV.UK?

Slide 69

● Access-Control-Allow-Origin: *
● Remove assets domain (for static assets)

A couple of issues are left to fix in the RFC:
● Reducing the scope of the CORS headers (basic cleanup).
● ‘Removal’ of the assets domain for our static resources: serve all CSS, JS, images, and fonts off www.gov.uk.

Browsers that have flaky HTTP/2 coalescing will then get the full benefits of HTTP/2. The second TCP connection for the font can then be removed (when fonts come from the document origin, they won’t use a separate connection). With a single connection for all page assets, the server can have complete control over H2 asset priorities.

Slide 70

TLSv1.3 (+ 0-RTT?)

Fastly have started rolling out TLSv1.3 to POPs across the globe. We could see some TLS negotiation performance improvements when this happens in the UK. We’ll investigate the use of 0-RTT session resumption too. This allows users who visit the site on multiple occasions to reuse a previous TLS negotiation, which could remove a chunk of time on initial page load (assuming the browser supports it, that is).

Slide 71

Brotli compression

Brotli is a new compression algorithm that is now supported by 92% of browsers globally (caniuse). From research I’ve done for GOV.UK, written up in a report, I found it improves file compression over the network by around 20% compared to our current GZip implementation. This is something Fastly are working on, with a beta program now being trialled. It could be a huge benefit to many GOV.UK users.

Slide 72

New webfont

We’re incredibly close to switching all apps over to our new web font, which reduces the data required by 47% for both font weights we use.

Slide 73

JS improvements

The GOV.UK team are unpicking and removing our dependencies on jQuery. We’ll soon be able to remove another 33KB of minified and compressed JavaScript.

Slide 74

Summary

So there you have the story of how HTTP/2 was enabled on GOV.UK. It wasn’t as simple as just “turning it on”, but it was worth the time and investment. I’ve learnt a fair bit in the process, which is always good. A couple of quick thank-yous: thanks to Andy Davies and Barry Pollard (HTTP/2 in Action). You would not believe the number of questions I’ve fired across to them both over the past 18 months. And finally, thanks to the whole GOV.UK team. I feel very lucky to be able to work with such an incredible bunch of people who are always very patient with me when I propose changes!

Slide 75

Thanks for listening!

Matt Hobbs, Twitter: @TheRealNooshu