Update May 28, 2020: Related Google Search announcement
In March of 2018 we wrote about our effort to contribute back the things we learned from AMP for the benefit of all web developers by addressing gaps in the web platform. In this work we’ve collaborated with the Google Chrome team, Igalia, who is helping us with implementation work in WebKit, and, of course, the wider web standards community. We’ve also learned a lot from the community and have seen a lot of exciting work that we can’t wait to apply. This is an update on how that effort is going.
Our work has focused on bringing the benefits that AMP implemented on the application layer to the whole web platform (AMP or otherwise):
- Measuring performance and user experience: Make it possible to objectively measure high load performance and certain UX properties, such as a stable loading experience without content that is shifting around as it loads.
- Privacy-preserving instant loading: Allow the preloading of web content, before a user clicks, without leaking the user’s interest in the page to the page author until the user navigates.
- Innovation in navigation: Navigation from one web page to another through means other than the traditional click. E.g. by swiping through a carousel or scrolling from one article to another one.
- Guardrails: Provide a mechanism for development teams to avoid the use of certain legacy features of the web platform that are harmful to UX. Give development teams similar control over the behavior of third-party code.
We’ve made significant progress distilling our original wide-ranging ideas and worked with the web community to turn them into concrete standard proposals. This is a lot to go through, so we’ll split things up into a series of posts starting with the topics Measuring performance and user experience and Guardrails today.
Measuring performance and user experience
The AMP team often gets the question: Why don’t you just measure whether a page is fast instead of relying on AMP? And we agree, that is a good question.
One of the big changes in the web space since AMP was created is that there are now tools like the Chrome User Experience Report, which do, in theory, provide a way to get performance data for most sites on the web. Looking at this further we realized, however, that the existing metrics are insufficient to really make statements about the desirable user experience properties of web pages.
Legacy metrics like onload aren’t great proxies for user-perceived performance. Pages can both appear loaded long before onload fires and can appear NOT to have loaded long after onload fires.
One modern metric, First Contentful Paint (FCP), is the only currently available candidate metric for measuring perceived performance at scale. Unfortunately, while good user experiences always have a good FCP, having a good FCP is insufficient to really know whether a page is fast. FCP should probably have been called First Not-Completely-Trivial Paint because it is clear that users would in many instances not consider a page to be sufficiently rendered at the time of its FCP. For example, a loading indicator would trigger FCP, but the user wouldn’t be happy at this stage.
For these reasons we invested in defining additional metrics that would paint a more holistic image of user perceived performance. The two metrics currently under discussion are Largest Contentful Paint (LCP, name subject to change), which fires when the largest element of text or image on a page has painted, and Layout Stability, which is an indicator for how stable the layout is during page load. We’re hoping that these metrics, together with First Input Delay, which is further along in standardization, would allow one to say with reasonable confidence whether a page is fast and well-behaved from a user point of view.
Guardrails
AMP introduced the AMP validator to ensure that documents complied with a range of best practices and to give helpful error messages when they didn’t. Based on earlier work by Tim Kadlec and Yoav Weiss on the Content Performance Policy idea, the notion of Feature Policies was developed through discussions with the web community. They provide both a way to turn off harmful legacy features, such as synchronous XHR, and to report violations of best practices, such as loading images that are much bigger than necessary, to the development team of a website.
A whole range of Feature Policies are in the works now covering many aspects of web development. One interesting aspect for Feature Policies that while wide browser support is desirable for enforcement, for reporting only mode, where you get notified about issues with your website, partial browser support is often sufficient to solve the identified issues for users of all web browsers.
Summary
This ends part 1 of our series of posts on contributing back lessons learned from AMP to the whole web. Thanks to the whole web community for the amazing help in getting these standards proposal on track and the great feedback to make them much better than we could have ourselves!
Next up in this series of posts is Privacy-preserving instant loading and Innovation in navigation. They’ll be posted on this blog in the coming days.
Posted by Malte Ubl, Member of the AMP Technical Steering Committee, Software Engineer at Google
What is the advantage of this compared to measuring with User timing? At a glance it seems that you dont need to insert timing marks. But are there other advantages such as more accuracy?
Has this been measured in SPAs such as Angular? The challenge I see is that if I run a network test the test will end with onload event but content will appear after in Angular
Good question with respect to User Timing. These are for very different purposes.
User Timing is for telling you that the thing you care about is fast. Metrics like LCP can be used to compare sites to each other (since they don’t rely on manual instrumentation) and for easy to use tools to get a good first impression about a side. But as you mention, metrics like LCP just won’t always do what you need. And that is what User Timings are for.
So, we really need both to cover those 2 very different use cases.