AMP Camp: Cross-origin user state in AMP

tl;dr: This article will teach you how to track user actions between your domain and AMP Caches.

Welcome to the latest in our series of posts about AMP Camp, our demo that shows how to create an interactive site with AMP! In this series, we’ll discuss the techniques and tools we used in creating it, as well as best practices we developed. If you’re interested in creating an interactive site with AMP, we hope you’ll learn something!

In the previous post, we talked about how to use templates on both the client and your server. In this post, we’ll discuss best practices for tracking user state between your origin and AMP caches.

User state and AMP caches

If a user visits your site on an AMP cache and again on your own domain, it’s important to recognize that both visits come from the same person. This can take a little work. Fortunately, there’s a solution!

Imagine this: you run BestClips, a site that sells the most effective paper clips in the world. Each clip can hold up to 50 sheets of paper, and they’re completely made from 100% biodegradable soybeans!

Since you want users to see your clips as fast as possible, you built your product pages with AMP. You serve AMP pages from your domain, bestclips.com. And when web spiders like Google’s and Bing’s discover these AMP pages, they get stored in AMP caches. So when a user discovers your product page in Google or Bing Search, they’ll be viewing it in an iframe on a site like google.com or bing.com, and the page will be served from an AMP cache, such as cdn.ampproject.org or bing-amp.com. So far, so good!

But what happens if a user discovers your page on an AMP cache, they add paper clips to their cart, and later in the day they visit your site again on bestclips.com? Will those clips still be in their cart? Or will the cart be empty?

(If your site uses a signed exchange, many browsers will show your origin domain even when your page has been served from a cache. In this case, this problem vanishes. Otherwise, it’s good to know how to deal with it.)

What’s the issue?

AMP caches help make your web pages fast while preserving user privacy! But a cache also introduces an extra level of complexity: users can access your site not just on your domain, but also on the cache’s domain.

Let’s say that bestclips.com, following standard web practice, tracks a user’s state by dropping a cookie that contains a session ID. Then, whenever the user visits your pages on bestclips.com, your server retrieves the cookie, reads the session ID, and restores the user state from data stored on your server that’s associated with that ID.
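As a rough illustration of that lookup, the sketch below parses a raw Cookie header and restores state from an in-memory map. The function names, the `sessions` map, and the shape of the stored state are assumptions made for this example; a real site would more likely rely on its framework's cookie and session middleware.

```javascript
// Hypothetical in-memory session store: session ID -> user state.
const sessions = new Map([['12345', { cart: { superclip: 1 } }]]);

// Parse a raw Cookie header ("a=1; b=2") into a plain object.
function parseCookies(cookieHeader = '') {
  const cookies = {};
  for (const pair of cookieHeader.split(';')) {
    const index = pair.indexOf('=');
    if (index === -1) continue; // skip empty or malformed fragments
    const name = pair.slice(0, index).trim();
    cookies[name] = pair.slice(index + 1).trim();
  }
  return cookies;
}

// Return the stored state for the request's session, or null if none exists.
function restoreUserState(cookieHeader) {
  const { session_id: sessionId } = parseCookies(cookieHeader);
  return sessions.get(sessionId) ?? null;
}
```

When the Cookie header is missing, `restoreUserState` returns null. That is the situation the rest of this post deals with.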

Now, let’s say the user visits your product page, they see some paper clips that they can’t live without, and they want to add those to their cart. They click a button, and your site submits data to your server:

<form action-xhr="/add-to-cart" method="POST">

If the user’s on your origin, the request comes with a session cookie. In part, the request would look like this:

POST /add-to-cart HTTP/2.0
Cookie: session_id=12345

But if the user’s visiting your site on an AMP cache, that request to your server might actually emanate from ampproject.org or bing-amp.com – a different domain! The browser associates your cookie with bestclips.com, and thus it’s now a third-party cookie. Most browsers will cheerfully send these along. But users may have set their browsers to block third-party cookies. And some browsers will simply block a third-party cookie under certain circumstances. This would make your request look like this, with no cookie header:

POST /add-to-cart HTTP/2.0

What to do?

The solution, in brief

(For simplicity, going forward, we’ll use the term “origin” to refer to your domain, and “cache” to refer to an AMP cache.)

The solution is twofold. On your domain, identify users with a session cookie, just as usual. On the cache, in a browser that accepts third-party cookies, do the same. Otherwise, whenever a user takes an action that modifies application state, redirect them immediately to your origin, where you can access or create a cookie stored under your domain, and then make the desired change.

In other words, if the user wants to add paper clips to their cart, and if you can’t read their cookie, don’t panic! Simply redirect them to your origin, where you can change their cart to your heart’s content.

This redirect is made possible by an AMP-specific HTTP header called AMP-Redirect-To. If an AMP page makes a server request using <amp-form>, and the server’s response contains this header, AMP will redirect to the desired page.
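Here is a minimal sketch of the headers such a response might carry. The function name and its parameter are hypothetical. To the best of our knowledge, the amp-form documentation also asks that the redirect URL be an absolute HTTPS URL and that AMP-Redirect-To be listed in Access-Control-Expose-Headers; treat both details as assumptions to confirm against the current docs.

```javascript
// Build the response headers that ask amp-form to redirect after an XHR
// submission. `redirectUrl` is a hypothetical parameter for this sketch.
function buildAmpRedirectHeaders(redirectUrl) {
  const url = new URL(redirectUrl); // throws if the URL is not absolute
  if (url.protocol !== 'https:') {
    throw new Error('AMP-Redirect-To must be an HTTPS URL');
  }
  return {
    'AMP-Redirect-To': url.href,
    // Lets the AMP runtime read the header on a cross-origin response.
    'Access-Control-Expose-Headers': 'AMP-Redirect-To',
  };
}
```

A server could copy these entries onto its response with whatever header API it uses, for example `response.setHeader(name, value)` in Node.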

Here’s the entire flow:

  1. The user navigates to the product page. If the user’s on the origin, the origin sets a session cookie if one isn’t already present.
  2. The user takes an action to change what’s in the cart
  3. The browser sends data about the change to the origin via POST XHR
  4. The origin checks whether the request contained no session cookie and came from the cache
    • If that’s true:
      1. The response tells AMP to redirect to a URL on the origin which includes a query string that describes the user’s change
      2. When the origin sees that query string, it reads or creates the cookie, makes the change, and redirects again to a URL on the origin that doesn’t have that pesky query string
    • If that’s not true, then we can simply retrieve the user’s session and make the user’s change. Either we’re on the origin, or we’re on the cache with a browser that allows third-party cookies.

Whether the user begins on the cache or the origin, by the end of this process they’ve got a session and their changes are reflected on the server.
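The decision in step 4 can be sketched as a small pure function. The `request` object here is a simplified stand-in (parsed cookies plus lowercase header names), and the function name and returned action labels are invented for illustration only.

```javascript
// Decide how to handle an add-to-cart POST, following the flow above.
// `request` is a simplified stand-in: { cookies: {...}, headers: {...} },
// with lowercase header names as Node provides them.
function decideAddToCartAction(request) {
  const hasSession = Boolean(request.cookies.session_id);
  const onOrigin = request.headers['amp-same-origin'] === 'true';

  if (!hasSession && !onOrigin) {
    // No cookie, and the request came from a cache: finish the job on the origin.
    return { action: 'redirect-to-origin' };
  }
  // On the origin, or on a cache whose browser sent the cookie.
  // A null sessionId means the caller still has to create a session.
  return { action: 'update-cart', sessionId: request.cookies.session_id ?? null };
}
```

Keeping the decision separate from the HTTP plumbing makes each branch of the flow easy to check in isolation.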

The solution, in detail

Let’s describe this process in detail. Let’s say a user visits our product page and decides to buy one of our new Superclips. Our product page lives at https://bestclips.com/product, but the user might access this page either on our origin or on an AMP cache.

Step 1. The user arrives at our product page. This page contains a form that allows the user to select a quantity, and a submit button that allows them to add the product to their cart. This might look like this:

<form action-xhr="/api/add-to-cart" method="POST">
	<select name="quantity">
		<option value="0">0</option>
		<option value="1" selected>1</option>
		<option value="2">2</option>
	</select>
	<input type="submit" value="Add to Cart">
</form>

Step 2. Let’s say the user leaves the quantity at “1” and taps “Add to Cart”.

Step 3. The form is submitted, and AMP sends an XHR POST request to our server, bestclips.com. (To ensure that this request will work even from a cache, you need to set up CORS headers. If you’re using Node.js, you can just plug in the AMP CORS middleware.) On the origin, the request looks like this:
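For readers not using that middleware, the sketch below shows roughly what the CORS check involves. It is not the middleware's implementation. The allowlist, the cache hostnames, and the function name are assumptions for this example, and each AMP cache documents its own hostname format.

```javascript
// Origins allowed to call our API. The cache hostnames below are assumptions
// for this sketch; check each AMP cache's documentation for its URL format.
const ALLOWED_ORIGINS = new Set([
  'https://bestclips.com',
  'https://bestclips-com.cdn.ampproject.org',
  'https://bestclips-com.bing-amp.com',
]);

// Return the CORS headers for a request, or null if it should be rejected.
// `headers` uses lowercase names, as Node provides them.
function corsHeadersFor(headers) {
  const origin = headers['origin'];
  if (!origin) {
    // Same-origin AMP requests may omit Origin and send AMP-Same-Origin instead.
    return headers['amp-same-origin'] === 'true' ? {} : null;
  }
  if (!ALLOWED_ORIGINS.has(origin)) return null;
  return {
    'Access-Control-Allow-Origin': origin,
    // Needed so the browser will send and accept cookies on this XHR.
    'Access-Control-Allow-Credentials': 'true',
  };
}
```

A null result would typically translate into a 403 response, while a non-null result is merged into the response headers before the cart logic runs.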

POST /api/add-to-cart HTTP/2.0
AMP-Same-Origin: true
Cookie: session_id=12345
quantity=1

Note that AMP conveniently adds the AMP-Same-Origin header when it’s running on the origin.

If the user’s on the cache with a browser that allows third-party cookies, the request will look like this:

POST /api/add-to-cart HTTP/2.0
Cookie: session_id=12345
quantity=1

If the user’s on a cache with a browser that blocks third-party cookies, the Cookie header will be missing as well:

POST /api/add-to-cart HTTP/2.0
quantity=1

Step 4. Now comes the fun part.

The server checks to see whether the request lacks a session cookie and came from the cache. If there is no cookie, then the browser hasn’t allowed a cookie to be set. That could also happen if the user’s browser blocks all cookies, in which case, redirecting to the origin wouldn’t help. We’d never be able to track the user’s state with cookies. This is why we also check to see whether the request came from the cache – because that’s the case we can deal with.

If the request lacks a session cookie and came from the cache, the request will lack both the Cookie header and the AMP-Same-Origin header, as shown above. The server detects this condition:

if (!request.cookies.session_id && request.headers['amp-same-origin'] !== 'true')

If that’s true, the server sends a response that instructs AMP to redirect to a URL on the origin. In that URL, it includes a query string that describes the user’s change, like this:

response.setHeader("AMP-Redirect-To", `https://bestclips.com/product?item=${encodeURIComponent(request.body.itemName)}&quantity=${encodeURIComponent(request.body.quantity)}`);

This sends a response that contains a header like this:

AMP-Redirect-To: https://bestclips.com/product?item=superclip&quantity=1

When this response gets back to the browser, AMP notices the header and redirects to https://bestclips.com/product?item=superclip&quantity=1. This request goes to bestclips.com, our origin! The origin server can then read the session cookie or create one if it doesn’t exist. It adds one superclip to the user’s cart. Then it redirects to https://bestclips.com/product.

In other words, it redirects back to the product page without the query string. That way, the user won’t experience difficulties that could be caused by the query string. (See “Here’s how not to do it” below for reasons.)

As promised, now the user has a session cookie and the change has been made on the server.
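The origin's side of that last hop might look something like the sketch below. It returns a plain description of the response rather than writing to a real HTTP response object. The in-memory `carts` map, the counter-based session IDs, the cookie attributes, and the function name are all assumptions for illustration; a production site would use unguessable session IDs and persistent storage.

```javascript
// Hypothetical in-memory stores for this sketch.
const carts = new Map(); // session ID -> { itemName: quantity }
let nextSessionId = 1;   // real session IDs should be random and unguessable

// Handle a GET to the product page on the origin. Returns a description of
// the response instead of writing to a real HTTP response object.
function handleProductPage(requestUrl, cookies) {
  const url = new URL(requestUrl);
  const item = url.searchParams.get('item');
  const quantity = Number.parseInt(url.searchParams.get('quantity') ?? '', 10);

  // Read the session cookie, or create a session if there is none.
  const sessionId = cookies.session_id ?? String(nextSessionId++);
  const setCookie = cookies.session_id
    ? null
    : `session_id=${sessionId}; Path=/; Secure; HttpOnly`;

  if (!item || !Number.isInteger(quantity) || quantity < 1) {
    return { status: 200, setCookie }; // nothing to apply: just render the page
  }

  // Apply the change, then redirect to the same page without the query string.
  const cart = carts.get(sessionId) ?? {};
  cart[item] = (cart[item] ?? 0) + quantity;
  carts.set(sessionId, cart);
  return { status: 302, location: url.origin + url.pathname, setCookie };
}
```

For example, calling `handleProductPage('https://bestclips.com/product?item=superclip&quantity=1', {})` creates a session, records one superclip, and describes a redirect to https://bestclips.com/product.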

Can I use Client ID instead?

If you’ve worked with AMP, you may know that there’s the Client ID, which allows analytics packages to track a user’s journey from cache to origin. On origin, it’s stored in a cookie and persists for a year. On the cache, it’s also stored in a cookie, and if it’s not in the cookie, it can be created with a call to the Client ID API. So it would be tempting to use this to consistently identify a user.

Unfortunately, although sites do use this solution, it comes with flaws. The Client ID identifies a single user uniquely for certain journeys between origins and caches, but not all. And its cross-site behavior can be blocked by the same browsers we’ve been dealing with throughout this article.

The AMP Linker makes these cross-site journeys more reliable, since it preserves the client ID as a query string parameter. This provides another way to use the Client ID! But it does mean that the user’s unique identifier will then be visible in their URL. URLs tend to get logged on servers, and sometimes bad actors discover the log files. Worse still, the user might well share their URL publicly, exposing their identifier to the world. In either case, their session is vulnerable to hacking! This is why we used POST instead of GET in our examples above.

Do I really need to do this?

You may not. But as browsers block third-party cookies in more and more cases, this solution will become increasingly essential for letting users move smoothly between your origin and AMP caches. And while the flow above takes some time to explain, it’s not hard to implement.

Really?

Check out how we did it in the AMP Camp demo site. Here’s the server code. And here’s the form on the product page. The only difference is that, on this demo site, when the user adds an item to their cart, we redirect them to the cart details page.

Written by Ben Morss, Developer Advocate