The Lego Data Layer: Is Page an Entity or an Object?

In previous posts I argued that an entity is a thing that’s of special interest to the business. For ecommerce, an obvious one is product (but is certainly not the only one). An entity at its core is an object, but a special kind of object by virtue of its importance to the business.

We’ve also looked at how interactions are universally made up of the same basic components — subject, verb, object, context. We’ve seen that each of these components has its own dictionary. It’s all very conceptual so far.

The first step to putting this into practice using Google Tag Manager is working out whether a page is a business entity or an object.

Well, it’s neither.

You’d think that a page is an object at least but that would be wrong too. And that simple statement has dramatic consequences on how we design the page dictionary.

Take the most common analytics interaction of all, the pageview. If we were to map it to our event grammar, this is what we’d come up with:

user sees product page

Except that’s wrong. The user doesn’t see the page, they see the objects and entities that are on the page. You see, the page is nothing more than a container of stuff that is shown to the user. Therefore we design its dictionary accordingly.

The clue is in the page naming conventions

Think about it. Setting up decent naming conventions for pages is a core task in any Google Analytics setup. And how is that done?

Pages                     Assigned Page Grouping    Additional Page Groupings
product pages             Products                  By product type, brand, price, category, etc.
product category pages    Category Pages            By category level, name, type, etc.

Notice that what we’ve actually been doing is working out what business entity the page is about and then assigning a page classification that reflects that entity. Not just that, but additional classifications are made according to attributes belonging to these entities. Nothing in this classification is inherently about the page itself.

Let me say it again. A page isn’t an object, it’s a container of objects (some important, some not so much) that the user sees. A mere vehicle for all of the entities and objects the user interacts with. So its dictionary would look something like this:

page:
    main_entity: 
        type: product
        id: 354
    assets:
        product_collection_recently_viewed: <product collection dict>
        ...
        product_354: <product dict>
        product_987: <product dict>
        ...
        cta_newsletter: <call to action dict>
        promo_spring2014: <product collection dict>

Therefore our humble pageview interaction becomes:

user is primarily shown product 354 and they are also shown product collection “recently viewed”, product 987, cta about “newsletter” and the “spring2014” promo.
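
To make that concrete, here is a minimal JavaScript sketch of how the pageview could be assembled from the page dictionary. The shape of each asset dictionary, the buildPageview helper and the output field names are my own assumptions for illustration, not a fixed format.

```javascript
// Hypothetical page dictionary mirroring the sketch above.
// The contents of each asset dictionary are placeholders.
const page = {
  main_entity: { type: "product", id: 354 },
  assets: {
    product_collection_recently_viewed: { type: "product_collection" },
    product_354: { type: "product" },
    product_987: { type: "product" },
    cta_newsletter: { type: "cta" },
    promo_spring2014: { type: "promo" },
  },
};

// Build a "user is shown ..." interaction from the page dictionary.
function buildPageview(page) {
  // The main entity's asset key follows the <type>_<id> convention.
  const primary = `${page.main_entity.type}_${page.main_entity.id}`;
  // Everything else on the page is shown alongside the main entity.
  const alsoShown = Object.keys(page.assets).filter((key) => key !== primary);
  return { subject: "user", verb: "shown", primary: primary, also_shown: alsoShown };
}

const pageview = buildPageview(page);
```

The verb is “shown” rather than “sees”, and the page itself never appears as the object of the interaction.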

Notice that I used the word “shown”. This was a deliberate choice. “Sees” implies an active action on the part of the user. In reality, the pageview is simply the consequence of an active click on a link that occurred on the previous page. Which ties in nicely with my next point.

Why does this matter?

Increased need to track impressions

Enhanced Ecommerce has built-in support for product impressions but other entities need impression measurement, too. In order to measure the end-to-end success of an entity (be it promo, product collection, or something else), we also need to track when it was shown to the user. That’s essential context in analysis. After all, if the user never saw it, how can we possibly interpret interactions (or the lack of them) with it?
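
As a rough illustration of why impressions matter in analysis, the sketch below counts impressions and interactions per asset from a list of events. The event shape (verb and asset_key) and the summariseByAsset helper are hypothetical; real data would come out of your analytics tool in its own format.

```javascript
// Hypothetical event log: "shown" events are impressions,
// anything else counts as an interaction with the asset.
const events = [
  { verb: "shown", asset_key: "promo_spring2014" },
  { verb: "shown", asset_key: "promo_spring2014" },
  { verb: "clicked", asset_key: "promo_spring2014" },
  { verb: "shown", asset_key: "cta_newsletter" },
];

// Count impressions and interactions for each asset.
function summariseByAsset(events) {
  // Object.create(null) avoids clashes with inherited property names.
  const summary = Object.create(null);
  for (const event of events) {
    const key = event.asset_key;
    if (!summary[key]) {
      summary[key] = { impressions: 0, interactions: 0 };
    }
    if (event.verb === "shown") {
      summary[key].impressions += 1;
    } else {
      summary[key].interactions += 1;
    }
  }
  return summary;
}

const summary = summariseByAsset(events);
```

Without the “shown” events, the newsletter call to action would not appear in the summary at all, and there would be no way to tell “shown but ignored” apart from “never shown”.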


Using the page for dictionary lookups

When a website page loads, so do all of its assets. It’s how the web works. We can replicate that behaviour with respect to dictionaries, too. On page load we also load dictionaries for all entities and objects on the page. This creates a dictionary of dictionaries that we simply tap into whenever an interaction occurs on the page. We grab from it what we need to complete our interaction dictionary before we pass it to Google Tag Manager. Neat!
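
Here is a minimal sketch of that idea, assuming the page dictionary has already been rendered into the page on load. The pageDictionary variable, the trackInteraction helper and the attribute names are assumptions of mine. The dataLayer is a plain array here so the sketch runs anywhere; in the browser it would be the window.dataLayer queue that Google Tag Manager reads.

```javascript
// Hypothetical dictionary of dictionaries, populated once on page load.
const pageDictionary = {
  main_entity: { type: "product", id: 354 },
  assets: {
    product_354: { name: "Example product", price: 49 },
    cta_newsletter: { label: "Sign up" },
  },
};

// Stand-in for window.dataLayer so the sketch runs outside a browser.
const dataLayer = [];

// Complete an interaction dictionary from the page dictionary, then push it.
function trackInteraction(interaction, assetKey) {
  const assets = pageDictionary.assets;
  const known = Object.prototype.hasOwnProperty.call(assets, assetKey);
  const completed = {
    ...interaction,
    asset_key: assetKey,
    // null when the page carries no dictionary for this asset.
    asset: known ? assets[assetKey] : null,
  };
  dataLayer.push(completed);
  return completed;
}

trackInteraction({ event: "interaction", verb: "add_to_wishlist" }, "product_354");
```

The click handler only needs to know the asset key; everything else about the product comes from the dictionary that was loaded with the page.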


Automatic page classifications using lookups in the dictionary of dictionaries

Let’s picture the following scenario. A user clicks on a link and is taken to a product page. The page loads all assets (required for functionality) and all asset dictionaries. A myriad of user-driven interactions can occur on this page. Some may be directly related to the product (e.g. add to wishlist), others might be directly related to another business entity (e.g. a click on a link related to a promo).

In a previous post we established that every interaction “recipe” will include a reference to the business entity it primarily relates to. We do this by including the following in the interaction dictionary:

...
    main_entity:
        type: product
        id: 354
...

Once we’ve got that, the sequence of steps is more or less as follows (simplified):

Interaction has occurred

The interaction dictionary was passed to Google Tag Manager’s dataLayer. See How it works in practice for details on this step.


Interaction is linked to a specific entity instance

GTM specifically looks for an entity type and id somewhere in this dictionary (I’ll discuss where this should go in a future post). This is how Google Tag Manager knows what the interaction was really about.


Look for matching dictionary

GTM then looks in the page.assets area of the page’s own dictionary to see whether a dedicated dictionary matching our entity type and id exists. Since we’ve designed the assets as a set of named dictionaries keyed by entity type and id (rather than an array), we can access the dictionary directly: page.assets.product_354.
This means that we’re not flicking through assets trying to find a match. We “open” the page dictionary at the precise location where we know the entity’s dictionary should exist. That makes for very fast lookups.


Unpack the found dictionary

Once GTM has found the dictionary for the product with id 354, it passes that dictionary as a whole to the dataLayer, where it’s unpacked into individual attributes like name, price, cost, etc. These then make their way through to whatever analytics tags require them.


Automatic page classification based on business entity. Magic!

When GTM finds the dictionary for product 354 in page.assets, it also looks specifically for a classifications cluster of attributes inside it. Because our page dictionary told us that the page is primarily about a product, we know that the product’s classification is the most appropriate way to classify the page itself.
GTM therefore transfers the product’s entire classifications branch to the dataLayer and unpacks it into different levels of classifications which are then assigned to the page.
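
The last three steps (look for the matching dictionary, unpack it, classify the page) can be sketched in plain JavaScript as follows. This is a simplified sketch under my own assumptions, not the actual GTM setup: the helper names findAsset, unpack and classifyPage, the prefixed attribute naming and the sample values are all made up for illustration.

```javascript
// Hypothetical page dictionary; attribute names and values are illustrative only.
const page = {
  main_entity: { type: "product", id: 354 },
  assets: {
    product_354: {
      name: "Example product",
      price: 49,
      classifications: { category: "Shoes", brand: "Acme" },
    },
    product_987: { name: "Another product", price: 25 },
  },
};

// Look for matching dictionary: build the key from type and id,
// then open the assets object at that key (no scanning through a list).
function findAsset(page, entity) {
  const key = `${entity.type}_${entity.id}`;
  const exists = Object.prototype.hasOwnProperty.call(page.assets, key);
  return exists ? page.assets[key] : null;
}

// Unpack the found dictionary: flatten nested attributes into prefixed names.
function unpack(dictionary, prefix) {
  const flat = {};
  for (const [key, value] of Object.entries(dictionary)) {
    const name = `${prefix}_${key}`;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      Object.assign(flat, unpack(value, name));
    } else {
      flat[name] = value;
    }
  }
  return flat;
}

// Automatic page classification: reuse the main entity's classifications.
function classifyPage(page) {
  const entity = findAsset(page, page.main_entity);
  if (!entity || !entity.classifications) return {};
  return unpack(entity.classifications, "page");
}

const interaction = { verb: "add_to_wishlist", main_entity: { type: "product", id: 354 } };
const product = findAsset(page, interaction.main_entity);
const productAttributes = product ? unpack(product, "product") : {};
const pageClassifications = classifyPage(page);
```

Because the assets are keyed by type and id, the lookup is a single property access however many assets the page carries, and the page’s classifications fall out of the product’s dictionary without being maintained separately.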


But does the page have any attributes of its own?

I believe it does but these are most likely related to page functionality and how entities and objects are presented to the user.

Here are some relevant fragments from the page dictionary:

page:
    active_layout: grid
    active_filters:
        size: 12
        price: 
            low: 34
            high: 87
    pagination:
        page_number: 2
        total_pages: 5
        items_per_page: 20

This makes perfect sense. You don’t focus on the page itself when you’re trying to understand behaviour connected to business entities (product, product collection, promo, etc) just as you don’t look to business entities to answer questions related to website functionality!

The next post gets into the Google Tag Manager macros, rules, custom HTML tags and JavaScript needed to get this live on an ecommerce website.

Thoughts? Please let me know in the comments.
