How to Use DOM Scraping to Quickly Collect Data in Tag Manager

Let's start

In this post I’ll discuss how I quickly test new changes using only Google Tag Manager and custom HTML tags. I have been a big proponent of DOM scraping, i.e. using what’s already in the HTML of the page to collect data for analytics. One of the reasons I fell in love with Google Tag Manager in its beta period was its custom HTML tag, which allowed for endless possibilities in terms of the code you could execute.

I wrote about DOM scraping using JavaScript / jQuery last year. Since then, Google has made scraping even easier by providing auto-event listeners as well as JavaScript macros, both of which can be used to extract element values and populate tags.

GTM also offers the ability to test and preview changes in debug mode, allowing a developer to test any change on the live site without affecting customers. Combine this with the ability to scrape and mold the data in the DOM and we have the recipe for an addictive loop, providing instantaneous feedback on any change we make.

I am going to discuss how to use this approach to make deployment cycles short and sweet, as well as why you should fight the urge to use it in production.

Using the approach is straightforward: all you need is a custom HTML tag or a JavaScript macro that uses plain JavaScript or a library like jQuery to extract some data from the page, and then either populates an analytics tag or passes the data to another custom HTML tag for manipulation.
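
For instance, a minimal sketch of such a tag might look like this (the selector and dataLayer keys here are made up for illustration):

<script>
  // Read a value straight out of the rendered page (hypothetical selector)
  var authorEl = document.querySelector('.post-author');

  // Hand it off via the dataLayer so other tags or macros can use it
  dataLayer.push({
    'pageAuthor': authorEl ? authorEl.textContent : '(not set)',
    'event': 'authorScraped'
  });
</script>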

Structure

When scraping, it’s ideal to isolate these tags and not mix them with production tags. I add -test- to the names of these tags to keep them visually separate.

A rule of thumb is a single tag should perform one function.

For scraping to be accurate, we should only scrape once the DOM has finished loading and we can be certain all elements are available. This can be done using a native GTM rule:

{{event}} equals gtm.dom

You can use the standard dataLayer syntax to communicate with other tags and pass data, just like you would on a normal page.

For example, if you are categorizing pages for Content Groups, one tag should do the extraction and then pass the data to another tag that sends it to the analytics service.

<script>
function categorizePages(pagePath){
    // code to categorize the page based on pagePath goes here

    // send the data to the dataLayer so other tags can pick it up
    return dataLayer.push({'contentGroup':'xyz', 'event':'categorized'});
}

categorizePages('{{url path}}'); // url path macro
</script>

Something like this can be triggered on every page using a rule:

{{event}} equals gtm.dom
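
The receiving tag can then fire on {{event}} equals categorized and read the pushed value through a Data Layer macro. As a sketch, assuming analytics.js (Universal Analytics) is already loaded on the page and {{content group}} is a Data Layer macro reading the contentGroup key, the sending tag could look like this:

<script>
  // Fires on the rule: {{event}} equals categorized
  // {{content group}} is assumed to be a Data Layer macro reading 'contentGroup'
  if (window.ga) {
    ga('set', 'contentGroup1', '{{content group}}');
    ga('send', 'pageview');
  }
</script>

In practice, a standard Universal Analytics tag configured with the same rule and macro can do the same job without custom code.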

Firing order

One thing to remember when using multiple tags that depend on each other is tag firing order. We can build chains of tags by firing an event with each dataLayer push and using a rule in the next tag that triggers on the previous tag’s event, as sketched below.
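
As a rough sketch (the selector and event names here are invented for illustration), a two-step chain might look like this:

<script>
  // Tag A: fires on {{event}} equals gtm.dom
  // Scrape something, then announce completion with a custom event
  var priceEl = document.querySelector('.product-price'); // hypothetical selector
  dataLayer.push({
    'scrapedPrice': priceEl ? priceEl.textContent : undefined,
    'event': 'priceScraped'
  });
</script>

Tag B would then use the rule {{event}} equals priceScraped, which guarantees it only runs after Tag A has pushed its data.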

We can also use the Tag Priority feature to control which tags are triggered first.

Library Dependency

jQuery is a go-to library for extraction: it is found on the majority of sites and is a real time saver that extends what plain JavaScript code can do. If you plan to use it, make sure any code you trigger runs after the library has loaded. This can be done by executing any tags that depend on it only after the DOM is ready, using the rule:

{{event}} equals gtm.dom
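
Even then, it doesn’t hurt to guard against the library being missing. A minimal sketch (the selector and event name are illustrative):

<script>
  // Only proceed if the site's own jQuery is actually available
  if (window.jQuery) {
    var headline = jQuery('h1').first().text(); // hypothetical extraction
    dataLayer.push({'pageHeadline': headline, 'event': 'headlineScraped'});
  }
</script>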

Code > Preview > Repeat

Once you have your code in order comes the fun part: deploying it on the live site and watching the data roll in, in real time.

You can hit Save on the tag > click on Preview > load the site.

Didn’t come off as expected? No problem, make a change and repeat.

Need another person to check your data? Just send them a share link.

Another thing I use this approach for is proving the value of data before doing a full-scale deployment. With Google Analytics on a feature tear and releasing cool things every month, it’s tempting to want it all, and some GTM elbow grease lets you get into the thick of it quickly. I generally do the DOM scraping, set up an analytics tag with a test UA code, and let it run for a week. Once we have the data, it’s easier to make a case for its benefits.
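
A sketch of that test tag, assuming analytics.js is already on the page and using a placeholder property ID, could look like this:

<script>
  // Throwaway tracker pointing at a separate test property,
  // so experimental hits never pollute production data.
  if (window.ga) {
    ga('create', 'UA-XXXXXX-Y', 'auto', 'testTracker'); // placeholder test UA code
    ga('testTracker.send', 'pageview'); // attach whatever scraped fields you are evaluating
  }
</script>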

Pros and Cons of Using This in Production

The ease of implementing things via the DOM can sometimes lead us to conclude that everything should be done this way. But long-term use of this approach can be harmful and result in data loss. The main problem is that the DOM is a presentation layer, not meant for data collection, so a change to how data is presented will invariably break tracking. Second, since everything is done via the GTM interface, there is no indication to other team members that anything has changed.

Stephane Hamel has a great post on the benefits and dangers of DOM scraping; you should check it out. To add to it, I generally agree that DOM scraping should be used for quick prototyping and checking whether an idea would work, as well as for short periods as a stop-gap solution until the production code is ready.

The only exception is old legacy systems where the documentation is incomplete, the code is no longer being updated, and you have somehow convinced the team to add GTM once. Here some JavaScript magic to get the data you want is your only option, and this would be the way to go.

In an upcoming post I will run down how I deployed the entire Enhanced Ecommerce suite using this approach.

What are some ideas you are testing?
