Block annoying referral spam
This post is really important for your Google Analytics. The information is a bit technical but I thought since so many of my clients were having issues I just had to share. After all the time and hard work we pour into our sites and blogs no one wants to be dealing with dodgy data.
Recently I was looking at my website traffic in Google Analytics and I saw there were a lot of weird referral sites ranking really highly. After further investigation I realised these sites were spam and they were really messing with my site stats. A few days later I saw the same spam on several client sites so I thought I would give you a guide to blocking this sort of traffic from your analytics results, as it seems to be a big problem right now for everyone!
Having traffic analytics on your site is essential for knowing who is coming to your site, how often they are coming, where they are coming from, how long they spend on there, what devices they are using, and how they move from one page to another or when they bounce. In fact, the wealth of information you can glean from your analytics will help you make better decisions at both a strategic and tactical level in your business.
When spam traffic starts to creep in and alter your analytics this can be a big deal. While most of us do want big traffic numbers we definitely don’t want fake ones. For example: one of my clients is altering the home page of her online store. With the huge influx of spam traffic she can’t see what real customers might be doing. This means she doesn’t know if the changes she is making are helping or hindering her real customers.
Signs you might have spam
I found my spam traffic, and that of my clients, through suspicious referrals in my reports. But other indicators, such as big changes in your metrics (very high bounce rates or low session times), can also tell you there is something going on. Another common characteristic I have seen is that the spam shows “(not set)” in tables for dimensions like Country, City and especially Hostname.
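As a rough illustration of those warning signs, here is a minimal Python sketch that flags suspicious rows in an exported report. The row layout, the field names (`source`, `hostname`, `bounce_rate`, `avg_session_seconds`) and the thresholds are all assumptions made up for this example; they are not something Google Analytics defines.

```python
def looks_like_spam(row, bounce_threshold=95.0, min_session_seconds=1.0):
    """Return True if a report row shows the warning signs described above.

    `row` is a hypothetical dict with the keys 'hostname', 'bounce_rate'
    (a percentage) and 'avg_session_seconds'.
    """
    # Sign 1: the hostname is empty or "(not set)".
    hostname_missing = row["hostname"].strip().lower() in ("", "(not set)")
    # Sign 2: very high bounce rate combined with near-zero time on site.
    extreme_metrics = (
        row["bounce_rate"] >= bounce_threshold
        and row["avg_session_seconds"] < min_session_seconds
    )
    return hostname_missing or extreme_metrics


# Made-up sample rows, purely for illustration.
rows = [
    {"source": "lisakate.com", "hostname": "www.example.com",
     "bounce_rate": 42.0, "avg_session_seconds": 95.0},
    {"source": "spam-referrer.example", "hostname": "(not set)",
     "bounce_rate": 100.0, "avg_session_seconds": 0.0},
]

suspicious = [r["source"] for r in rows if looks_like_spam(r)]
print(suspicious)
```

Treat anything this flags as a prompt to look closer rather than as proof, since a real page can also have a high bounce rate.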
What is referral spam?
A referrer is the address of the previous page, passed along when a browser goes from one page to another, and it is usually used to indicate where the user is coming from. For example, if I linked to your site in one of my posts, lisakate.com would be the referrer to your site and it would show up in your analytics. Referrer spam is a fake referral created by a bot (automated software) that makes repeated requests with the intention of showing up in your reports, so that you click on the link and give the spammer traffic. These spammers hit thousands of Google Analytics accounts, which drives a whole lot of traffic to their sites that might have affiliate links, ads etc. These spammer bots inflate your numbers, making your data inaccurate and ultimately not useful. (Spammers: you suck.)
How do they do it?
So this is where it gets technical! I spoke with my super friendly hosting company WP Engine about my spammer problems and they explained to me that spammers either cause these issues through Ghost Spam or Crawler Referral Spam. Why does the difference matter? For us it matters as each requires a different solution in your Google Analytics.
Ghost spam is so sneaky it doesn’t even come to your website. It just targets Google Analytics tracking IDs and injects the information directly! This means you cannot block it from the server, as it never reaches your site, so we have to do it inside Google Analytics.
The second type of spam — Crawler Referral Spam — actually visits your site and can be blocked by your servers, but a solution inside Google Analytics can also do the job.
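For the curious, the server-side idea for crawler spam can be sketched in a few lines of Python. This is only an illustration under assumptions: the blocklist entry `spam-crawler.example` is a made-up domain, the function name is my own, and how you would actually wire a check like this into your server depends entirely on your host.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; replace with the spam domains you actually see.
BLOCKED_REFERRER_DOMAINS = {"spam-crawler.example"}


def is_blocked_referrer(referer_header, blocked=BLOCKED_REFERRER_DOMAINS):
    """Return True if the Referer header points at a blocked domain
    or at a subdomain of one."""
    if not referer_header:
        return False  # no referrer at all, e.g. a direct visit
    # urlparse only finds a hostname when the value includes a scheme
    # such as "https://"; anything else is treated as not blocked.
    host = (urlparse(referer_header).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


print(is_blocked_referrer("http://www.spam-crawler.example/offer"))
print(is_blocked_referrer("https://lisakate.com/some-post"))
```

A server would refuse the request when the function returns True and serve the page as normal otherwise.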
In speaking with my host I was told that the majority of my spam traffic was Ghost Spam and so that is what we are going to focus on fixing in this post today. But first I will show you a quick fix to exclude known bots and spiders.
Excluding Known bots and spiders
Known bots are not bad. In fact, they keep the internet running: Google’s bots, for example, add information to Google’s search engine. But they also add records when visiting your site, and these records are not useful to your numbers. You definitely don’t want to block these bots, as then Google and other search engines cannot crawl your pages, but Google Analytics now has a feature that lets you safely exclude them from your analytics.
- Log in to your Google Analytics.
- Go to the Admin tab.
- Select the view you want to apply it to, e.g. All website data (we are about to add a new view below).
- Click on view settings.
- Scroll down and check the bot filtering box.
- Hit save!
Nice work — you are definitely on your way to more accurate traffic! Next, let’s get rid of that ghost spam.
Getting rid of Ghost Spam
This method may seem complex compared to the above checkbox but it is totally worth it in terms of getting rid of that referral spam from your analytics. Mainly because:
- It requires only one filter.
- It stops the spam before it hits you, so there are no records from those pesky bots that you need to remove.
- It stops ALL ghost spam in any form, whether it shows as a referral (e.g. floating-share-buttons.com), a keyword (e.g. erot.co) or a direct visit.
So how does this filter work?
All ghost spam (I have recently learned) uses an invalid hostname. The spammer does not actually know who the target is, which is why you see a fake hostname or “(not set)” in your data. The filter we’re going to create works by taking a list of all the valid hostnames for your website and including only traffic that matches them, so you just get valid traffic.
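To make the idea concrete, here is a minimal Python sketch of an "include only valid hostnames" filter. It is not how Google Analytics is implemented; the hostnames, the hit records and the function name are all hypothetical and only illustrate the logic.

```python
# Hypothetical list of hostnames where your tracking code really lives.
VALID_HOSTNAMES = {"www.example.com", "example.com"}


def include_valid_hits(hits, valid_hostnames=VALID_HOSTNAMES):
    """Keep only the hits whose hostname is on the valid list."""
    return [
        hit for hit in hits
        if hit.get("hostname", "(not set)").lower() in valid_hostnames
    ]


# Made-up hits: one real visit and two ghost-spam style records.
hits = [
    {"referrer": "lisakate.com", "hostname": "www.example.com"},
    {"referrer": "floating-share-buttons.com", "hostname": "(not set)"},
    {"referrer": "(direct)", "hostname": "apple.com"},
]

kept = include_valid_hits(hits)
print([hit["referrer"] for hit in kept])
```

Notice that the filter never needs to know the spammer's name. It only needs to know what your own hostnames are, which is why one filter can cover spam that shows up as a referral, a keyword or a direct visit.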
1. Create another view in your analytics
Before you start setting up the filter create another “view” of your analytics so that you can place this new filter in that view and keep your old view (and analytics numbers) going as before. To do this jump into your admin tab. On the far right under the heading “VIEW” in the dropdown select “create new view”. I call my new view the “No Spam” view as this will have the filter set up within it.
2. Getting your valid list of Hostnames
The first step in creating the filter is to get a list of your hostnames. To do this you need to make sure you are in your “All Website Data” view or the main view you have been using if you have created any others:
- Go to the Reporting tab on GA and select a wide timeframe on the calendar.
- In the left hand bar select Audience.
- Expand Technology and select Network.
- Then at the top of the report make sure you select “Hostname” because by default your Service Provider is selected.
Once you have been through these steps you get to a table of hostnames (as seen on the right of Figure 1 below). In it you can see valid hostnames: your own URL (i.e. all the places you would have valid tracking data), valid services like Google Translate, and other URLs that you redirect to your main URL. Some of mine are highlighted in green.
Invalid hostnames are anything you don’t recognise, and even URLs like amazon.com and apple.com, which are favourites for spammers to use.
3. Your list of Valid Hostnames as a regex
Now you have a list of valid hostnames you need to put them in a single line of text and separate them with the “|” and put a backslash “\” in front of all the “.” characters. We are getting seriously tech now as this is called a Regular Expression or a regex. Mine looks like this:
Please note there is no “|” at the start or the end. Don’t leave any spaces and don’t go over 255 characters. If you are keen to find out more, check out regular expressions!
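If you would rather not type the regex by hand, the rules above can be sketched as a small Python helper. The hostnames below are placeholders (swap in your own list), and the function name `build_hostname_regex` is just something I made up for this example.

```python
import re

MAX_PATTERN_LENGTH = 255  # the character limit mentioned above


def build_hostname_regex(hostnames):
    """Join hostnames with "|" and put a backslash in front of every "."."""
    cleaned = [h.strip().lower() for h in hostnames if h.strip()]
    if not cleaned:
        raise ValueError("need at least one hostname")
    # No "|" at the start or end, and no spaces anywhere.
    pattern = "|".join(h.replace(".", r"\.") for h in cleaned)
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError(f"pattern is {len(pattern)} characters, over the limit")
    re.compile(pattern)  # raises re.error if the result is not a valid regex
    return pattern


# Placeholder hostnames, not real ones.
pattern = build_hostname_regex(["www.example.com", "example.com", "shop.example.com"])
print(pattern)
```

For the placeholder list this prints `www\.example\.com|example\.com|shop\.example\.com`, which is the shape you are aiming for.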
Okay, can I just say you are doing so well. Stick with me, we’re nearly there.
4. Build your hostname filter
You have your regex and now we need to go and build the filter.
- Go to the Admin tab and select the view you want to apply the filter to. Remember we just set up the “No Spam” view for this.
- Select “Filters”.
- Hit the “+ NEW FILTER” button.
- Select “Create New Filter” and enter a name. It can be “Ghost spam filter” or whatever suits you.
- In the filter select “Custom”.
- Then choose “Include”.
- In the filter field choose “Hostname”.
- Then in the filter pattern paste in your regex.
- Now, because you have just created this view and started collecting data you won’t be able to verify the filter as there won’t be anything to compare it against. So go ahead and save the filter (last step not shown below in Figure 2).
This new view should be all set to exclude ghost-referral spam, which is excellent! But a filter only applies to data collected from the time you created it, so it won’t help you with historical data. BUT if you want to bear with me for one last step, we can create a segment that applies the same hostname rule to the historical data you have collected.
5. Create a segment to view historical data with your new filter
- Go to the reporting section in your Google Analytics.
- In the left-hand menu expand “Acquisition” and under “All Traffic” select “Referrals”.
- In the main board click on “+ Add Segment”.
- Click on “+ New Segment”.
- Type in the name of the new segment e.g. “Valid Hostnames”.
- Select “Conditions” under “Advanced”.
- Check that the filter is on “Include”. Click on “Ad Content”, type “Hostname” and select it. Next click on “Contains” and select “matches regex”.
- Copy and paste the hostname regex you prepared earlier in the text box.
- You will see the ring graph to the right adjust to the conditions, as we remove referral spam (see Figure 5).
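If you want a feel for what the segment does to your historical numbers, here is a small Python sketch that applies a hostname regex to some made-up referral rows and compares the session totals before and after. The pattern, the rows and the function names are all hypothetical, and using `re.search` assumes that “matches regex” finds the pattern anywhere in the hostname.

```python
import re
from collections import Counter

# Hypothetical hostname regex in the format described in step 3.
HOSTNAME_PATTERN = re.compile(r"www\.example\.com|example\.com")


def apply_segment(rows, pattern=HOSTNAME_PATTERN):
    """Keep only the historical rows whose hostname matches the regex."""
    return [row for row in rows if pattern.search(row["hostname"])]


def sessions_by_referrer(rows):
    """Add up sessions per referrer."""
    totals = Counter()
    for row in rows:
        totals[row["referrer"]] += row["sessions"]
    return dict(totals)


# Made-up historical data, purely for illustration.
history = [
    {"referrer": "lisakate.com", "hostname": "www.example.com", "sessions": 12},
    {"referrer": "floating-share-buttons.com", "hostname": "(not set)", "sessions": 340},
    {"referrer": "lisakate.com", "hostname": "example.com", "sessions": 3},
]

before = sessions_by_referrer(history)
after = sessions_by_referrer(apply_segment(history))
print(before)
print(after)
```

In this toy data the spam referrer drops out of the "after" totals while the real referrer keeps all of its sessions.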