Bot Traffic Detection
Increasing your web traffic puts your products and services in front of a wider audience. However, as your website or app traffic grows, so will the presence of bot traffic. Not all of these bots will be bad, but to minimise the friction bad bots cause in the customer experience, and to keep your marketing insights accurate, you need bot traffic detection tools that can identify and block malicious bot activity such as loyalty point fraud.
How can you spot bot traffic?
It’s often a tricky question for developers and engineers to answer. A bot may be indistinguishable from any other web user, but there are ways you can use analytics data, or even your own browser console logs, to help detect bot traffic. Integrated tools such as Google Analytics give engineers and data scientists the visibility to identify bot traffic indicators such as:
Unusually high page views
Unusually high page views can be an indicator of automated traffic, and are easiest to spot on dashboards that graph aggregate page views over time.
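One lightweight way to surface such spikes, assuming you can export a daily page view series from your analytics tool, is to flag days that sit well above a rolling baseline. The data shape, window and threshold below are illustrative assumptions, not a prescribed implementation.

```typescript
// Minimal sketch: flag days whose page views sit far above a rolling baseline.
// The DailyPageViews shape and thresholds are assumptions; adapt them to your analytics export.
interface DailyPageViews {
  date: string;      // e.g. "2023-05-01"
  pageViews: number; // total page views recorded for that day
}

function flagPageViewSpikes(
  series: DailyPageViews[],
  windowSize = 7, // days used to build the rolling baseline
  threshold = 3   // flag days more than 3 standard deviations above the rolling mean
): DailyPageViews[] {
  const spikes: DailyPageViews[] = [];
  for (let i = windowSize; i < series.length; i++) {
    const window = series.slice(i - windowSize, i).map((d) => d.pageViews);
    const mean = window.reduce((sum, v) => sum + v, 0) / window.length;
    const variance = window.reduce((sum, v) => sum + (v - mean) ** 2, 0) / window.length;
    const stdDev = Math.sqrt(variance);
    if (stdDev > 0 && series[i].pageViews > mean + threshold * stdDev) {
      spikes.push(series[i]);
    }
  }
  return spikes;
}
```

Days flagged this way are a starting point for investigation rather than proof of bot activity on their own.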
Unfamiliar referral traffic
Unfamiliar referral traffic looks at where your visitors are coming from. If the majority of visits to your site arrive from just one or two referring sites, especially ones you don’t recognise, a high percentage of that traffic could be bots.
Unusually high bounce rates
Both unusually low and unusually high bounce rates can be indicators of bots using your site to find information quickly. An abnormally low bounce rate can mean automated visitors are clicking through many pages per session, while an abnormally high bounce rate may mean bots are hitting a single page and leaving immediately, for example when your site is being used for SEO purposes.
Unusual visitor interactions with your site
If you see large numbers of users not engaging with certain elements, such as buttons, or only interacting with one element while ignoring all others, this could be an indicator that those users are bots.
Spikes in traffic from an unusual region
Bots can masquerade as human users from any region of the world. However, they may not account for cultural or language differences, so spikes in traffic from countries where your company does not operate are a possible indicator that those visitors are bots.
Abnormally low time on page
Bots are often programmed with an expected pattern of movement through a site, and if they don’t reach their goal within a certain amount of time (which can vary depending on the goals of the website and its internal JavaScript), they will abandon the session rather than proceed further, which shows up as abnormally low time on page.
Very high or very low average session duration
A very high average session duration could indicate that a bot is lingering on your site, while a very low average session duration suggests visitors are leaving faster than any human could read the page; both can be indicators of malicious activity.
Constant refilling or refreshing of content
Bots are often programmed to feed off small pieces of information, which is why a request you have filtered out of Google Analytics may come back again with another user agent or IP address. The best way to pick up on this kind of activity is in your logs: bots that are constantly refreshing content will show up as repeated requests in your browser console and your server access logs.
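If you can export those request records (for example from your server access logs), one way to surface this behaviour is to count how often the same client re-requests the same path within a short window. The LogEntry shape, window and repeat threshold below are assumptions for illustration.

```typescript
// Minimal sketch: flag clients that re-request the same path unusually often in a short window.
// Assumes entries are sorted by timestamp; LogEntry is a placeholder for your own log format.
interface LogEntry {
  ip: string;
  path: string;
  timestamp: number; // Unix epoch milliseconds
}

function findRapidRefreshers(
  entries: LogEntry[],
  windowMs = 60_000, // one-minute sliding window
  maxRepeats = 10    // more repeats than this within the window is treated as suspicious
): Set<string> {
  const suspiciousIps = new Set<string>();
  const recentByClient = new Map<string, number[]>();

  for (const entry of entries) {
    const key = `${entry.ip} ${entry.path}`;
    // Keep only the timestamps still inside the sliding window for this client/path pair.
    const recent = (recentByClient.get(key) ?? []).filter(
      (t) => entry.timestamp - t <= windowMs
    );
    recent.push(entry.timestamp);
    recentByClient.set(key, recent);
    if (recent.length > maxRepeats) {
      suspiciousIps.add(entry.ip);
    }
  }
  return suspiciousIps;
}
```

Because bots rotate user agents and IP addresses, treat the output as one signal among several rather than a block list.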
Fake conversions
If a visitor appears highly engaged on your website and then leaves immediately after clicking or converting, you’re probably dealing with a malicious bot. Be careful with the opposite pattern, though: a user who starts an action, such as filling out a form, but doesn’t continue to the next page or complete it, is most likely an indecisive real human being rather than a bot.
Anomalous timing of events
A lack of consistency between the time users spend on one page and the next can indicate that increased traffic is automated. This signal suits browser-based bot detection better than server-side log analysis, where timing anomalies often come from network issues rather than bots.
Frequency of visits from any single IP address
Most legitimate visitors will only start a handful of sessions on your website per day. If a single IP address is generating drastically more visits than that (orders of magnitude more), it could signal automated traffic, especially if the IP belongs to an organisation that isn’t known to have visited regularly in the past.
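A simple way to check this, again assuming you can export request records, is to count requests per IP per day and flag heavy outliers. The dailyLimit heuristic below is an illustrative assumption; tune it to what normal behaviour looks like for your site.

```typescript
// Minimal sketch: count requests per IP per day and keep only heavy outliers.
// The dailyLimit value is an illustrative heuristic, not a recommended threshold.
function flagHighFrequencyIps(
  entries: { ip: string; timestamp: number }[],
  dailyLimit = 100
): Map<string, number> {
  const counts = new Map<string, number>(); // key: "ip|YYYY-MM-DD"
  for (const entry of entries) {
    const day = new Date(entry.timestamp).toISOString().slice(0, 10);
    const key = `${entry.ip}|${day}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Return only the IP/day pairs that exceed the limit.
  return new Map([...counts].filter(([, count]) => count > dailyLimit));
}
```

Remember that corporate proxies and mobile carriers can put many real users behind one IP, so pair this check with other indicators before blocking.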
Bot traffic detection methods & techniques
With the increase in malicious bot activity on the web, some businesses have embraced artificial intelligence (AI) to identify and block bot-driven attacks such as fake account creation.
From AI-driven machine learning techniques that look for anomalies in traffic patterns to dynamic Bayesian networks that process user behaviour data over time, there are a number of ways developers can employ this advanced technology against bad bots.
From a developer’s perspective, there are two ways to detect bots on your site: server-side via log analysis and client-side via JavaScript analysis.
Server-side bot detection
Processing bot requests on the server is advantageous because it gives you more flexibility across a variety of setups and can be combined with other services such as Cloudflare or Varnish. It also allows you to identify malicious bot activity without any client-side rendering, which is useful for mobile devices where JavaScript may not always execute correctly. Bear in mind that DDoS attacks against your infrastructure will still affect legitimate clients, since they hit the same servers. The downside to this approach is that requests take longer to process, so analytics data can be delayed.
Server-side solutions inspect access logs to see which user agents are requesting content from your site, then compare those user agent strings against known bots (or a database that contains information about known bots).
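A bare-bones version of that comparison might look like the sketch below. The signature list and combined-log parsing are assumptions for illustration; in practice you would rely on a maintained bot database rather than a hard-coded list.

```typescript
// Minimal sketch: match User-Agent strings from access-log lines against known bot signatures.
// The signature list is a tiny illustrative sample, not a complete bot database.
const KNOWN_BOT_SIGNATURES = [/bot/i, /crawler/i, /spider/i, /curl\//i, /python-requests/i];

function isKnownBot(userAgent: string): boolean {
  return KNOWN_BOT_SIGNATURES.some((pattern) => pattern.test(userAgent));
}

// Pull the User-Agent (the last quoted field) out of a combined-format access-log line.
function userAgentFromLogLine(line: string): string | null {
  const quoted = line.match(/"([^"]*)"/g);
  if (!quoted || quoted.length < 2) return null;
  return quoted[quoted.length - 1].slice(1, -1); // strip the surrounding quotes
}
```

For example, a log line whose user agent is "Googlebot/2.1 (+http://www.google.com/bot.html)" would be matched by the bot signature, while most mainstream browser user agents would not.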
Client-side bot detection
This approach requires JavaScript to execute on the user’s browser to evaluate client-side behaviour. Since this is subject to a much wider variation in user-agent (particularly with mobile), you’ll need a whitelist of known good agents and then check each request against this whitelist.
As well as checking the user agent against a list of known good browsers, this approach compares what happens in JavaScript with data that has been recorded server-side. For example, if a user performs an action (like clicking) and then rapidly leaves without taking any other action on your site, it’s possible they are simply using Autoplay on their web browser to cycle through multiple websites at once and don’t represent actual human behaviour.
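As a sketch of what that client-side evaluation can look like, the snippet below records whether any human-like events (mouse movement, scrolling, touches, key presses) occurred before a click and reports the result for server-side correlation. The /bot-signals endpoint is a hypothetical placeholder you would implement yourself.

```typescript
// Minimal sketch of client-side signal collection: note whether any human-like events
// occurred before a click, then report the result so it can be correlated server-side.
// "/bot-signals" is a hypothetical collection endpoint, not a real API.
let humanSignalSeen = false;

for (const eventName of ["mousemove", "scroll", "touchstart", "keydown"]) {
  window.addEventListener(eventName, () => { humanSignalSeen = true; }, {
    once: true,
    passive: true,
  });
}

document.addEventListener("click", (event) => {
  // A click with no prior movement, scrolling or key presses is only a weak signal;
  // weigh it against server-side evidence rather than acting on it alone.
  const payload = JSON.stringify({
    humanSignalSeen,
    target: event.target instanceof HTMLElement ? event.target.tagName : null,
    timestamp: Date.now(),
  });
  navigator.sendBeacon("/bot-signals", payload);
});
```

Headless browsers can fake these events, so treat this as a supporting signal rather than a standalone verdict.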
The downside to client-side detection is that it requires more maintenance, since every new bot needs to be added to your database, and it slows page loads slightly for visitors, who have to download the extra JavaScript from your website.
Current challenges in identifying malicious bot traffic
The current challenges in identifying malicious bot traffic stem from the fact that bots have become more sophisticated. Nowadays, they are capable of mimicking human behaviour, making it difficult to determine whether a particular activity is automated or not.
Malicious traffic patterns can still be disguised as legitimate ones using different user agents or bot-herding techniques (such as redirecting requests through a proxy bot). On top of that, some bots employ self-healing mechanisms that enable them to stay active after they are blocked.
A recent example of this type of obfuscation is DeepLocker, an extremely sophisticated strain of malware that uses machine learning to detect whether it’s running in a virtualised environment. If it determines it’s running in a sandbox, it won’t run its malicious code and will act as if everything is normal. This type of deception reportedly allowed DeepLocker to remain undetected for two years before researchers were able to spot and identify this particular strain of malware.
In addition, when DeepLocker executes on a real machine, you’ll notice network activity towards sites like Google Analytics and Facebook, even though those sites are never genuinely visited; the requests never reach a real page and leave without any interaction.
Frequently asked questions about detecting bot traffic
How to identify bot traffic in Google Analytics?
If you use Google Analytics to monitor traffic on your website, there are some rules that can help determine whether a particular user is an actual human or a bot.
To detect bots in Google Analytics, you should check the following data:
- Source/Medium (Where did the visit come from, and through which channel?). This lets you separate automated queries arriving via search engines from real humans looking for content on your site. It’s also important to check where people came from when they land on your site, since bots don’t usually visit multiple pages the way real users do. For instance, if someone views multiple pages while staying for two minutes or more per session and then leaves without taking any action, it’s a good indication they are using Autoplay on their browser.
- Landing page (What is the first web page where people land after they click on your link?). Checking the landing page can help determine if a visitor has actually clicked on your link or is just following it because of Autoplay.
- Time on site (How long do people stay on your site?). If visitors don’t spend enough time on your site to interact with certain elements, chances are they are not real human beings but rather bots trying to auto-fill fields, looking for vulnerabilities or just randomly browsing through your site.
- Bounce rate (What percentage of visitors left your site from a certain page without viewing any other pages?). This will help you identify whether a visitor’s behaviour tells a story of random browsing, or whether they genuinely clicked a link and landed on a page they could not have reached by following links on your own web pages.
- Forms filled out (How many visitors filled out certain forms?). If bots are programmed to fill in fields, this metric captures how many visitors tried to do so. As a rule of thumb, if large numbers of visitors are filling out the same form over and over again, there is a high probability that they are bots.
Instead of having rules in Google Analytics that direct you to block traffic immediately, it’s better to monitor these rules over a period of weeks or months before making any decisions about blocking suspicious requests since bots can adapt more quickly than solutions created by humans. Plus, if you have multiple lines of defence such as the ones mentioned above and most of them are triggered, chances are you’re being attacked by a bot or a group of bots.
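One way to apply that “multiple lines of defence” idea is to score each visitor against several weak indicators and only escalate when the combined score stays high over time. The signal names and weights below are purely illustrative assumptions, not recommended values.

```typescript
// Minimal sketch: combine several weak indicators into one score per visitor.
// The signal names and weights are illustrative assumptions, not tuned values.
interface Signals {
  suspiciousReferrer: boolean;
  zeroTimeOnSite: boolean;
  repeatedFormFills: boolean;
  highIpFrequency: boolean;
}

const WEIGHTS: Record<keyof Signals, number> = {
  suspiciousReferrer: 1,
  zeroTimeOnSite: 2,
  repeatedFormFills: 3,
  highIpFrequency: 2,
};

function botScore(signals: Signals): number {
  return (Object.keys(WEIGHTS) as (keyof Signals)[])
    .filter((key) => signals[key])
    .reduce((total, key) => total + WEIGHTS[key], 0);
}
```

A visitor whose score stays above a chosen threshold for weeks is a much stronger candidate for blocking than one that trips a single rule once.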
How to identify bot traffic in Adobe Analytics?
Adobe has published a couple of methods for identifying bot activity. One example is using custom segments to find out how many sessions consist of only a single hit, then comparing those counts against session length. You can also check whether client-side events like scroll, click, mouse move and image load have been ‘hijacked’ by automated requests.
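As a rough illustration of the single-hit-session idea outside Adobe’s own segment builder, and assuming you can export per-session hit counts, you could compute the share of sessions with only one hit:

```typescript
// Minimal sketch: compute the share of sessions that contain only a single hit.
// The Session shape is an assumed export format, not an Adobe Analytics API object.
interface Session {
  sessionId: string;
  hits: number;            // number of hits (page views, events) in the session
  durationSeconds: number; // total session length
}

function singleHitSessionShare(sessions: Session[]): number {
  if (sessions.length === 0) return 0;
  const singleHit = sessions.filter((s) => s.hits === 1).length;
  return singleHit / sessions.length;
}
```

A sudden jump in this ratio, particularly among very short sessions, can hint at automated traffic.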
Another good method is to look for event patterns that don’t fit your site, such as rapid or repeated page reloads over a short period of time, large gaps between user interactions on different parts of the site, or expected events that are missing altogether.
This approach will only work for bots running automated browsing scripts; it won’t reliably separate them from real humans accessing your pages with Autoplay enabled, or from visitors using multiple devices, such as mobile phone browsers, that don’t provide unique IDs.
This makes it tough to create rules for things like rapid repeated page reloads to spot bots since actual human users could do the same thing due to their erratic browsing behaviour.
What is the best way to monitor suspicious requests over time before making any decisions about blocking them, in order to avoid false positives?
It’s important to look at the big picture of your site instead of relying on specific rules. The best way to detect bots is by creating thresholds that you can monitor over time. For instance, there might be a spike in single-page visits, followed by several visits from human users, and then another visit later on through a different source/medium with similar behaviour, which would indicate something suspicious is going on.
Are certain types of content better suited to be targeted by malicious bots rather than humans?
Yes. If you have a high bounce rate on pages with links to PDFs or other types of content that is available for download online, chances are someone is trying to infect your site with malicious code.