Bot Detection
Increasing your web traffic puts your products and services in front of a wider audience. However, as your website or app traffic grows, so will the presence of bot traffic. Not all of these bots are bad, but to minimise the friction bad bots cause in the customer experience, keep your marketing insights accurate and stop abusive activity such as loyalty point fraud, you need bot traffic detection tools that can identify and block bot activity.
How do websites detect bots?
Bot detection involves deploying a range of techniques and tools, including user behaviour analysis, IP analysis, machine learning algorithms, CAPTCHA challenges, and device fingerprinting. Together, these enable you to accurately differentiate between humans and bots.
How are bots detected?
Bot detection uses a variety of techniques to identify automated programmes, or bots, on the internet. Common methods include analysing user behaviour patterns, checking for unusual activity, monitoring IP addresses, and employing machine learning algorithms to detect bot-like characteristics. Combining these techniques helps to distinguish bots from human users.
What are bot attacks?
Common bot attacks include:
- Scraping: quickly extracting content and information from a web page.
- Brute force attacks: most commonly account takeover attempts, where bots try to guess the credentials for an account or system.
- Denial of service (DoS)/distributed denial of service (DDoS) attacks: bots attempt to slow down your website or app by sending high volumes of requests to your server.
- Card cracking: bots test stolen credit card details to gather the missing data.
How can you spot bot traffic?
Detecting bots is tricky for developers and engineers too. Bot traffic can be hard to identify because a bot may be indistinguishable from any other web user, but you can use analytics data, bot detection tools or even your own browser console logs to help detect it. Integrated tools such as Google Analytics give engineers and data scientists visibility of indicators such as:
Unusually high page views
Unusually high page views can be an indicator of automated traffic, and can be spotted using dashboards that graph aggregate page views over time.
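As a rough illustration, a simple script can flag days where aggregate page views jump well above their recent baseline. The CSV layout and column names below are assumptions for the sketch, not output from any particular analytics tool.

```python
# Minimal sketch: flag days whose total page views far exceed a rolling baseline.
# Assumes a CSV export with "date" and "pageviews" columns (illustrative names).
import csv
from statistics import mean, stdev

def flag_pageview_spikes(path, window=14, threshold=3.0):
    with open(path, newline="") as f:
        rows = [(row["date"], int(row["pageviews"])) for row in csv.DictReader(f)]

    spikes = []
    for i in range(window, len(rows)):
        baseline = [count for _, count in rows[i - window:i]]
        mu, sigma = mean(baseline), stdev(baseline)
        date, count = rows[i]
        # A day several standard deviations above the recent average is worth a look
        if sigma > 0 and (count - mu) / sigma > threshold:
            spikes.append((date, count))
    return spikes

print(flag_pageview_spikes("daily_pageviews.csv"))
```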
Unfamiliar referral traffic
Unfamiliar referral traffic looks at where your users are coming from. If the majority of visits to your site come from just one or two referring sites, a high percentage of that traffic could be bots and bot detection could be necessary.
Unusually high bounce rates
Both unusually low and unusually high bounce rates can highlight the need for bot detection. A very low bounce rate could be a sign that a bot has found what it’s looking for and is moving through your pages, while a very high bounce rate may indicate that bots are hitting single pages and leaving immediately, for example to manipulate SEO.
Unusual visitor interactions with your site
If large numbers of users aren’t engaging with certain elements such as buttons, or are only using one element and ignoring all others, this could be an indicator that they are bots and bot detection may be required.
Spikes in traffic from an unusual region
Bots can masquerade as human users from any region of the world, but they may not account for cultural or language differences. Spikes in traffic from countries where your company does not operate are therefore a possible indicator that these visitors are bots and that you need bot detection.
Abnormally low time on page
Bots are often programmed with an expected pattern of movement through a site, and if they don’t reach their goal within a certain amount of time (which varies depending on the goals of the website and its JavaScript), they will often abandon the visit rather than proceed further, resulting in abnormally short time on page.
Very high or very low average session duration
An unusually high average session duration could indicate that bots are lingering on your site, while an unusually low one suggests automated requests that leave as soon as their task is complete. Either can be an indicator of malicious activity and a sign that bot detection is required.
Constant refilling or refreshing of content
Bots are often programmed to repeatedly fetch small pieces of content, which is why even if you’ve filtered a request out of Google Analytics, it may come back again with another user agent or IP address. The best way to pick up on this kind of activity is in your logs: bots that constantly refresh content will show up in your browser console logs.
Fake conversions
If a visitor appears highly engaged on your website and then immediately leaves after clicking a conversion action, you’re probably dealing with a malicious bot and may need a way to detect and block bots. By contrast, a user who performs an action like filling out a form but doesn’t continue to the next page or complete the action is most likely an indecisive human, so bot detection may not be required in that case.
Anomalous timing of events
A lack of coherence between the time users spend on one page and the next can indicate that increased traffic is automated. This applies more to browser-based bot detection than to server-side log analysis, where timing anomalies often come from network issues rather than bots, but it is still a useful signal when working out how to detect and block bots.
Frequency of visits from any single IP address
Most genuine visitors hit your website from a given IP address only a handful of times per day. If the number of requests from a single IP is drastically higher (more than 100x the norm), it could signal automated visits and a need for bot detection – especially if the IP doesn’t belong to an organisation known to visit your site regularly.
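As a minimal sketch of this check, assuming a standard combined-format access log, you could count requests per IP for a day’s traffic and flag addresses far above the site-wide median. The file path and the 100x multiplier are illustrative assumptions.

```python
# Minimal sketch: count daily requests per IP address from an access log and
# flag IPs that are orders of magnitude above the site-wide median.
from collections import Counter

def requests_per_ip(log_path):
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            ip = line.split(" ", 1)[0]  # first field in common/combined log format
            counts[ip] += 1
    return counts

def flag_heavy_ips(counts, multiplier=100):
    if not counts:
        return []
    median = sorted(counts.values())[len(counts) // 2]
    return [(ip, n) for ip, n in counts.items() if n > median * multiplier]

counts = requests_per_ip("access.log")  # one day of traffic; path is illustrative
for ip, n in flag_heavy_ips(counts):
    print(f"{ip} made {n} requests today")
```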
Bot traffic detection techniques & methods
From a developer’s perspective, there are two ways to detect bots on your site: server-side via log analysis and client-side via JavaScript analysis.
Server-side bot detection
Server-side bot detection tools are advantageous because they give you more flexibility across a variety of settings and can work alongside services like Cloudflare or Varnish. They also let you identify malicious bot activity without any client-side rendering, which makes them useful for mobile devices where JavaScript may not always execute correctly.
Server-side solutions inspect access logs to see which user agents are requesting content from your site, then compare those user agent strings to known bots (or a database that contains information about known bots).
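A minimal sketch of that comparison might look like the following, assuming a combined-format access log in which the user agent is the final quoted field; the hard-coded signature list stands in for a maintained bot database.

```python
# Minimal sketch: pull the user-agent field out of a combined-format access log
# and match it against a small list of known bot signatures.
import re

KNOWN_BOT_SIGNATURES = ["bot", "crawler", "spider", "curl", "python-requests", "wget"]

# Combined log format: the user agent is the final quoted field on the line
LOG_PATTERN = re.compile(r'"([^"]*)"\s*$')

def classify_log_line(line):
    match = LOG_PATTERN.search(line)
    if not match:
        return "unknown"
    user_agent = match.group(1).lower()
    if any(sig in user_agent for sig in KNOWN_BOT_SIGNATURES):
        return "bot"
    return "human"

totals = {"bot": 0, "human": 0, "unknown": 0}
with open("access.log") as f:  # path is illustrative
    for line in f:
        totals[classify_log_line(line)] += 1
print(totals)
```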
Server-side bot detection protects everything, even APIs
Client-side bot detection must be configured to protect each page or path you think might be hit by bot traffic, with a different implementation per website and app. This is inconvenient and messy and leaves out a crucial attack vector increasingly targeted by bots: API endpoints.
Because APIs are designed to be interacted with by applications, not humans, it makes sense that malicious bots also use them as an entry point for attacks. This also means that client-based detection systems looking for human behaviour are simply not suited to detect and block bots that target APIs.
On the other hand, server-side detection sees every HTTP request, no matter whether it comes via a website, mobile application or API. Netacea has the same visibility via one integration of all web-based endpoints, including every path, every mobile app screen, and even APIs, as standard.
Client-side bot detection
Client-side bot detection software requires JavaScript to execute in the user’s browser to evaluate client-side behaviour. Since this is subject to much wider variation in user agents (particularly on mobile), you’ll need a whitelist of known good agents and must check each request against it.
You can also check the user agent against a list of known good browsers, and compare what happens in JavaScript with data recorded server-side. For example, if a user performs an action (like clicking) and then rapidly leaves without taking any other actions on your site, it’s possible they are simply using Autoplay on their web browser to browse through multiple websites at once and don’t represent actual human behaviour.
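One way to sketch this cross-check, under the assumption that a JavaScript beacon reports client-side events (clicks, scrolls, mouse moves) back to your server, is to flag sessions that requested pages but never produced any client events, or whose user agent isn’t on your whitelist. All data structures and names here are illustrative.

```python
# Minimal sketch: cross-check server-side page requests against client-side
# events reported by a JavaScript beacon. Sessions with page requests but no
# client events are candidates for bot traffic.

GOOD_AGENT_KEYWORDS = ["chrome", "firefox", "safari", "edge"]  # illustrative whitelist

def is_whitelisted_agent(user_agent):
    ua = user_agent.lower()
    return any(keyword in ua for keyword in GOOD_AGENT_KEYWORDS)

def suspicious_sessions(server_requests, client_events):
    """server_requests: {session_id: user_agent}; client_events: set of session
    IDs that fired at least one JS event (click, scroll, mouse move)."""
    flagged = []
    for session_id, user_agent in server_requests.items():
        no_js_activity = session_id not in client_events
        if no_js_activity or not is_whitelisted_agent(user_agent):
            flagged.append(session_id)
    return flagged

requests = {"s1": "Mozilla/5.0 ... Chrome/120.0", "s2": "python-requests/2.31"}
events = {"s1"}  # only s1 sent click/scroll beacons
print(suspicious_sessions(requests, events))  # -> ["s2"]
```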
The downside to client-side bot detection is that it requires more work, since every new bot needs to be added to your database, and it slows page load times for visitors who end up downloading JavaScript code from your website.
Current challenges in identifying malicious bot traffic
The current challenges in bot detection and identifying malicious bot traffic stem from the fact that bots have become more sophisticated. Nowadays, it is trickier to identify bot traffic, as bots are capable of mimicking human behaviour, making it difficult to determine whether a particular activity is automated or not.
There’s also a burgeoning criminal economy of spoofed and stolen device fingerprints traded on the dark web via sites like the Genesis Market. Even if a fingerprint looks human, it could easily be a bot or malicious actor masquerading as the original user, making bot traffic harder to identify than ever.
It’s still possible to disguise malicious traffic patterns as legitimate ones using different user agents or bot-herding techniques (such as redirecting requests through a proxy bot), making bot detection challenging. Some bots also employ self-healing mechanisms that let them stay active even after they are blocked.
A recent example of this type of obfuscation is DeepLocker, an extremely sophisticated piece of malware that uses advanced machine learning to detect whether it’s running in a virtualised environment. If it determines that it’s running in a sandbox, it won’t run its malicious code and acts as if everything is normal. This deception allowed DeepLocker to remain active for two years before researchers were able to spot and identify this particular strain of malware.
In addition, when DeepLocker is executed on an actual computer, you’ll notice network activity to sites like Google Analytics and Facebook, even though those sites aren’t actually being visited, since these bots never reach your web page and leave without any interaction.
Make the switch to server-side bot detection
Putting Netacea’s server-based bot detection to the test against solutions still reliant on client-side code is a simple process. Because we only need to see web log data from your server, all it takes is a simple, unobtrusive API connection for us to start collecting the data you need – you can even send us historical data logs to compare previous recommendations offline.
Find out more about putting Netacea to the test with a free demo and POC.
Frequently asked questions about detecting bot traffic
How to identify bot traffic in Google Analytics?
If you use Google Analytics to monitor traffic on your website, some rules can help with bot detection.
Bot identification in Google Analytics draws on the following data (a rough sketch of how these checks can be run against exported data follows this list):
- Source/Medium (where did the request come from and what was requested?). This lets you separate requests coming from search engines via automated queries from real humans looking for content on your site. It’s also important to check where people came from when they landed on your site, since bots don’t usually visit multiple pages the way users do. For instance, if someone visits multiple pages, stays for two minutes or more per session and then leaves without taking any action, it’s a good indication they are using Autoplay in their browser.
- Landing page (what is the first web page where people land after they click on your link?). Checking the landing page can help determine whether a visitor actually clicked on your link or is just following it because of Autoplay.
- Time on site (how long do people stay on your site?). If visitors don’t spend enough time on your site to interact with certain elements, chances are they are not real human beings but bots trying to auto-fill fields, looking for vulnerabilities or just randomly browsing through your site.
- Bounce rate (what percentage of people left your site from a certain page without visiting any other pages?). This helps you identify whether your visitors’ behaviour tells a story of random browsing, or whether they landed on a page they could not have reached by following links on your own web pages.
- Forms filled out (how many visitors filled out certain forms?). If bots are programmed to fill in fields, this is where they will show up. As a rule of thumb, if large numbers of visitors are filling out the same form over and over again, there is a high probability that they are bots and bot detection will be required.
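Here is a rough sketch of how some of these checks could be run against exported session data; the CSV columns are assumptions for illustration, not a real Google Analytics export schema.

```python
# Minimal sketch: compute bounce rate, very short visits and repeated form fills
# from an illustrative CSV of sessions with columns: session_id, source_medium,
# landing_page, time_on_site_seconds, pages_viewed, form_submissions.
import csv
from collections import Counter

def summarise_sessions(path):
    with open(path, newline="") as f:
        sessions = list(csv.DictReader(f))
    total = len(sessions)

    bounces = sum(1 for s in sessions if int(s["pages_viewed"]) == 1)
    short_visits = sum(1 for s in sessions if int(s["time_on_site_seconds"]) < 2)
    form_fills_by_source = Counter(
        s["source_medium"] for s in sessions if int(s["form_submissions"]) > 0
    )

    return {
        "bounce_rate": bounces / total if total else 0,
        "sub_two_second_visits": short_visits,
        "form_fills_by_source": form_fills_by_source.most_common(5),
    }

print(summarise_sessions("sessions_export.csv"))  # path is illustrative
```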
Instead of setting rules in Google Analytics that block traffic immediately, it’s better to monitor these rules over weeks or months before making any decisions about blocking suspicious requests, since bots can adapt more quickly than human-created defences. If you have multiple lines of defence such as the ones mentioned above and most of them are triggered, chances are you’re being attacked by a bot or a group of bots and need to implement bot identification.
How do you do bot detection in Adobe Analytics?
Adobe has published a couple of methods for identifying bot activity in Adobe Analytics.
One example is using custom segments to find out how many sessions consist of pageviews that contain only one hit, and dividing those values by session length. You can also check client-side events like scroll, click, mouse move and picture load which have been ‘hijacked’ by automated requests.
Another good method is to look for events that are missing or out of place for your site, such as rapid or repeated page reloads over a short period, or large gaps between user interactions on different parts of the site.
This approach will only work for bots running automated scripts designed for browsing websites. It won’t catch actual humans accessing your pages with Autoplay enabled, or visitors using multiple devices such as mobile phone browsers, which don’t provide unique IDs.
This makes it tough to create rules for things like rapid repeated page reloads to spot bots since actual human users could do the same thing due to their erratic browsing behaviour.
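For illustration only, here is a minimal sketch of the single-hit-session check described above, run against an assumed export of session data; the field names are not Adobe’s actual schema.

```python
# Minimal sketch: what share of sessions are single-hit and extremely short?
# Assumes an illustrative CSV with columns: session_id, hit_count,
# session_length_seconds.
import csv

def single_hit_ratio(path):
    with open(path, newline="") as f:
        sessions = list(csv.DictReader(f))
    if not sessions:
        return 0.0
    single_hit = [s for s in sessions if int(s["hit_count"]) == 1]
    # Weight towards very short sessions: a one-hit visit lasting under a second
    # looks more like an automated request than an impatient human.
    very_short = [s for s in single_hit if float(s["session_length_seconds"]) < 1.0]
    return len(very_short) / len(sessions)

ratio = single_hit_ratio("adobe_sessions.csv")  # path is illustrative
print(f"{ratio:.1%} of sessions are single-hit and under one second")
```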
What is the best way to monitor suspicious requests over time before making any decisions about blocking them to avoid false positives?
It’s important to look at the big picture of your site instead of relying on specific rules. The best way to undertake bot detection is to create thresholds that you can monitor over time. For instance, there might be a spike in single-page visits, followed by several visits from human users, and then another visit later on through a different source/medium with similar behaviour, which would indicate something suspicious is going on.
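One way to put this into practice is to count how many detection signals fire each day and only escalate when a rolling threshold is sustained across the monitoring window, as in the sketch below; the signal counts and window size are illustrative assumptions.

```python
# Minimal sketch: instead of blocking on a single rule, track how many detection
# signals fire per day and only alert when a rolling threshold is sustained.
from collections import deque

class ThresholdMonitor:
    def __init__(self, window_days=7, signals_per_day_threshold=3):
        self.window = deque(maxlen=window_days)
        self.threshold = signals_per_day_threshold

    def record_day(self, signals_fired):
        """signals_fired: number of indicators (single-page spikes, odd referrers,
        repeated form fills...) that triggered on a given day."""
        self.window.append(signals_fired)

    def should_investigate(self):
        # Only escalate when most days in the window exceed the threshold,
        # which filters out one-off spikes caused by real users.
        if len(self.window) < self.window.maxlen:
            return False
        busy_days = sum(1 for day in self.window if day >= self.threshold)
        return busy_days > len(self.window) // 2

monitor = ThresholdMonitor()
for day_signals in [1, 4, 5, 3, 4, 0, 6]:
    monitor.record_day(day_signals)
print(monitor.should_investigate())  # True: most recent days exceeded the threshold
```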
Are certain types of content better suited to be targeted by malicious bots rather than humans?
Yes, if you have a high bounce rate on pages with links to PDFs or other types of content that are available for download online, chances are someone might be trying to infect your site with malicious code.