Good Bots vs. Bad Bots

    Bots are not inherently good or bad, but they can be used with good or bad intent. Working with bot management specialists and data scientists who can reveal patterns of traffic behaviour on your website gives you clear insight into how good bots differ from bad ones.

    What is a bot?

    A bot, put simply, is a software program that operates on the Internet and performs repetitive tasks.

    General bots

    Bots typically fall into two broad categories:

    • Chatbots: These bots engage in conversations with users through text or voice, using natural language processing (NLP) and artificial intelligence (AI).
    • Task automation bots: These bots focus on automating repetitive tasks, such as data processing.
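
    To make the second category concrete, here is a minimal task automation bot, sketched using only Python's standard library. The URL and the interval are placeholders, not a real service:

        import time
        import urllib.request

        URL = "https://example.com/status"   # hypothetical endpoint
        INTERVAL_SECONDS = 60                # hypothetical schedule

        def check_once(url):
            """Fetch the page and return its HTTP status code."""
            with urllib.request.urlopen(url) as response:
                return response.status

        if __name__ == "__main__":
            for _ in range(3):               # a real bot would loop indefinitely
                print(URL, "-> HTTP", check_once(URL))
                time.sleep(INTERVAL_SECONDS)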

    What is an allowlist?

    An allowlist is akin to an event guest list: it permits specific bots to access a web property and denies all others. By allowing only known user agents or IP addresses, you ensure that desirable bots keep their access while everything else is refused by default.
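
    As a minimal sketch of that guest list logic (the agent names are illustrative, not a recommended allowlist), the key property is that the default answer is to deny:

        # Hypothetical allowlist of known-good crawler user agents.
        ALLOWED_USER_AGENTS = {"Googlebot", "Bingbot", "PartnerFeedBot"}

        def is_allowed(user_agent):
            """Allowlist logic: permit listed agents, deny everything else."""
            return user_agent in ALLOWED_USER_AGENTS

        print(is_allowed("Googlebot"))   # True  (on the guest list)
        print(is_allowed("UnknownBot"))  # False (denied by default)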

    What is a blocklist?

    A blocklist is the opposite: a list of banned IP addresses, user agents or other identifiers. It blocks specific bots while allowing all others access by default.
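
    A matching sketch of blocklist logic, using documentation-reserved example addresses; here the default answer is to allow:

        # Hypothetical blocklist of banned IP addresses.
        BLOCKED_IPS = {"203.0.113.7", "198.51.100.23"}

        def is_blocked(ip):
            """Blocklist logic: ban listed addresses, allow everything else."""
            return ip in BLOCKED_IPS

        print(is_blocked("203.0.113.7"))  # True  (banned)
        print(is_blocked("192.0.2.10"))   # False (allowed by default)

    The two defaults are the essential design difference: an allowlist fails closed, while a blocklist fails open.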

    What is robots.txt?

    Robots.txt is a text file served from a website's root that sets rules for bots accessing the site. Among other things, it specifies which pages bots may crawl and which links they may follow.

    Good bot management starts with properly configured rules in the robots.txt file, but it cannot end there, because bad bots often ignore or exploit the file.
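
    To make those rules concrete, here is a small hypothetical robots.txt checked with Python's standard urllib.robotparser; the bot names and paths are invented:

        from urllib import robotparser

        # Hypothetical rules: every bot is barred from /admin/,
        # and one named bad actor is barred from the whole site.
        rules = """\
        User-agent: *
        Disallow: /admin/

        User-agent: EvilScraperBot
        Disallow: /
        """.splitlines()

        parser = robotparser.RobotFileParser()
        parser.parse(rules)

        print(parser.can_fetch("Googlebot", "/products/"))       # True
        print(parser.can_fetch("Googlebot", "/admin/users"))     # False
        print(parser.can_fetch("EvilScraperBot", "/products/"))  # False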

    What is a good bot and what is a bad bot?

    The following sections outline the identifying characteristics of good bots vs. bad bots:

    What are good bots?

    Web scrapers, for instance, can be extremely helpful to online businesses and play a vital role in driving highly relevant traffic to the organisation’s website. They helpfully gather large amounts of data from websites, combing through a site’s source code in their hunt for the information they’ve been scripted to locate.

    Search engine spiders are a useful example of a commonly used web scraper with good intent. Search engine spiders crawl websites, pulling together all sorts of relevant information such as copy, headlines, alt tags and product pricing to determine where that site should be indexed in the search engine results pages (SERPs). Without these clever good bots, no one would be able to find your website using words and phrases that are relevant to your product or service.
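
    Here is a rough sketch of what such a spider does once it has fetched a page, using only the standard library's html.parser; the one-line page below stands in for real fetched HTML:

        from html.parser import HTMLParser

        class SpiderParser(HTMLParser):
            """Collect the signals a search spider cares about:
            headlines, link targets and image alt text."""

            def __init__(self):
                super().__init__()
                self._in_heading = False
                self.headings, self.links, self.alt_texts = [], [], []

            def handle_starttag(self, tag, attrs):
                attrs = dict(attrs)
                if tag in ("h1", "h2", "h3"):
                    self._in_heading = True
                elif tag == "a" and "href" in attrs:
                    self.links.append(attrs["href"])
                elif tag == "img" and "alt" in attrs:
                    self.alt_texts.append(attrs["alt"])

            def handle_endtag(self, tag):
                if tag in ("h1", "h2", "h3"):
                    self._in_heading = False

            def handle_data(self, data):
                if self._in_heading and data.strip():
                    self.headings.append(data.strip())

        page = '<h1>Blue Widget</h1><img src="w.png" alt="A blue widget"><a href="/buy">Buy now</a>'
        spider = SpiderParser()
        spider.feed(page)
        print(spider.headings, spider.links, spider.alt_texts)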

    Chatbots and AI/machine learning assistants, such as Facebook’s Messenger bots or Google Assistant, automate routine processes and free up valuable time for those that use them, whether they be large brands, small businesses or individual users.

    Content aggregation bots collect content from a variety of sources to help users discover relevant information in one centralised place.

    What are bad bots?

    Bad bots, on the other hand, cannot be regulated. By their nature, they are programmed to cause harm in one way or another. This is why it’s important to detect bot traffic and behaviour quickly, determine its intent and mitigate bad bots.

    Let’s refer again to web scrapers. These bots can be very useful to a business, but they can also be extremely harmful.

    For example, a competitor might use scraper bots to look at your prices and lower theirs accordingly, driving your potential customers to their site. They might also be used by scalpers to detect the exact moment an item goes on sale, so they can automatically buy all the stock before genuine users can and resell the items for profit.

    Some bad bots are very sophisticated and difficult to detect. These include:

    • Content scrapers, which steal your copy, such as product descriptions, and publish it on their own sites. This can harm your SEO, as search engines penalise duplicate content and may favour the cloned content over yours in results pages. Content scrapers can also be used to generate fake websites for use in phishing and man-in-the-middle attacks, fooling victims into entering sensitive details like passwords or payment details.
    • Credential stuffing bots, which use credential lists leaked from other website breaches to test which passwords have been reused on other sites (see the detection sketch after this list). This is a method of account takeover that grants the attacker full access to victim accounts.
    • Scalper bots which buy limited supply items faster than humans are able to, allowing the items to be resold elsewhere at a profit. These bots are commonly used to scalp tickets for popular events, or hoard limited edition sneakers and games consoles.
    • Carding bots which automate card cracking, where stolen credit card details are tested within payment portals so they can be used elsewhere or sold on the dark web. This can cause issues with third party payment providers and additional costs per payment attempt.
    • Fake account creation bots which generate new accounts in bulk, typically to exploit welcome bonuses and discounts, or to set up other attacks like scalping where a user account is required to access items or tickets.
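
    As one illustration, here is a minimal sliding-window heuristic for the credential stuffing pattern above. The window size and threshold are invented and would need tuning against real traffic:

        import time
        from collections import defaultdict, deque

        WINDOW_SECONDS = 60   # illustrative: look at the last minute
        MAX_FAILURES = 20     # illustrative: tune to your own traffic

        failed_logins = defaultdict(deque)  # ip -> timestamps of failures

        def record_failed_login(ip, now=None):
            """Record a failed login and return True if the IP now looks
            like a credential stuffing bot (too many failures in window)."""
            now = time.time() if now is None else now
            window = failed_logins[ip]
            window.append(now)
            while window and now - window[0] > WINDOW_SECONDS:
                window.popleft()
            return len(window) > MAX_FAILURES

    A hit would typically trigger a rate limit or a challenge such as a CAPTCHA rather than an outright ban, since shared IP addresses can also produce bursts of failures.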

    These are just some of the most common bad bot use cases. There are many more terrifying things they can do if not identified and mitigated quickly. Removing bad bots involves identifying them first, which is where machine learning comes into play.

    Machine learning lets you train a model on what good (and bad) bot traffic looks like, so you can take action against attacks as they emerge. Make sure the model is trained on genuine traffic so that it can recognise fake bot traffic when it appears.
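
    A toy sketch of that idea using scikit-learn's RandomForestClassifier; the features, numbers and labels below are entirely invented, and a production system would use far richer signals:

        from sklearn.ensemble import RandomForestClassifier

        # Hypothetical per-session features:
        # [requests per minute, pages per session, avg seconds between clicks]
        X_train = [
            [3,   8,   22.0],  # human-like browsing
            [5,   12,  15.0],  # human-like browsing
            [240, 400, 0.2],   # scripted, machine-speed traffic
            [180, 90,  0.5],   # scripted, machine-speed traffic
        ]
        y_train = [0, 0, 1, 1]  # 0 = genuine, 1 = bot

        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(X_train, y_train)

        print(model.predict([[200, 150, 0.3]]))  # expected [1]: likely bot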

    How to differentiate between good and bad bots

    The main difference between good and bad bots is their intent. If a bot is used for malicious reasons, it’s classified as either a ‘bad bot’ or an ‘attack vector’, the term used to describe all forms of cyber attack, including phishing, DDoS and other computer network exploitation techniques.

    Bad bots can often be defined by their output. For example, if your competition has employed web scraping software to extract information about you without your knowledge, they are running a form of bad bot. Another example would be if many new accounts were set up in quick succession on your website – this might seem positive, but if the action was automated, those accounts could be fake and used later in another attack, or to systematically exploit your new customer discounts.
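
    A simple sketch of spotting that signup burst from a log; the data and threshold are invented:

        from collections import Counter

        # Hypothetical signup log for one hour: (ip_address, account) pairs.
        signups = [
            ("203.0.113.7", "user01"), ("203.0.113.7", "user02"),
            ("203.0.113.7", "user03"), ("198.51.100.5", "alice"),
        ]

        MAX_SIGNUPS_PER_IP_PER_HOUR = 2  # illustrative threshold

        per_ip = Counter(ip for ip, _ in signups)
        suspicious = {ip for ip, n in per_ip.items()
                      if n > MAX_SIGNUPS_PER_IP_PER_HOUR}
        print(suspicious)  # {'203.0.113.7'}: a burst worth reviewing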

    On the other hand, good bots are often used to improve your website’s search engine performance by indexing all of the content there. Some good bots have commercial benefits such as allowing legitimate third parties to access your live data and aggregate content as part of partner agreements.
