Netacea’s Approach to Machine Learning: Unsupervised and Supervised Models
Our world is driven by technological innovation. In recent years, many companies have adopted artificial intelligence (AI) and machine learning to analyze larger data sets and perform more complex tasks, with faster and more accurate results. This is not limited to the technology sector: many industries now work continuously to improve their technology and keep up with consumer expectations, with data-based decision-making often central to this drive.
What is a machine learning model?
Designed to imitate the way that humans learn, machine learning models use data and algorithms to gather knowledge and gradually improve their accuracy over time. There are many approaches to machine learning; the two most commonly used and referenced are supervised learning and unsupervised learning. This article outlines the differences between the two, the benefits and drawbacks of each, and how Netacea combines both with anomaly detection in our unique approach to bot management.
Types of machine learning models
Supervised machine learning
Supervised learning models are characterized by their use of labelled training data, which teaches algorithms to classify new data or to predict outcomes accurately.
Supervised learning algorithms can often be categorized into two types:
- Classification
- Regression
Classification uses an algorithm to assign new data to specific categories, based on training data. Regression is a supervised machine learning algorithm used to predict continuous values, again based on the initial training data. Supervised learning models are best suited to situations where a set of labelled reference points is available on which to train the model. That said, data cannot always be aligned neatly with predefined categories or labels; when this is the case, unsupervised machine learning models can provide a solution.
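To make the distinction concrete, here is a minimal sketch of both supervised tasks in plain Python. It is an illustration only: the feature names (requests per minute, pages per session), the "human" and "bot" labels, and all of the numbers are hypothetical, and nearest-centroid classification and least-squares line fitting are simply two easy-to-read examples of their respective families.

```python
from statistics import mean

def fit_nearest_centroid(samples, labels):
    """Classification: learn one centroid (mean point) per label."""
    grouped = {}
    for point, label in zip(samples, labels):
        grouped.setdefault(label, []).append(point)
    return {label: tuple(mean(axis) for axis in zip(*points))
            for label, points in grouped.items()}

def classify(centroids, point):
    """Assign a new point to the label whose centroid is closest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Hypothetical labelled data: (requests per minute, pages per session)
samples = [(5, 3), (8, 4), (6, 5), (120, 1), (150, 2), (135, 1)]
labels = ["human", "human", "human", "bot", "bot", "bot"]
centroids = fit_nearest_centroid(samples, labels)
predicted_label = classify(centroids, (110, 2))  # a category

# Hypothetical labelled data: hour index -> request count
slope, intercept = fit_line([1, 2, 3, 4], [12, 22, 32, 42])
predicted_requests = slope * 5 + intercept  # a continuous value
```

In both cases the model is trained on labelled examples. The classifier returns one of the known categories, while the regression returns a number on a continuous scale.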
Unsupervised machine learning
Unsupervised machine learning models are used to analyze and group sets of unlabelled data. They can help recognize previously unseen or undetected patterns within data, without being explicitly programmed and without anyone labelling the data in advance. There are three main types of unsupervised machine learning algorithms:
- Clustering
- Association
- Dimensionality reduction
“Clustering” looks for similarities and differences within the data and will then use this information to form groups or ‘clusters’ of data. Similarly, “association” is an unsupervised machine learning algorithm that uses different rules or rulesets to find relationships between variables within the data. If the number of features in a set of data is too high, “dimensionality reduction” can be used to reduce the number of inputs to a more manageable size. Dimensionality reduction is sometimes used as a pre-processing step for supervised machine learning models.
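The clustering idea can be sketched with a small k-means routine in plain Python. This is a simplified illustration, not a production algorithm: the points are hypothetical, the starting centroids are chosen by hand to keep the run deterministic, and the number of iterations is fixed.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Group each point under the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: sq_dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, assign(points, centroids)

# Hypothetical unlabelled data: no categories are supplied up front
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

No labels are passed in at any point. The two groups emerge purely from how close the points sit to one another.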
Unsupervised machine learning models allow you to find and group previously unknown patterns within the data, without any initial manual input of labels or categories.
Benefits and drawbacks of machine learning models
While each approach has its merits, there are also some drawbacks to using one machine learning model over the other.
Supervised learning is the simpler method of machine learning, beneficial when the goal is to predict outcomes for new data and the type of result to expect is already known. Although supervised learning helps you make predictions and optimize performance criteria once the initial labels have been supplied, training supervised models can be time-consuming, and labelling the initial inputs often requires expertise.
Unsupervised learning is beneficial when the goal is to gather insights from large volumes of new, previously uncategorized data, or for anomaly detection. Whilst unsupervised learning is more adaptive and allows you to discover previously unknown patterns from data and find features for categorization, results from unsupervised machine learning models require expert human intervention and analysis to validate.
Why Netacea uses both machine learning models
Netacea’s multi-dimensional approach to bot management has our team of data scientists and bot experts using a combination of both supervised and unsupervised machine learning as well as anomaly detection to keep ahead of the continuously evolving bot threat.
Supervised learning allows us to ask, “Does this attack match a known attack pattern?” We can then compare the data streams from our clients with those within our Bot Attack Intelligence feed, giving us the ability to stop known bot attacks, as well as predict and prevent future attacks.
While supervised learning allows us to detect known attacks, unsupervised learning allows us to detect suspicious behavior, or patterns of behavior relating to new or previously unknown attack vectors, by comparing the behavior of one user to that of others in the system. We use real-time clustering to group similar users, allowing us to spot when new clusters are created, highlight odd or atypical behavior, and constantly re-evaluate what a ‘normal’ pattern of behavior looks like.
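As a rough illustration of comparing one user's behavior to everyone else's, the sketch below flags users whose request rate sits far from the population median. It is not Netacea's implementation: the user ids, the rates, and the choice of a modified z-score based on the median absolute deviation (with the commonly used 3.5 cut-off) are all assumptions made for the example.

```python
from statistics import median

def flag_outliers(rates, threshold=3.5):
    """Return ids of users whose rate is far from the population median.

    Uses the modified z-score: 0.6745 * |value - median| / MAD,
    where MAD is the median absolute deviation of all values.
    """
    values = list(rates.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # all users look alike; nothing to compare against
    return [user for user, v in rates.items()
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical requests-per-minute per user
rates = {"u1": 10, "u2": 12, "u3": 11, "u4": 9, "u5": 13, "u6": 240}
suspicious = flag_outliers(rates)
```

Because the baseline is recomputed from whatever traffic is supplied, the definition of 'normal' shifts with the population rather than being fixed in advance.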
By using both methods of machine learning, Netacea maximizes the benefits of AI and offsets the drawbacks of relying on one type of machine learning alone. Our Intent Analytics® engine, powered by these machine learning algorithms, provides an innovative and profoundly more effective alternative to traditional “black box” or JavaScript-reliant solutions. This allows Netacea to stay one step ahead of the bots.