Machine learning (ML) and artificial intelligence (AI) have emerged as critical tools for dealing with the ever-growing volume and complexity of cybersecurity threats.
Machines can often recognize the patterns that reveal malware and unusual activity faster than humans or classic software. By identifying specific trends and cycles, the technology can also predict potential attacks and respond to threats automatically.
Indeed, it’s not uncommon to have similar incidents that generally require the same response. Instead of repeating the same procedure, sometimes manually, the system can detect the attack, report and categorize the incident, and then apply the fix automatically.
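The detect, categorize, and fix loop described above can be sketched as a simple playbook lookup. This is only an illustration: the category labels, the `PLAYBOOK` table, and the response strings are hypothetical, and a real system would get the category from a trained classifier rather than a hard-coded field.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    source_ip: str
    category: str  # hypothetical labels, e.g. "brute_force" or "malware"


# Hypothetical playbook: maps a known incident category to an automated response.
PLAYBOOK = {
    "brute_force": lambda inc: f"block {inc.source_ip}",
    "malware": lambda inc: f"quarantine host {inc.source_ip}",
}


def respond(incident):
    """Apply the stored fix for a known category, otherwise hand off to a human."""
    action = PLAYBOOK.get(incident.category)
    if action is None:
        return "escalate to analyst"
    return action(incident)
```

Anything the playbook does not recognize is escalated instead of being handled automatically, which keeps unknown incidents in front of an analyst.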
Even better, security tools like behavioral analytics can spot attacks simply by noticing anomalous activity, which makes them important for catching zero-day threats and adversarial attacks.
It’s an excellent asset for cyber defense, but adversaries have learned to trick algorithms and even use similar technology to compromise targeted systems. It’s the latest chapter in the ongoing cybersecurity arms race, with attackers and defenders locked in a never-ending struggle, each adapting to the other’s innovations to try to gain an edge.
AI vs. ML vs. Deep Learning
First, some terminology, as there are many variations of computer learning. AI is the broadest term and encompasses all the other fields.
Machine learning systems help make decisions based on collected data and self-adjust their model when detecting new patterns. In other words, ML uses experience to get better at its tasks.
Deep learning (DL) is a subset of machine learning that is commonly used for tasks such as image or speech recognition. The data pass through several layers; each hidden layer transforms its input and passes the result to the next layer, making the processing chain a complex structure, hence the word “deep.” Deep learning can thus be seen as a particular set of machine learning techniques.
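To make the idea of layers concrete, here is a minimal sketch of data flowing through one hidden layer and one output layer. The weights are hand-picked for illustration, not trained, and the function names are assumptions made for this example.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias per neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Non-linearity applied between layers."""
    return [max(0.0, v) for v in values]


def sigmoid(value):
    """Squash the final score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


# Illustrative, hand-picked weights (a real network learns these from data).
HIDDEN_W = [[0.5, -0.2], [0.3, 0.8]]
HIDDEN_B = [0.0, -0.1]
OUTPUT_W = [[1.0, -1.0]]
OUTPUT_B = [0.0]


def forward(features):
    """Pass the features through the hidden layer, then the output layer."""
    hidden = relu(dense(features, HIDDEN_W, HIDDEN_B))
    (logit,) = dense(hidden, OUTPUT_W, OUTPUT_B)
    return sigmoid(logit)
```

Calling `forward([1.0, 2.0])` returns a value between 0 and 1. Real deep learning models stack many more layers and learn the weights during training.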
ML for Cyber Defense
One of the most common ML security approaches is regression, also known as prediction. In this approach, defenders use existing data to model expected behavior and to detect fraud and malware.
Statistical analysis of large datasets makes it possible to predict a computer’s behavior and anticipate actions that haven’t been explicitly programmed.
Increasingly, popular tools such as Microsoft’s Windows Defender use this approach to identify and catch threats. Even some of the top consumer antivirus tools have begun to add machine learning-based detection.
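As a rough illustration of the prediction approach, the sketch below fits a least-squares line to historical data and flags observations that far exceed the prediction. The figures (active users versus outbound traffic) and the 50% tolerance are invented for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Made-up history: active users and outbound traffic (MB per hour).
USERS = [10, 20, 30, 40, 50]
TRAFFIC_MB = [105, 198, 305, 402, 495]
slope, intercept = fit_line(USERS, TRAFFIC_MB)


def predict(active_users):
    """Expected outbound traffic for a given number of active users."""
    return slope * active_users + intercept


def is_suspicious(active_users, observed_mb, tolerance=0.5):
    """Flag traffic that exceeds the prediction by more than the tolerance."""
    return observed_mb > predict(active_users) * (1 + tolerance)
```

With this history, `is_suspicious(25, 900)` is flagged while `is_suspicious(25, 260)` is not. Production tools rely on far richer features and models than a single straight line.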
Behavior analytics, often called UEBA for user and entity behavior analytics, is one of the more promising security applications of machine learning. UEBA tools look for new or unexpected network activity to detect threats. Such tools can be particularly helpful for defending against zero-day, or previously unknown, threats.
More pragmatically, security teams can use ML to build a proactive defense against various threats, with applications such as:
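A very simplified way to picture behavior analytics is a per-user baseline plus a distance measure. The sketch below uses a z-score over a made-up history of daily downloads; the threshold of 3 standard deviations is an assumption, and commercial UEBA products use much more elaborate models.

```python
from statistics import mean, stdev


def zscore(history, value):
    """How many standard deviations the new value sits from the user's baseline."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def is_anomalous(history, value, threshold=3.0):
    """Flag activity that deviates strongly from past behavior."""
    return abs(zscore(history, value)) > threshold


# Made-up baseline: MB downloaded per day by one user over a week.
BASELINE = [120, 135, 110, 128, 140, 125, 131]
```

Here `is_anomalous(BASELINE, 900)` is flagged, while `is_anomalous(BASELINE, 133)` is treated as normal. No signature of a known attack is needed, which is why the approach can help against previously unknown threats.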
- Spam prevention
- DDoS (distributed denial of service) mitigation
- Intrusion and malware detection
- Log analysis to find trends and patterns
- Vulnerability checks
- Botnet detection and containment
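Taking spam prevention from the list above as an example, a classic technique is a naive Bayes text classifier. The sketch below is a toy version with Laplace smoothing; the four training messages are invented, and real filters train on very large labeled corpora.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label) pairs, with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the probability.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training data, for illustration only.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
MODEL = train(TRAINING)
```

With this toy model, `classify("free prize now", *MODEL)` returns `"spam"` and `classify("agenda for the team meeting", *MODEL)` returns `"ham"`.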
ML Can Escalate Attacks Dramatically
The threat landscape is constantly evolving. Most security specialists agree that more and more cybercriminals use ML to generate sophisticated attacks, evade detection, and bypass classic defenses.
CAPTCHA is no longer complex enough
CAPTCHAs, for example, are quick challenges that ask users to solve elementary math operations (or retype a series of random letters and numbers), a task that is supposed to be trivial for humans but extremely difficult for robots.
However, models are now remarkably efficient at solving such challenges. Even though developers keep trying to make CAPTCHAs ever harder to recognize and crack, it’s now a losing game for defenders, even for big platforms such as Amazon.
ML automates brute force attacks
Another black hat use of ML is for brute force attacks. Hackers can now generate accurate password lists automatically and even customize them according to a specific set of data (e.g., the targeted user’s info), significantly increasing the chances of success.
Classic defenses alone are no longer enough to fight these attacks, and that’s especially true of phishing campaigns.
New kinds of phishing attacks
Phishing attacks are a traditional but efficient way to compromise a network. For years, attackers have manually collected information about their target to send them scams (e.g., malicious links) by email or social media messages using techniques like spoofing and social engineering.
ML can automate the whole process. For example, attackers can scrape the target’s profile on any social platform and generate phishing messages automatically.
It’s much faster than manual phishing, so more users can be targeted. In addition, attackers can customize each message with the user’s data and, thus, make it more credible.
Adversaries can fool algorithms
One of the most dangerous situations in cybersecurity is the false impression of safety. Because hackers can fool DL recognition algorithms, AI defenses are not immune to attacks.
For example, by adding or removing a few details in medical images, researchers have successfully duped ML programs that check medical images for evidence of cancer, making the correct diagnosis hard and perhaps impossible for both machines and humans.
Considering hackers are increasingly targeting healthcare networks, that’s a huge concern.
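The principle behind such attacks can be shown on a toy linear classifier: nudging every input feature by a small amount in the direction that lowers the score flips the decision. This is a sketch in the spirit of gradient-sign perturbations, with invented weights and inputs. It is not the method used in the medical imaging research mentioned above.

```python
def score(weights, bias, features):
    """Linear decision score: positive means class 1, otherwise class 0."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict(weights, bias, features):
    return 1 if score(weights, bias, features) > 0 else 0


def sign(value):
    return (value > 0) - (value < 0)


def perturb(weights, features, epsilon):
    """Shift each feature by at most epsilon in the direction that lowers the score.

    For a linear model, the gradient of the score with respect to the input
    is simply the weight vector, so the sign of each weight gives the direction.
    """
    return [x - epsilon * sign(w) for w, x in zip(weights, features)]


# Invented model and input, for illustration only.
WEIGHTS = [0.9, -0.4, 0.7]
BIAS = -0.5
SAMPLE = [0.6, 0.2, 0.3]
ADVERSARIAL = perturb(WEIGHTS, SAMPLE, epsilon=0.1)
```

`SAMPLE` is classified as 1, while `ADVERSARIAL`, which differs by no more than 0.1 in each feature, is classified as 0. Deep networks are attacked with the same idea, using gradients computed through all layers.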
More generally, algorithm hacking is skyrocketing. Whether it’s bots faking online engagement, consumers messing with algorithms for personal benefit, or fake companies trying to top legitimate businesses in listings, the risks are high and growing.
This never-ending arms race places a lot of stress on cybersecurity pros. About the only good news is that quantum computing is still a few years from reaching its threat potential.
Keep Humans in the Loop
ML still requires human supervision for certain aspects, such as reducing noise (irrelevant data) or limiting false positives. Poorly calibrated models can produce lots of false positives, and, in the worst-case scenario, the system can even mistake noise for a pattern and align the model with it.
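Calibration often comes down to choosing an alert threshold. The sketch below, built on invented scores and labels, shows how a human reviewer might compare the false positive rate and the detection rate at different thresholds before settling on one.

```python
def false_positive_rate(scores, labels, threshold):
    """Fraction of benign events (label 0) that the threshold flags as threats."""
    benign = [s for s, y in zip(scores, labels) if y == 0]
    return sum(1 for s in benign if s >= threshold) / len(benign)


def detection_rate(scores, labels, threshold):
    """Fraction of real threats (label 1) that the threshold catches."""
    threats = [s for s, y in zip(scores, labels) if y == 1]
    return sum(1 for s in threats if s >= threshold) / len(threats)


# Invented model scores and analyst-verified labels (1 = real threat).
SCORES = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
LABELS = [0, 0, 0, 0, 0, 1, 1, 1]
```

On this toy data, a threshold of 0.3 flags three of the five benign events, while a threshold of 0.6 flags none of them and still catches every real threat. Real data is rarely this cleanly separable, which is why the trade-off needs human review.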
To train your models, you will likely need third-party libraries from trustworthy vendors. Moreover, you will need to inspect training data regularly as a preventative measure against attacks such as poisoning, in which manipulated samples change how data is classified and pull the model away from its intended behavior.
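One simple inspection, among many possible ones, is to compare the label distribution of a new training batch against a trusted snapshot. The sketch below assumes hypothetical labels and an arbitrary 15-point shift limit; a sudden swing can hint at label flipping, but it can also have benign causes, so a human should review any hit.

```python
from collections import Counter


def label_share(labels):
    """Share of each label in a dataset, as a fraction of all samples."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def drifted_labels(trusted, incoming, max_shift=0.15):
    """Labels whose share moved by more than max_shift between the two datasets."""
    before = label_share(trusted)
    after = label_share(incoming)
    return sorted(
        label
        for label in set(before) | set(after)
        if abs(after.get(label, 0.0) - before.get(label, 0.0)) > max_shift
    )


# Invented datasets, for illustration only.
TRUSTED = ["benign"] * 80 + ["malware"] * 20
INCOMING = ["benign"] * 60 + ["malware"] * 40
```

`drifted_labels(TRUSTED, INCOMING)` reports both labels as shifted, while comparing `TRUSTED` with itself reports nothing.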
Your model needs to consider as many outcomes as possible to be efficient, but you should focus on quality over quantity. Having large datasets is not enough. Data must be associated with a relevant context and labeled, and for now, only humans can handle that costly critical step.
ML Introduces New Risks and Opportunities
As we’ve seen, there’s a massive risk of algorithm hacking and data poisoning these days, but there are other risks you should know about.
Hackers can use free, open-source libraries such as stylegan2 (and many more) to generate fake human faces to create fake accounts with a pretty realistic profile picture. Even if other algorithms can spot artifacts in those counterfeit images, not all platforms run this verification.
Another problem is when the ML system is operating online. Indeed, it’s best if you can train your model offline, as attackers can potentially mess with ML systems connected to the Internet, misleading them with wrong inputs, for example.
However, the biggest concern could be privacy. Many companies have shared their users’ data with various third-party companies and developers for years, exposing them to data brokers and threat actors. It’s a significant risk for a business and its reputation if hackers manage to extract that data.
There are very few initiatives to make ML privacy-compliant and anonymize users’ data, but federated learning seems promising. Instead of uploading all datasets to one big server, this approach trains an algorithm across multiple servers with local datasets without sharing the precious data.
Each local instance trains its model on its dataset. Only trained models are sent to update a global model. It will take new thinking like that to reduce cyber risk in the AI age.
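The train-locally, share-only-the-model idea can be sketched with federated averaging on a one-parameter linear model. Everything here is illustrative: the client datasets are invented, the model is deliberately trivial, and the sketch leaves out the secure aggregation and privacy safeguards that real deployments add.

```python
def train_local(global_weight, samples, learning_rate=0.01, epochs=50):
    """Fit y = w * x on one client's private samples with gradient descent."""
    w = global_weight
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= learning_rate * grad
    return w


def federated_average(client_weights, client_sizes):
    """Combine local models, weighting each by the size of its dataset."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


def federated_round(global_weight, clients):
    """One round: every client trains locally, and only the weights are shared."""
    local_weights = [train_local(global_weight, samples) for samples in clients]
    sizes = [len(samples) for samples in clients]
    return federated_average(local_weights, sizes)


# Invented private datasets of (x, y) pairs that never leave their client.
CLIENTS = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(1.0, 1.9), (3.0, 6.0), (2.0, 4.0)],
]

weight = 0.0
for _ in range(5):
    weight = federated_round(weight, CLIENTS)
```

After a few rounds, `weight` settles close to 2, the slope both datasets roughly share, even though the server only ever sees model weights and never the raw samples.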
AI Joins the Cybersecurity Arms Race
AI and ML have joined the cybersecurity arms race, where attackers and defenders are locked in a never-ending battle for technical and tactical superiority. The role of a cybersecurity pro requires constant vigilance and ongoing learning, in addition to advanced tools capable of responding to ever-changing threats.