
Can AI be stopped from scraping the internet? Experts say it might be too late

Lorikeet News Desk

April 10, 2025

Credits: Pixabay via Pexels

Key Points

  • Jun Seki, CTO and Entrepreneur-in-Residence at Antler, highlights the challenges media organizations and businesses face when AI models scrape content without permission.


AI is upending industries at an unprecedented pace, but for media outlets, businesses, and even governments, the risks of AI could outweigh the rewards. From rampant content scraping to state-level surveillance, AI presents new threats that many are struggling to counter.

Jun Seki, CTO and Entrepreneur-in-Residence at Antler, has been closely analyzing the security risks AI poses—and he’s working on solutions to help businesses protect themselves.

Publishers losing control: For digital media organizations, the rise of large AI models has created a crisis. Publications that spent decades building credibility and revenue streams are now seeing their content quietly siphoned into AI training datasets, without permission or compensation.

"Newspapers feel powerless," Seki says. "Their articles have already been scraped and used to train AI models. It’s like they’ve been robbed, but they don’t know who broke into their house."

Some publishers are fighting back through licensing. In the UK, a specialized licensing agency has introduced an AI fair usage license, allowing media companies to charge for their content when used for AI training. While this is a step toward accountability, enforcement remains a major challenge—especially for smaller publishers that lack legal and financial resources.

Bot protection: For companies looking to protect their content, Seki suggests technical defenses such as embedding hidden tracking codes in published pages. "These codes are invisible to the average reader, but if an AI bot scrapes the content, you can trace it and use that evidence in legal actions," he explains. However, most content creators don’t have the capability to deploy such solutions. Until stronger safeguards are in place, Seki warns that "the only surefire way to prevent your sensitive information from being leaked to AI models is simply not to share it with them."

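To make the tracking-code idea concrete, here is a minimal sketch, assuming a plain HTML page and a token format invented for illustration; it shows the general technique, not any specific tool Seki has built:

```python
import secrets

def make_token(page_id: str) -> str:
    """Generate a unique tracer string tied to one page."""
    return f"trace-{page_id}-{secrets.token_hex(8)}"

def embed_token(html: str, token: str) -> str:
    """Hide the token in markup a human reader never sees, but a
    naive scraper ingests along with the visible text."""
    hidden = f'<span style="display:none" data-tracer="{token}"></span>'
    return html.replace("</body>", hidden + "</body>")

def token_leaked(model_output: str, token: str) -> bool:
    """A verbatim tracer in generated text is strong evidence that
    the page was in the scraped training corpus."""
    return token in model_output
```

Because each token is unique per page, a match in model output points to the specific article that was scraped, which is the kind of evidence Seki says can support legal action.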

AI under government control: AI’s risks extend far beyond content theft. Seki weighs in on Chinese AI privacy laws and how they intersect with DeepSeek: "If DeepSeek is used to organize protests against the Chinese government, authorities can take over the entire model," Seki warns. "They can look underneath the hood and track which users have been discussing certain topics."

While foreign companies operating outside China may be protected by their own country’s privacy laws, Chinese businesses using DeepSeek or other Chinese-backed models could face severe consequences. "If you’re a Chinese company using AI that’s monitored by the government, you’re at huge risk of landing on a government watchlist," he explains.

Building safeguards: Seki is actively developing solutions to address AI’s security vulnerabilities. "There are different levels of safeguards," he explains. "At the LLM level, you need guardrails to protect against prompt injections and harmful content. But at the application level, we’re trying to build new solutions to filter and mask data before it even reaches the AI."

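As a rough illustration of that application-level layer, the sketch below masks a few common sensitive patterns before a prompt ever leaves the application. The regexes, labels, and placeholder scheme are assumptions made for the example, standing in for the far more robust detection a real system would need:

```python
import re

# Illustrative patterns only; a production system would use much
# broader detection than three regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected values with placeholders; return the masked
    text plus a mapping so answers can be re-identified locally."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            placeholder = f"<{label}_{i}>"
            mapping[placeholder] = match
            text = text.replace(match, placeholder)
    return text, mapping

masked, mapping = mask("Reach Jane at jane@example.com or 555-867-5309.")
# masked == "Reach Jane at <EMAIL_0> or <PHONE_0>."
# Only `masked` is sent to the model; `mapping` never leaves the app.
```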
His approach includes creating AI security tools across multiple platforms, from chatbots to Chrome extensions and desktop applications. "We’re working on ways to anonymize data and detect sensitive information before it’s shared," Seki says.

One of his core solutions focuses on businesses operating in heavily regulated industries, where leaking customer information can lead to legal and financial repercussions. "We use large language models to detect and label sensitive information based on what industry a company operates in," he explains. "That way, businesses can set up policies to automatically classify different levels of information and prevent accidental data leaks."

In industries like healthcare, where AI is increasingly used for patient interactions, these protections are critical. "For example, we can identify patient medical histories and anonymize them before they’re processed by an AI," Seki says. "Once data is used to train a third-party model, removing it is nearly impossible. That’s why prevention is key."

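A toy version of that policy-driven classification might look like the following. Here `call_llm` is a stand-in for whatever model endpoint you use, and the industry policies and labels, including the healthcare ones, are invented for illustration rather than taken from Seki’s actual taxonomy:

```python
# Per-industry policy: sensitivity label -> action to take before
# any data leaves the application.
POLICIES = {
    "healthcare": {"MEDICAL_HISTORY": "redact", "PII": "redact", "PUBLIC": "allow"},
    "banking": {"ACCOUNT_NUMBER": "redact", "PII": "redact", "PUBLIC": "allow"},
}

def call_llm(prompt: str) -> str:
    """Stub for whatever model client you use; expected to return
    exactly one label from the list in the prompt."""
    raise NotImplementedError

def classify(span: str, industry: str) -> str:
    """Ask the model to tag a span with one of the policy's labels."""
    labels = ", ".join(POLICIES[industry])
    prompt = f"Label this text with exactly one of [{labels}]:\n{span}"
    return call_llm(prompt).strip()

def apply_policy(span: str, industry: str) -> str:
    # Unknown labels fall through to "redact": fail closed, not open.
    action = POLICIES[industry].get(classify(span, industry), "redact")
    return span if action == "allow" else "[REDACTED]"
```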
One more thing

We know AI chatbots that read your help center and summarize answers back to your customers are a dime a dozen. The underlying technology is a commodity.

In fact, we believe this so strongly that we’ll handle 100,000 FAQ lookup tickets for free.
