Industry Insider: Opt In Or Lose Out: What Variety's AI Training Report Means For Content Creators

The rapid rise of generative AI has reshaped what is possible in the entertainment sector. AI models rely heavily on existing creative content: video, text, images, and music. The question now is consent: did the people who made that content agree to its use?

Variety’s March 2025 special report, AI Training Consent and Content, maps the emerging technical infrastructure around AI training. The report details the tools, registries, legal trends, and data transparency initiatives aimed at giving content creators and rights holders some control in this new frontier. The deeper takeaway, however, is this: the entertainment industry is seriously behind. If creators and executives want a say in how their work powers tomorrow’s AI engines, they need to understand these tools and adopt them now.

A WIDENING GAP BETWEEN AI DEVELOPERS AND CONTENT OWNERS

The report opens by exposing a power imbalance: while AI companies like OpenAI, Google, and Meta have advanced systems that scrape and ingest data at scale, many content creators are left with minimal defenses, at most the decades-old robots.txt protocol, which was designed for search engine indexing, not rights enforcement.

Though some creators have added robots.txt blocks against scrapers like GPTBot and CCBot, the safeguard is rudimentary and voluntary: a crawler can simply ignore it, and it does nothing about the same content reposted elsewhere. The takeaway is that, at present, creators are largely defenseless against unauthorized AI training.
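
To make the mechanism concrete, here is a minimal sketch of the kind of robots.txt block the report describes, checked with Python's standard-library parser. GPTBot and CCBot are the crawler user-agent tokens named above; the site URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt that disallows OpenAI's GPTBot and Common Crawl's
# CCBot site-wide -- the kind of voluntary block described above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A *compliant* crawler checks these rules before fetching anything;
# nothing in the protocol forces a scraper to do so.
print(parser.can_fetch("GPTBot", "https://example.com/article"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

Note the second line of output: any bot not explicitly listed is allowed by default, which is exactly the gap the report highlights.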

The report documents OpenAI’s attempt to blunt criticism with a proposed “Media Manager” tool, which would let creators identify their works and control how they are used in AI training. As of early 2025, however, the tool has yet to materialize.

Consumer Perception of AI Training Consent by Age Group

This chart from the report shows that only 8% of U.S. adults believe AI companies “always” have permission to use content, while 27% say “not sure,” and over 40% say “rarely” or “never.” 

OPTING OUT: STILL INADEQUATE, STILL CRITICAL

In response, a range of organizational and technological answers has been developed to help artists "opt out." Some are rudimentary, such as Spawning's "Do Not Train" registry and Cloudflare's AI Audit tool. Others are more sophisticated, like the C2PA's "Do Not Train" metadata tag, which embeds usage preferences directly into image, video, and audio files.

However, the report underlines a significant obstacle: compliance is still voluntary, and most opt-out mechanisms can be circumvented. Embedded metadata can be stripped, and previously scraped files remain in training datasets unless models are retrained, which is a rare and expensive procedure. For now, opting out is simply a line in the sand: the groundwork for future licensing, but not a sustainable revenue stream in itself.

Websites Blocking AI Bots via robots.txt (June 2024 – Jan 2025)

Adoption of robot exclusions is growing but fragmented. GPTBot (OpenAI) is blocked by around 25% of top websites; others, such as Meta’s crawler and Anthropic’s ClaudeBot, are lower at around 8–15%.

OPTING IN: LICENSING, ATTRIBUTION, AND THE BUSINESS OF AI-READY CONTENT

Opting out provides protection; opting in generates revenue. This is where the report matters most for media executives seeking to understand AI and benefit from it rather than merely defend against it.

The report describes the early stages of an AI licensing market, with around forty deals finalized by 2024. However, the agreements, mostly one-off arrangements between a publisher and a developer, are limited in scope and transparency. Important features like usage metrics, data weighting, and vector frequency are commonly left out.

The report contends that the industry needs three building blocks to create a sustainable structure:

  1. Machine-readable content IDs tied to the rights holder

  2. Granular access controls at the individual asset level

  3. Attribution tools that track how and when a work contributes to a model’s outputs
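
The three requirements above can be pictured as a single machine-readable record per asset. A minimal sketch, with all field names invented for illustration (no existing standard is implied):

```python
from dataclasses import dataclass, field

# Hypothetical per-asset rights record combining a machine-readable ID,
# the rights holder, and granular per-asset usage controls. All names
# here are invented for illustration.
@dataclass
class AssetRights:
    asset_id: str                  # stable, machine-readable content ID
    rights_holder: str             # who grants consent and collects payment
    allowed_uses: set = field(default_factory=set)

    def permits(self, use: str) -> bool:
        """Granular control: only uses explicitly granted are permitted."""
        return use in self.allowed_uses

record = AssetRights(
    asset_id="isrc:US-ABC-25-00001",   # illustrative identifier
    rights_holder="Example Records",
    allowed_uses={"search_index", "licensed_ai_training"},
)

print(record.permits("licensed_ai_training"))  # True
print(record.permits("unlicensed_scraping"))   # False
```

Unlike the opt-out metadata above, this record defaults to denial: anything not explicitly granted is off-limits, which is what makes it a basis for licensing rather than just a request.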

And that’s where new startups come in. Ventures like ProRata and MusicalAI are developing vector-based systems that track how often licensed content is used in AI generation. Others, such as Sureel and Vermillio, are building content-matching engines that attempt to trace influence and assign royalties accordingly.
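
The core idea behind such attribution systems can be sketched simply: compare an embedding of an AI output against embeddings of licensed works and apportion royalty weight by similarity. The three-dimensional vectors and catalog below are toy values purely for illustration; the report does not describe any vendor's actual method.

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings for two licensed works (real systems would use
# high-dimensional vectors derived from the model itself).
catalog = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.0, 1.0, 0.2],
}
output_embedding = [0.8, 0.3, 0.1]  # embedding of one AI-generated output

# Apportion royalty weight in proportion to non-negative similarity.
sims = {k: max(cosine(v, output_embedding), 0.0) for k, v in catalog.items()}
total = sum(sims.values())
shares = {k: s / total for k, s in sims.items()}
print(shares)  # song_a receives the larger share
```

The hard part, which the report flags, is not the arithmetic but getting trustworthy access to the model's internals so the similarity scores mean something.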

Text Dataset Sources for Post-Training AI Models (2020–2024)

This chart from the report shows a steep climb in cumulative training tokens sourced from books, news sites, encyclopedias, and general web content—content types typically controlled by rights holders.

MARKETPLACES AND ECOSYSTEMS: BUILDING FOR SCALE

Beyond these tools, the report highlights the gradual growth of AI licensing marketplaces. Companies such as Human Native, Created by Humans, and Calliope Networks are building platforms that let creators proactively license their work to developers on specific terms.

On a larger scale, blockchain-based ecosystems such as Story Protocol and Personal Digital Spaces are establishing infrastructure for rights management, attribution, and automated royalty distribution. Though still in their early stages, these ecosystems could prove important for managing AI interactions at the scale major studios and publishers require.

Creative Professionals’ Attitudes Toward AI Training Consent

Only 11% of creators say they’re happy for their content to be used freely in AI training; 35% would allow it with consent; a full 40% refuse outright. The data underscores a growing sentiment: consent is crucial. No consent, no creators.

WHAT MEDIA LEADERS NEED TO KNOW

So what should creators and entertainment media executives take away from this report?

  1. Control tools are coming—but not fast enough. Companies should begin tagging content now, even if tools like Media Manager are still vaporware.

  2. Opt-in is the path to monetization. Participating in registries and marketplaces is not just defensive—it’s strategic.

  3. Attribution will drive recurring revenue. Being able to trace your content is everything. Invest early in partnerships or platforms that can surface your content’s value inside AI systems.

  4. Legal clarity is lagging. Until courts or legislation clearly define AI training under copyright law, technical measures and private agreements are your best protection.
