Commentary

Same gatekeepers, new tollbooths in the AI content licensing market

June 9, 2026


  • Referral traffic from search became the dominant driver of publisher revenue at the turn of the century, and the economic logic of online journalism was set by the companies running these engines.
  • AI is now having a similar effect on journalism as publishers face waning reader traffic and advertising revenue despite LLMs using their content to generate answers to users’ queries.
  • Policy interventions are needed now because the deal structures, price precedents, intermediary take rates, and governance norms being established today will be difficult to dislodge once normalized.
A person takes a picture of a screen during a presentation about AI (artificial intelligence) at the Frankfurt book fair on October 16, 2024, on the first day of the world's biggest book fair in Frankfurt am Main, western Germany. The Frankfurt Book Fair, in its 76th edition in the year 2024, runs from Wednesday, October 16, to Sunday, October 20, 2024. (Photo by Kirill Kudryavtsev/AFP via Getty Images)

When Google began indexing news websites at the turn of the century, publishers traded free listings of their content for referral traffic, at a rate of 2-to-1. That was before Google monopolized digital advertising. Within a decade, referral traffic from search had become the dominant driver of publisher revenue, even as Google crawled publishers’ sites at a greater rate than it returned traffic. The economic logic of online journalism was set by the infrastructure of a company that by then used snippets and high-quality photos in its indexed results and had established an illegal monopoly in search (and would turn out to have an illegal monopoly in the digital advertising market as well).

The story of artificial intelligence (AI) and journalism now follows the same arc, only faster. A small-town reporter who finishes a story about the local school board tonight may find that by morning an AI system has crawled it, synthesized its facts into an answer for a user query, and served that answer without sending anyone back to the publisher’s website. The result for the publisher is waning reader traffic, advertising revenue, and subscriber conversion. The AI company gets to offer a product that better meets its users’ needs, in part through the availability of journalistic information, but that comes at a cost: The models often neither attribute their sources nor return any of the value generated by the user to those sources. Google is now reportedly crawling exponentially more times per referral, the other AI engines offer even worse referral rates, and publishers have no way to opt out without also harming their search visibility. Generative AI is replicating many of the same dynamics at a scope and speed that make the search and social media moments feel quaint.

It is in this context that a market for AI content licensing has begun to take shape. That market is the subject of a new report, “Same Gatekeepers, New Tollbooths: Mapping the AI Content Licensing Market,” which I co-authored with Karina Montoya at the Center for Journalism and Liberty at the Open Markets Institute. The report reveals that the conditions under which this market is forming closely resemble those that structurally damaged journalism and the public interest. Our report also comes on the heels of two major independent analyses of AI and copyright, studies published by the U.K. House of Lords and the European Parliament, which arrive at conclusions strikingly convergent with our own. Three independent bodies, working with different methodologies and from different institutional vantage points, have thus reached the same diagnosis, which suggests the window for policy intervention to mitigate the harms may still be open.

Figure 1

We've been here before

As a journalist and scholar, I have been writing about the need for a more nuanced approach to valuing the press-platform exchange since well before generative AI became widely accessible. My research has long underscored that the journalism sector was going to need a more sophisticated framework for valuing its contribution across the full AI value chain and not just the referral traffic layer that platforms had always used to define what news was worth to them. That narrow framing had already cost publishers enormously in the search and social media eras, and accepting it again as AI’s governing logic would compound the damage.

Instead, we must keep in mind that the leverage of technology actually runs both ways. AI systems depend on a continuous supply of high-quality human content to remain useful. Degrade that supply by destroying the economic conditions under which it is produced, and you degrade the AI itself. Publishers and creators have more bargaining power than they tend to act as though they have.

3 tiers, 1 structural problem

The AI content licensing market operates across three distinct tiers, each with its own dynamics and each with significant limitations.

First tier

The most visible tier is bilateral deals: confidential agreements between major AI companies and select publishers, typically those with national or global brand recognition (and those most likely to be capable of suing). But while real sums of money have flowed to real newsrooms, our research shows these deals are not doing what publishers may have hoped.

Publishers with direct AI licensing agreements initially enjoyed a substantial click-through advantage from AI interfaces. By the fourth quarter of 2025, that “deal premium” had essentially evaporated—amid a six-fold collapse in click-through rates from AI systems. Publishers without deals fared worse in absolute terms but experienced a smaller proportional drop. But both groups lost. The bilateral deal market is not insulating anyone from the broader erosion of AI-driven referrals, and it is structurally inaccessible to the vast majority of publishers around the world or at the local level.

For example, the Lords Committee rightly observed that limited disclosure makes it difficult for rights-holders to know whether their works have been used or to enforce their rights. As a result, publishers negotiating bilateral deals are doing so without visibility into how their content is being used, at what frequency, or to what commercial effect, which means they are negotiating blind. Our interviews with AI licensing startup founders confirm this is precisely the information gap that intermediaries have found a commercial foothold in trying to address.

Second tier

The second tier is the intermediary layer, a field that has expanded from a handful of Silicon Valley startups to more than a dozen companies since 2024, including startups like TollBit, Sphere AI, ScalePost, Created by Humans, ProRata, and Miso.ai, and increasingly Big Tech firms like Cloudflare and Microsoft. As our report explains, these companies offer bot detection and blocking, content marketplaces with pay-per-use pricing, and attribution-based revenue distribution. The analytics and publisher control they offer are meaningful improvements over the opacity of one-on-one deals but carry structural vulnerabilities.

Most startups are venture-backed and therefore exposed to acquisition by the same large technology companies from which they nominally protect publishers. Their ability to attract capital is also influenced by the lack of clarity surrounding the dozens of outstanding copyright lawsuits facing AI firms. I traced the contours of this risk when Cloudflare moved to block AI crawlers by default in mid-2025 and launched its pay-per-crawl marketplace, a significant shift that offered publishers more control while also raising serious questions about what it means for an infrastructural gatekeeper of Cloudflare’s scope and scale to occupy that position. The independent ad tech ecosystem went through something strikingly similar over the previous decade, and the result was Google’s illegal monopoly.

Third tier

The third tier is the long tail of media and content producers: local newspapers, regional broadcasters, ethnic and indigenous media, non-English language publishers, and specialized outlets whose loss would cause the greatest civic harm, not to mention individual journalists and creators. They are effectively absent from the AI licensing market entirely. Their absence is a structural feature of how market power is distributed, one that fails to reflect how journalism’s civic value is actually distributed, much less its role in the accuracy, safety, and integrity of large language models (LLMs). The European Parliament study flags the same distributional risk in economic terms: Voluntary licensing leads to fragmented coverage and selective deals that produce biased and incomplete datasets, undermining both AI performance and overall welfare. A market that compensates only the publishers large enough to attract bilateral deal interest is not a market that will sustain a healthy information ecosystem.

The double bind

What makes the emerging market structure particularly corrosive is what I call the publisher double bind. The same Big Tech firms whose AI products are eroding website traffic are now building and controlling the licensing infrastructure those publishers must turn to. Traffic erosion pushes publishers toward licensing revenue. Licensing revenue increasingly runs through the corporations that caused the traffic erosion. Google and Microsoft occupy both ends of the value chain simultaneously, not through formal exclusivity arrangements that regulators could easily challenge (for example, via antitrust laws), but through standardization lock-in, data asymmetry, and the magnitude of platform scale (recall we are talking about trillion-dollar intermediaries in these cases).

This dynamic does not require bad intent to produce bad outcomes. It is simply how platform capture works and is a recurring pattern in the sector’s relationships with platforms. The Lords Committee put the strategic stakes plainly, observing that the continued drift toward tacit acceptance of large-scale, unlicensed use of creative content and long-term dependence on opaque models trained overseas is a “poor bet” that would sacrifice creative capacity for speculative AI gains expected to accrue largely to a few U.S.-based developers. This framing applies with equal force to publishers operating within any jurisdiction where dominant AI firms are not domestically based (i.e., most of the world).

The valuation problem runs deeper than traffic

A recurring problem in the media sector, and journalism in particular, is valuation. The compensation logic governing most negotiations rests on a narrow conception of value: referral traffic lost. But this framing is radically incomplete.

My previous research on AI and journalism valuation found that publishers’ contributions to AI extend across a much wider range of dimensions: training and fine-tuning; linguistic and reasoning capacity; factual grounding; temporal currency; and civic legitimacy. Accepting referral traffic as the governing benchmark ignores most of these entirely. The European Parliament study corroborates the incentive logic: When creators discover their works have been used for AI training without compensation, they tend to reduce output, risking a degradation in the quality and representativeness of future training data and ultimately harming the performance of AI systems themselves. This is the economic formalization of what we call “content cannibalization,” and it closes the loop on the leverage argument. AI companies have a direct interest in the economic sustainability of the content they depend on, yet the market is not currently structured or governed to reflect that interest.

Similarly, licensing for inference, also known as retrieval-augmented generation (RAG), and licensing for training data should not be considered separate markets. They are layers of the same value stack. The foundation model’s capacity to synthesize information and produce coherent prose was built with publisher content scraped without consent. Without that training foundation, the retrieval layer is useless. Pricing only the retrieval layer while treating the underlying model as a cost publishers have already donated for free is a category error that serves AI companies’ bottom line, not the media’s interest and much less the public’s.

There is also a legal dimension that has not received adequate attention in this debate, and for which we propose a revised interpretation in our paper. Copyright discourse around AI training has largely treated content-scraping as a discrete historical event. But with RAG, the model trained on publisher content is activated anew with every inference call. Publishers who accept current deal terms as final may be foreclosing substantially larger claims that courts have not yet adjudicated and locking in precedents that will be very difficult to revise once normalized and standardized.

What a different path requires

It should be crystal clear now that voluntary commitments, platform goodwill, and industry self-regulation have consistently failed to level the playing field. A system in which platform intermediaries control the infrastructure, information flows, and monetization systems while leaving publishers to gather up the shards of traffic and audience left behind is a broken system.

Several fixes are achievable within existing legislative traditions, including statutory licensing frameworks with set rates, collective licensing and sectoral bargaining, mandatory transparency on deal terms and data usage, attribution systems at the model inference layer, and explicit inclusion requirements for local and independent media (akin to the must-carry requirements already present in many countries). Australia and Canada have demonstrated that bargaining code frameworks are legislatively viable. Music publishers and the music industry more broadly have demonstrated that collective licensing works at scale. The architectural blueprints exist, and policymakers can work to facilitate a fairer business climate. What’s missing is the political will to apply them before the market structures calcify and the journalism industry withers further.

Explicit consent from rights-holders must be a precondition for AI training data collection in the absence of a statutory framework. The Lords Committee has now given that position formal parliamentary backing with its (non-binding) recommendation that the government rule out any exceptions for AI model training on copyrighted works and focus instead on strengthening licensing, transparency, and enforcement. The European Parliament study goes further, recommending statutory licensing as the primary framework to ensure broad access to works with regulator-determined royalties balancing the interests of rightsholders, AI developers, and users, while maintaining incentives for ongoing creative output. An opt-out statutory system where individual publishers that did not want to take part could nonetheless pursue their own deals would be optimal, based on my assessment of the market.

Why the window matters

Publishers missed the opportunity to value their products sustainably with search engines. They missed it again with social media. The difference this time is that the legal claims are more mature, the extraction is more visible and blatant, and the sector has had two decades to observe how prior platform relationships developed. Whether that accumulated experience translates into collective action and effective policy before market structures settle is unclear, but if it doesn’t, it will not be because of a lack of knowledge about what is happening. AI firms will lobby for voluntary fixes, which is how Big Tech has always promised to fix the problems it has created.

The deal structures, price precedents, intermediary take rates, and governance norms being established right now will be difficult to dislodge once they’re normalized. The window for intervention is narrowing. If policymakers don’t set the terms, they will be set by the largest tech firms, whose budgets are big enough to withstand the growing slew of lawsuits and to bend Congress, parliaments, and regulators to their will.

  • Acknowledgements and disclosures

    Google and Microsoft are general, unrestricted donors to the Brookings Institution. The findings, interpretations, and conclusions posted in this piece are solely those of the authors and are not influenced by any donation.