Markov Tarpits: An Evolving Strategy Against AI Crawlers


Vytenis Kaubrė
AI web crawlers such as GPTBot, ClaudeBot, and Amazonbot have become frequent visitors across the web. Gathering web content to power LLMs, they now represent a significant portion of website traffic, in one case reaching nearly 70% of total web requests.
In direct response, some developers have recently revived tarpit technology and turned it against AI web crawlers. The way tarpits work, however, raises a critical question: are they a sustainable defensive method, or a solution that only harms both sides?
Markov tarpits aim to waste AI crawlers’ time by trapping them in endless pages of pointless data. Using Markov chain algorithms, these tarpits produce content that appears coherent but contains absolutely no useful information. Consider this example:
“The second danger, also, is only a dim monument, were too much to be driving to Elstead and, at the beginning," said the woman. ‘Thass funny. My name’s Smith too. Why,’ she added sentimentally, ‘I might be in the world is not only horrifying, but even.”
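Under the hood, a Markov chain records which words tend to follow which in a source text, then random-walks those statistics to produce new text. Below is a minimal illustrative sketch in Python; it is not the code of any particular tarpit tool, and the corpus file name is hypothetical.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=60):
    """Random-walk the chain: text that reads plausibly but means nothing."""
    state = random.choice(list(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:                      # dead end: jump to a fresh state
            state = random.choice(list(chain))
            continue
        output.append(random.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

# Usage (corpus.txt is a hypothetical source text):
# chain = build_chain(open("corpus.txt", encoding="utf-8").read())
# print(generate(chain))
```

Trained on any sizable corpus, this produces passages like the one above: plausible at the phrase level, useless at the document level.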
A fully deployed tarpit doesn't stop at a single paragraph: it creates an effectively infinite website. Each page can contain multiple paragraphs of meaningless text interconnected with links to similar pages, and some implementations also delay responses to increase the AI crawler's resource consumption, as the sketch below illustrates.
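To make the mechanics concrete, here is a hedged sketch of how such an endless site could be served. It reuses generate() and chain from the previous sketch, uses Flask purely for illustration, and the /trap/ path is made up; real tools have their own implementations.

```python
import random
import time
from flask import Flask  # third-party: pip install flask

app = Flask(__name__)

def random_slug(k=8):
    """An arbitrary lowercase page name, invented on the spot."""
    return "".join(random.choices("abcdefghijklmnopqrstuvwxyz", k=k))

@app.route("/trap/<slug>")
def trap(slug):
    time.sleep(random.uniform(2, 10))   # artificial delay to tie up the crawler
    # Markov filler: generate() and chain come from the previous sketch.
    body = "".join(f"<p>{generate(chain, 80)}</p>" for _ in range(3))
    # Links to pages that exist only once requested, so the site never ends.
    links = " ".join(f'<a href="/trap/{random_slug()}">{random_slug()}</a>'
                     for _ in range(10))
    return f"<html><body>{body}<p>{links}</p></body></html>"
```

Every link points to a page that is generated only when requested, so a crawler following links never runs out of URLs, and the randomized delay keeps each of its connections occupied.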
As a result, web crawlers become trapped collecting useless data that ultimately poisons their training pipelines. A variety of Markov tarpit tools, Nepenthes among them, are freely available on the web.
Like any defensive technology, a Markov tarpit's effectiveness depends on proper implementation and consideration of its broader impact. Let's explore the potential successes alongside the possible failures.
Trapped crawlers: They ensnare and slow down AI crawlers, making large-scale web scraping significantly more expensive.
Poisoned data: They feed nonsensical data to AI systems, disrupting training and degrading output quality.
Secured content: When properly implemented, they steer crawlers away from legitimate pages, protecting the actual content (see the robots.txt sketch after this list).
Fingerprinting: They provide analytical value by allowing investigation of traffic patterns, crawler IPs, and AI behavior.
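One deployment pattern worth noting, sketched under the assumption of a tarpit mounted at a hypothetical /trap/ path: disallow the trap in robots.txt, so compliant crawlers never enter it, while crawlers that ignore the file wander in.

```
# robots.txt (the /trap/ path is hypothetical)
# Compliant crawlers honor this and never see the tarpit;
# crawlers that ignore robots.txt fall in and get stuck.
User-agent: *
Disallow: /trap/
```

This filters bots by their willingness to obey robots.txt, which is precisely the behavior tarpits are meant to punish.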
Ethical concerns: They represent potentially malicious activity by deliberately consuming AI resources without creating value. Nepenthes' own creator describes it as "deliberately malicious software intended to cause harmful activity".
Harmed SEO: Search engines like Google may penalize websites implementing tarpits, potentially de-indexing them for spammy practices or reducing rankings due to thin content and deceptive linking structures.
Crawler blindness: They can inadvertently trap beneficial crawlers, as tools like Nepenthes cannot reliably distinguish search engine bots from AI training crawlers (a partial mitigation is sketched after this list).
Server overload: They generate significant server resource consumption. Nepenthes' creator acknowledges hosts will experience continuous CPU load, ultimately affecting overall website performance.
Limited solution: Rather than solving the underlying issue of high traffic, tarpits merely redirect AI crawlers to other pages. This solution only offers a temporary fix, as AI web crawlers are likely to evolve and avoid such traps.
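As a partial answer to the crawler-blindness problem above, a tarpit can exempt known search-engine user agents, bearing in mind that user-agent strings are trivially spoofed. A hypothetical sketch extending the Flask app from the earlier example:

```python
from flask import request, redirect

# User-agent substrings to never trap (illustrative, not exhaustive).
SAFE_BOTS = ("Googlebot", "Bingbot", "DuckDuckBot")

@app.before_request
def spare_search_engines():
    ua = request.headers.get("User-Agent", "")
    if request.path.startswith("/trap/") and any(bot in ua for bot in SAFE_BOTS):
        # Redirect known search crawlers back to real content.
        return redirect("/")
```

Since any crawler can claim to be Googlebot, a robust deployment would also verify the claim, for example via reverse DNS lookup, before granting the exemption.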
Markov tarpits are an intriguing response to the growing friction between website owners and AI crawlers. In the short term, they may effectively slow crawlers down, but their long-term consequences raise significant questions. Rather than addressing the underlying tensions, tarpits risk collateral damage to beneficial crawlers, SEO rankings, and the host's own server resources.
While the frustration behind their adoption is understandable, Markov tarpits alone may not solve the deeper issues. A lasting solution likely involves clearer industry standards, transparent practices, and dialogue between website owners and those leveraging AI crawlers. In other words, a sustainable solution demands cooperation.
About the author
Vytenis Kaubrė
Technical Copywriter
Vytenis Kaubrė is a Technical Copywriter at Oxylabs. His love for creative writing and a growing interest in technology fuels his daily work, where he crafts technical content and web scrapers with Oxylabs’ solutions. Off duty, you might catch him working on personal projects, coding with Python, or jamming on his electric guitar.