From 3fa17cc32566f405f23ce16bf273a4f8e240064e Mon Sep 17 00:00:00 2001
From: "Evan G."
Date: Fri, 24 May 2024 17:40:39 -0500
Subject: [PATCH] Add robots.txt

I copied https://codeberg.org/benjaminhollon/robots.txt-deny-llm/src/branch/main/robots.txt, or the robots.txt from benjaminhollon, to deny LLM's
---
 static/robots.txt | 71 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/static/robots.txt b/static/robots.txt
index 7d329b1..ce8ffcf 100644
--- a/static/robots.txt
+++ b/static/robots.txt
@@ -1 +1,70 @@
-User-agent: *
+# from https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/
+
+User-agent: CCBot
+Disallow: /
+
+User-agent: ChatGPT-User
+Disallow: /
+
+User-agent: GPTBot
+Disallow: /
+
+User-agent: Google-Extended
+Disallow: /
+
+User-agent: anthropic-ai
+Disallow: /
+
+User-agent: Omgilibot
+Disallow: /
+
+User-agent: Omgili
+Disallow: /
+
+User-agent: FacebookBot
+Disallow: /
+
+User-agent: Bytespider
+Disallow: /
+
+# from https://github.com/healsdata/ai-training-opt-out
+
+# may not work, needs more research (see https://github.com/rom1504/img2dataset/issues/48)
+User-agent: img2dataset
+Disallow: /
+
+User-agent: Claude-Web
+Disallow: /
+
+User-agent: magpie-crawler
+Disallow: /
+
+# AhrefsBot crawls for data for an "SEO Dataset"—one of their "products" based on this dataset is "AI Writing Tools"
+User-agent: AhrefsBot
+Disallow: /
+
+# from https://www.cyberciti.biz/web-developer/block-openai-bard-bing-ai-crawler-bots-using-robots-txt-file/
+User-agent: PerplexityBot
+Disallow: /
+
+# from https://netfuture.ch/2023/07/blocking-ai-crawlers-robots-txt-chatgpt/
+User-agent: cohere-ai
+Disallow: /
+
+# from https://claytonerrington.com/blog/robots-and-ai/
+
+User-agent: Amazonbot
+Disallow: /
+
+# from https://darkvisitors.com/
+
+User-agent: Applebot
+Disallow: /
+
+User-agent: YouBot
+Disallow: /
+
+# from https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler/
+User-agent: FriendlyCrawler
+Disallow: /
+
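
One way to sanity-check rules like the ones added above is to run them through a robots.txt parser. The following is a minimal sketch using Python's standard `urllib.robotparser`; the `is_blocked` helper, the three-entry excerpt, and the `https://example.com/` URL are assumptions made for illustration and are not part of the patch. It shows only how Python's parser reads the rules, not whether a given crawler actually honors them (the patch itself notes that `img2dataset` may not).

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the rules added by the patch (not the full file).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: img2dataset
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Hypothetical helper: True if the rules disallow `url` for `user_agent`."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "img2dataset", "Mozilla/5.0"):
        print(agent, "blocked" if is_blocked(ROBOTS_TXT, agent) else "allowed")
```

Because the patch removes the `User-agent: *` line and adds no wildcard group, an agent that matches none of the listed names falls through to the parser's default, which is to allow the request.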