overview for habitualTartare

Is there a simple way to severely impede web scraping and LLM data collection of my website? in c/[email protected]

[–] [email protected] 10 points 1 week ago

https://en.wikipedia.org/wiki/Robots.txt

This should cover any polite web crawlers, but compliance is voluntary.

https://platform.openai.com/docs/gptbot
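
As a rough sketch of what that could look like: the robots.txt below blocks the `GPTBot` user agent (the token OpenAI documents for its crawler) and allows everyone else, and Python's standard `urllib.robotparser` shows how a polite crawler would read it. The `example.com` URL, the `SomeOtherBot` name, and the `can_fetch` helper are made up for illustration. This only models crawlers that choose to obey the file.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: disallow GPTBot everywhere, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a well-behaved crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and SomeOtherBot are placeholders, not real targets.
    print(can_fetch("GPTBot", "https://example.com/page"))
    print(can_fetch("SomeOtherBot", "https://example.com/page"))
```

A crawler that ignores robots.txt never runs a check like this, which is why it only helps against the polite ones.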

You might have to put the site behind a CAPTCHA or a similar challenge to severely limit automated access.

It's not realistic to assume it won't get scraped eventually, for example by someone paying people to bypass the CAPTCHA or by web crawlers that don't respect robots.txt. I also don't know whether Google and Microsoft bundle their AI data collection with their search crawling, so that opting out of one also removes your site from web search.

US sets stage for antitrust probes into Microsoft, OpenAI and Nvidia in c/[email protected]

[–] [email protected] 2 points 3 weeks ago (1 children)

OpenAI may have grown a bit fast in relation to the AI hype craze, so I don't know if an antitrust case against it will hold much water. Nvidia competes with others in areas like GPUs but is pretty far ahead in AI chips and some AI-related data center products. It will be interesting to see if either probe goes anywhere.

Can I remove part of the web a daddy long legs has woven without harming them? in c/[email protected]

[–] [email protected] 16 points 11 months ago (1 children)

I don't think it's an issue to remove part of the web. I've got spiders outside that tend to keep blocking the entryway and I have to destroy parts that get in the way of the path. They're usually back within a day or so.

For a more solid answer, the link below describes moving spiders entirely by relocating part of the web with the spider. I think trimming the web is going to be less stressful than that.

https://askentomologists.com/2015/10/11/how-do-i-relocate-insects-and-spiders/