Social media platform Reddit said on Tuesday it will update a web standard used by the platform to block automated data scraping from its website, following reports that AI startups were bypassing the rule to gather content for their systems.
The move comes at a time when artificial intelligence firms have been accused of plagiarizing content from publishers to create AI-generated summaries without giving credit or asking for permission.
Reddit said that it would update the Robots Exclusion Protocol, or "robots.txt," a widely accepted standard meant to determine which parts of a site are allowed to be crawled.
The company also said it will maintain rate-limiting, a technique used to control the number of requests from one particular entity, and will block unknown bots and crawlers from data scraping – collecting and saving raw information – on its website.
More recently, robots.txt has become a key tool that publishers use to prevent tech companies from using their content free of charge to train AI algorithms and create summaries in response to some search queries.
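For context, a crawler that honors the standard reads a site's robots.txt file and checks its own user-agent against the rules before fetching pages. A minimal Python sketch using the standard library; the user-agent string and URL here are illustrative assumptions, not Reddit's actual policy:

```python
from urllib import robotparser  # standard-library robots.txt parser

# Hypothetical example: check whether a crawler identifying itself as
# "ExampleBot" may fetch a given page under the site's robots.txt rules.
rp = robotparser.RobotFileParser()
rp.set_url("https://www.reddit.com/robots.txt")
rp.read()
print(rp.can_fetch("ExampleBot", "https://www.reddit.com/r/technology/"))
```

Compliance is voluntary: robots.txt only signals what a site permits, and the reports cited below concern crawlers that allegedly ignored those signals.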
Last week, a letter to publishers from the content licensing startup TollBit said that several AI firms were circumventing the web standard to scrape publisher sites.
This follows a Wired investigation which found that AI search startup Perplexity likely bypassed efforts to block its web crawler via robots.txt.
Earlier in June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems without giving credit.
Reddit said on Tuesday that researchers and organizations such as the Internet Archive will continue to have access to its content for non-commercial use.
© Thomson Reuters 2024