Date: Tue, 22 Apr 2025 23:39:36 +0200
To: Jonathan Wakely
Cc: Guinevere Larsen, Mark Wielaard, binutils@sourceware.org,
    elfutils-devel@sourceware.org, gcc@gcc.gnu.org, gdb@sourceware.org,
    libc-alpha@sourceware.org, libabigail@sourceware.org,
    newlib@sourceware.org, overseers@sourceware.org
Subject: Re: scraperbot protection - Patchwork and Bunsen behind Anubis
References: <20250421155940.GE2323@gnu.wildebeest.org>
    <138abdfe-29db-45c8-a9e4-e7210e847ce7@redhat.com>
List-Id: Gdb mailing list
From: Aurelien Jarno via Gdb
Reply-To: Aurelien Jarno

On 2025-04-22 14:06, Jonathan Wakely wrote:
> On Tue, 22 Apr 2025 at 13:36, Guinevere Larsen via Gcc wrote:
> >
> > On 4/21/25 12:59 PM, Mark Wielaard wrote:
> > > Hi hackers,
> > >
> > > TLDR; When using https://patchwork.sourceware.org or Bunsen
> > > https://builder.sourceware.org/testruns/ you might now have to enable
> > > javascript. This should not impact any scripts, just browsers (or bots
> > > pretending to be browsers). If it does cause trouble, please let us
> > > know. If this works out we might also "protect" bugzilla, gitweb,
> > > cgit, and the wikis this way.
> > >
> > > We don't like to have to do this, but as some of you might have noticed
> > > Sourceware has been fighting the new AI scraperbots since start of the
> > > year. We are not alone in this.
> > >
> > > https://lwn.net/Articles/1008897/
> > > https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
> > >
> > > We have tried to isolate services more and block various ip-blocks
> > > that were abusing the servers. But that has helped only so much.
> > > Unfortunately the scraper bots are using lots of ip addresses
> > > (probably by installing "free" VPN services that use normal user
> > > connections as exit point) and pretending to be common
> > > browsers/agents. We seem to have to make access to some services
> > > depend on solving a javascript challenge.
> >
> > Jan Wildeboer, on the fediverse, has a pretty interesting lead on how AI
> > scrapers might be doing this:
> > https://social.wildeboer.net/@jwildeboer/114360486804175788 (this is the
> > last post in the thread because it was hard to actually follow the
> > thread given the number of replies, please go all the way up and read
> > all 8 posts).
> >
> > Essentially, there's a library developer that pays developers to just
> > "include this library and a few more lines in your TOS". This library
> > then allows the app to sell the end-user's bandwidth to clients of the
> > library developer, allowing them to make requests. This is how big
> > companies are managing to have so many IP addresses, so many of those
> > being residential IP addresses, and it also means that by blocking those
> > IP addresses we will be - necessarily - blocking real user traffic to
> > our platforms.
>
> It seems to me that blocking real users *who are running these shady
> apps* is perfectly reasonable.

How do you detect them? In my experience at other hosting places, each
of those IPs makes only a few requests per hour or per day, with a
standard User-Agent. That makes them difficult to tell apart from
normal users. The problem is that you suddenly see hundreds of
thousands of requests per hour coming from only a slightly smaller
number of IPs. And in the middle of that, you also have legitimate
users on IPs from the same netblock.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                     http://aurel32.net
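
As a rough illustration of why a per-IP rate limit does not help
against the traffic pattern described in the reply above, here is a
minimal sketch in Python. All numbers (150,000 addresses, 2 requests
per hour each, a limit of 60 requests per hour) and all names
(PER_IP_LIMIT, flagged_ips) are made up for the example; none of them
come from real server logs or from any Sourceware configuration.

```python
# Hypothetical numbers, chosen only to illustrate the shape of the problem.
PER_IP_LIMIT = 60  # assumed per-IP rate limit, in requests per hour


def flagged_ips(requests_per_ip, limit=PER_IP_LIMIT):
    """Return the IPs whose hourly request count exceeds the per-IP limit."""
    return [ip for ip, count in requests_per_ip.items() if count > limit]


# 150,000 distinct (made-up) addresses, each making 2 requests in the hour.
scraper_traffic = {f"ip-{n}": 2 for n in range(150_000)}

# One ordinary user browsing a few pages from the same address space.
legit_traffic = {"ip-legit": 5}

# Combined view of one hour of traffic, keyed by source address.
hourly = {**scraper_traffic, **legit_traffic}

total_requests = sum(hourly.values())
flagged = flagged_ips(hourly)

print(f"total requests this hour: {total_requests}")
print(f"IPs over the per-IP limit: {len(flagged)}")
```

With these assumed figures the server sees a very large aggregate load
while no single address comes anywhere near the limit, and the one
legitimate user actually sends more requests than any individual
scraper address. Lowering the limit far enough to catch the scrapers
would block that user first.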