From: Jonathan Wakely via Gdb
Reply-To: Jonathan Wakely
Date: Tue, 22 Apr 2025 14:06:08 +0100
Subject: Re: scraperbot protection - Patchwork and Bunsen behind Anubis
To: Guinevere Larsen
Cc: Mark Wielaard, binutils@sourceware.org, elfutils-devel@sourceware.org,
 gcc@gcc.gnu.org, gdb@sourceware.org, libc-alpha@sourceware.org,
 libabigail@sourceware.org, newlib@sourceware.org, overseers@sourceware.org
References: <20250421155940.GE2323@gnu.wildebeest.org>
 <138abdfe-29db-45c8-a9e4-e7210e847ce7@redhat.com>
In-Reply-To: <138abdfe-29db-45c8-a9e4-e7210e847ce7@redhat.com>
List-Id: Gdb mailing list

On Tue, 22 Apr 2025 at 13:36, Guinevere Larsen via Gcc wrote:
>
> On 4/21/25 12:59 PM, Mark Wielaard wrote:
> > Hi hackers,
> >
> > TLDR; When using https://patchwork.sourceware.org or Bunsen
> > https://builder.sourceware.org/testruns/ you might now have to enable
> > javascript. This should not impact any scripts, just browsers (or bots
> > pretending to be browsers). If it does cause trouble, please let us
> > know. If this works out we might also "protect" bugzilla, gitweb,
> > cgit, and the wikis this way.
> >
> > We don't like to have to do this, but as some of you might have noticed
> > Sourceware has been fighting the new AI scraperbots since start of the
> > year. We are not alone in this.
> >
> > https://lwn.net/Articles/1008897/
> > https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
> >
> > We have tried to isolate services more and block various ip-blocks
> > that were abusing the servers. But that has helped only so much.
> > Unfortunately the scraper bots are using lots of ip addresses
> > (probably by installing "free" VPN services that use normal user
> > connections as exit point) and pretending to be common
> > browsers/agents. We seem to have to make access to some services
> > depend on solving a javascript challenge.
>
> Jan Wildeboer, on the fediverse, has a pretty interesting lead on how AI
> scrapers might be doing this:
> https://social.wildeboer.net/@jwildeboer/114360486804175788 (this is the
> last post in the thread because it was hard to actually follow the
> thread given the number of replies, please go all the way up and read
> all 8 posts).
>
> Essentially, there's a library developer that pays developers to just
> "include this library and a few more lines in your TOS". This library
> then allows the app to sell the end-user's bandwidth to clients of the
> library developer, allowing them to make requests.
> This is how big companies are managing to have so many IP addresses,
> so many of those being residential IP addresses, and it also means
> that by blocking those IP addresses we will be - necessarily -
> blocking real user traffic to our platforms.

It seems to me that blocking real users *who are running these shady
apps* is perfectly reasonable. They might not realise it, but those
users are part of the problem. If we block them, maybe they'll be
incentivised to stop using the shady apps. And if users stop using
those apps, maybe those app developers will stop bundling the
libraries that piggyback on users' bandwidth.

>
> I'm happy to see that the sourceware is moving to a more comprehensive
> solution, and if this is successful, I'd suggest that we also try to do
> that to the forgejo instance, and remove the IPs blocked because of this
> scraping.

For now, maybe. This thread already explained how to get around Anubis
by changing the UserAgent string - how long will it be until these
peer-to-business network libraries figure that out?