Message-ID: <12bef5dd-dc0f-4b03-a1b2-03ca7942bc2b@redhat.com>
Date: Tue, 22 Apr 2025 10:17:35 -0300
Subject: Re: scraperbot protection - Patchwork and Bunsen behind Anubis
From: Guinevere Larsen via Gdb
Reply-To: Guinevere Larsen
To: Jonathan Wakely
Cc: Mark Wielaard, binutils@sourceware.org, elfutils-devel@sourceware.org, gcc@gcc.gnu.org, gdb@sourceware.org, libc-alpha@sourceware.org, libabigail@sourceware.org, newlib@sourceware.org, overseers@sourceware.org
References: <20250421155940.GE2323@gnu.wildebeest.org> <138abdfe-29db-45c8-a9e4-e7210e847ce7@redhat.com>
List-Id: Gdb mailing list

On 4/22/25 10:06 AM, Jonathan Wakely wrote:
> On Tue, 22 Apr 2025 at 13:36, Guinevere Larsen via Gcc wrote:
>> On 4/21/25 12:59 PM, Mark Wielaard wrote:
>>> Hi hackers,
>>>
>>> TLDR; When using https://patchwork.sourceware.org or Bunsen
>>> https://builder.sourceware.org/testruns/ you might now have to enable
>>> javascript. This should not impact any scripts, just browsers (or bots
>>> pretending to be browsers). If it does cause trouble, please let us
>>> know. If this works out we might also "protect" bugzilla, gitweb,
>>> cgit, and the wikis this way.
>>>
>>> We don't like to have to do this, but as some of you might have noticed
>>> Sourceware has been fighting the new AI scraperbots since start of the
>>> year. We are not alone in this.
>>>
>>> https://lwn.net/Articles/1008897/
>>> https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
>>>
>>> We have tried to isolate services more and block various ip-blocks
>>> that were abusing the servers. But that has helped only so much.
>>> Unfortunately the scraper bots are using lots of ip addresses
>>> (probably by installing "free" VPN services that use normal user
>>> connections as exit point) and pretending to be common
>>> browsers/agents. We seem to have to make access to some services
>>> depend on solving a javascript challenge.
>> Jan Wildeboer, on the fediverse, has a pretty interesting lead on how AI
>> scrapers might be doing this:
>> https://social.wildeboer.net/@jwildeboer/114360486804175788 (this is the
>> last post in the thread because it was hard to actually follow the
>> thread given the number of replies, please go all the way up and read
>> all 8 posts).
>>
>> Essentially, there's a library developer that pays developers to just
>> "include this library and a few more lines in your TOS". This library
>> then allows the app to sell the end-user's bandwidth to clients of the
>> library developer, allowing them to make requests. This is how big
>> companies are managing to have so many IP addresses, so many of those
>> being residential IP addresses, and it also means that by blocking those
>> IP addresses we will be - necessarily - blocking real user traffic to
>> our platforms.
> It seems to me that blocking real users *who are running these shady
> apps* is perfectly reasonable.
>
> They might not realise it, but those users are part of the problem. If
> we block them, maybe they'll be incentivised to stop using the shady
> apps.
> And if users stop using those apps, maybe those app developers
> will stop bundling the libraries that piggyback on users' bandwidth.

If an IP mapped perfectly to one user, maybe. But I can't control what
other users of the same ISP in the same area as me are doing, and we're
sharing an IP. And worse, if I still lived with my family, no way would
I be able to veto what my parents are using their phone for, so because
they have a shady app I wouldn't be able to access systems? That doesn't
seem fair at all.

Not to mention the fact that "read and understand the entirety of the
TOS of every single app" assumes a pretty decent amount of free time
for users that they may not have, and we wouldn't want to make open source
even more hostile for people who are overwhelmed or overworked already.
Of course people should, but having that as a requirement excludes
people like... well, myself to be quite honest.

>
>> I'm happy to see that Sourceware is moving to a more comprehensive
>> solution, and if this is successful, I'd suggest that we also try to do
>> that to the forgejo instance, and remove the IPs blocked because of this
>> scraping.
> For now, maybe. This thread already explained how to get around Anubis
> by changing the UserAgent string - how long will it be until these
> peer-to-business network libraries figure that out?
>
Hopefully longer than the bubble lasts.

-- 
Cheers,
Guinevere Larsen
She/Her/Hers