From: Guinevere Larsen
Date: Tue, 22 Apr 2025 09:34:51 -0300
Subject: Re: scraperbot protection - Patchwork and Bunsen behind Anubis
To: Mark Wielaard, binutils@sourceware.org, elfutils-devel@sourceware.org, gcc@gcc.gnu.org, gdb@sourceware.org, libc-alpha@sourceware.org, libabigail@sourceware.org, newlib@sourceware.org
Cc: overseers@sourceware.org

On 4/21/25 12:59 PM, Mark Wielaard wrote:
> Hi hackers,
>
> TLDR; When using https://patchwork.sourceware.org or Bunsen
> https://builder.sourceware.org/testruns/ you might now have to enable
> javascript. This should not impact any scripts, just browsers (or bots
> pretending to be browsers). If it does cause trouble, please let us
> know. If this works out we might also "protect" bugzilla, gitweb,
> cgit, and the wikis this way.
>
> We don't like to have to do this, but as some of you might have noticed
> Sourceware has been fighting the new AI scraperbots since the start of
> the year. We are not alone in this.
>
> https://lwn.net/Articles/1008897/
> https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
>
> We have tried to isolate services more and block various ip-blocks
> that were abusing the servers. But that has helped only so much.
> Unfortunately the scraper bots are using lots of ip addresses
> (probably by installing "free" VPN services that use normal user
> connections as exit point) and pretending to be common
> browsers/agents. We seem to have to make access to some services
> depend on solving a javascript challenge.

Jan Wildeboer, on the fediverse, has an interesting lead on how AI scrapers might be doing this:

https://social.wildeboer.net/@jwildeboer/114360486804175788

(I'm linking the last post in the thread because the number of replies makes the thread hard to follow. Please scroll all the way up and read all 8 posts.)

Essentially, a library developer pays app developers to "include this library and add a few more lines to your TOS". The library then lets the app sell the end user's bandwidth to the library developer's clients, who can make requests through that user's connection. This is how big companies manage to have so many IP addresses, many of them residential. It also means that by blocking those IP addresses we are necessarily blocking real user traffic to our platforms.

I'm happy to see that Sourceware is moving to a more comprehensive solution. If this is successful, I'd suggest we do the same for the Forgejo instance and unblock the IPs that were blocked because of this scraping.

>
> So we have installed Anubis https://anubis.techaro.lol/ in front of
> patchwork and bunsen. This means that if you are using a browser that
> identifies as Mozilla or Opera in their User-Agent you will get a
> brief page showing the happy anime girl that requires javascript to
> solve a challenge and get a cookie to get through.
> Scripts and search engines should get through without it. Also, removing
> Mozilla and/or Opera from your User-Agent will get you through without
> javascript.
>
> We want to thank Xe Iaso, who helped us set this up and worked
> with us over the Easter weekend solving some of our problems/typos.
> Please consider becoming one of their patrons as a thank you.
> https://xeiaso.net/notes/2025/anubis-works/
> https://xeiaso.net/patrons/
>
> Cheers,
>
> Mark
>

--
Cheers,
Guinevere Larsen
She/Her/Hers