From: Christophe Lyon via Gdb <gdb@sourceware.org>
To: Mark Wielaard <mark@klomp.org>
Cc: binutils@sourceware.org, elfutils-devel@sourceware.org,
    gcc@gcc.gnu.org, gdb@sourceware.org, libc-alpha@sourceware.org,
    libabigail@sourceware.org, newlib@sourceware.org,
    overseers@sourceware.org
Subject: Re: scraperbot protection - Patchwork and Bunsen behind Anubis
Date: Wed, 23 Apr 2025 18:56:31 +0200
Message-ID: <CAPS5khZh78WSrjokQe1o9C8M8qDFiYbxUur8yxnGnBAPTHHCzg@mail.gmail.com>
In-Reply-To: <20250421155940.GE2323@gnu.wildebeest.org>

Hi!

Thanks for all the hard work maintaining all this fundamental infrastructure.

On Mon, 21 Apr 2025 at 18:00, Mark Wielaard <mark@klomp.org> wrote:
>
> Hi hackers,
>
> TLDR; When using https://patchwork.sourceware.org or Bunsen
> https://builder.sourceware.org/testruns/ you might now have to enable
> javascript. This should not impact any scripts, just browsers (or bots
> pretending to be browsers). If it does cause trouble, please let us
> know. If this works out we might also "protect" bugzilla, gitweb,
> cgit, and the wikis this way.
>
> We don't like to have to do this, but as some of you might have noticed
> Sourceware has been fighting the new AI scraperbots since the start of the
> year. We are not alone in this.
>
> https://lwn.net/Articles/1008897/
> https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
>
> We have tried to isolate services more and block various ip-blocks
> that were abusing the servers. But that has helped only so much.
On isolation: I don't know which services are currently isolated, or how,
so these may be obvious questions.

Would it be possible, and suitable, to have https requests served by one
container and ssh requests by another? Maybe that's already the case.
With CI in mind: the Linaro CI is currently severely impacted by these
scraperbots too:
- directly, because our git servers are also overloaded, so our build
process often fails to check out build scripts and infra scripts
- indirectly, because even when that checkout succeeds, we may then fail to
connect to sourceware

So maybe it would help if we switched our CI user to ssh access when
cloning the GCC / binutils / etc. sources?
If only the containers serving https requests are impacted, ssh access
could still work well.
(That would mean creating a Linaro-CI user on sourceware; I don't know
what the policy is.)
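
To make the https-to-ssh idea concrete, here is a minimal sketch of how a CI
script might rewrite its clone URLs. Everything specific in it is an
assumption for illustration: the "linaro-ci" user does not exist, the
example repository URL is only a placeholder, and whether sourceware serves
a repository over ssh at the same host and path as over https would need to
be confirmed by the overseers. (Git's own url.<base>.insteadOf setting could
achieve the same rewrite without touching the scripts.)

```python
from urllib.parse import urlsplit, urlunsplit


def to_ssh_url(https_url: str, user: str) -> str:
    """Rewrite an https:// clone URL into an ssh:// URL for the same repo.

    Assumption: the repository is reachable over ssh at the same host and
    path as over https. This is not confirmed for sourceware.
    """
    parts = urlsplit(https_url)
    if parts.scheme != "https":
        raise ValueError(f"expected an https URL, got {https_url!r}")
    # Keep host and path, swap the scheme, and prepend the (hypothetical) user.
    return urlunsplit(("ssh", f"{user}@{parts.hostname}", parts.path, "", ""))


def clone_command(https_url: str, user: str) -> list:
    """Build the argument list for a git clone over ssh (not executed here)."""
    return ["git", "clone", to_ssh_url(https_url, user)]
```

For example, clone_command("https://sourceware.org/git/binutils-gdb.git",
"linaro-ci") would produce a "git clone" of the ssh:// form of that URL,
which the CI could then pass to subprocess.run().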
Thanks,
Christophe
> Unfortunately the scraper bots are using lots of ip addresses
> (probably by installing "free" VPN services that use normal user
> connections as exit point) and pretending to be common
> browsers/agents. We seem to have to make access to some services
> depend on solving a javascript challenge.
>
> So we have installed Anubis https://anubis.techaro.lol/ in front of
> patchwork and bunsen. This means that if you are using a browser that
> identifies as Mozilla or Opera in their User-Agent you will get a
> brief page showing the happy anime girl that requires javascript to
> solve a challenge and get a cookie to get through. Scripts and search
> engines should get through without. Also removing Mozilla and/or Opera
> from your User-Agent will get you through without javascript.
>
> We want to thank Xe Iaso who has helped us set this up and worked
> with us over the Easter weekend solving some of our problems/typos.
> Please check whether you want to become one of their patrons as a thank you.
> https://xeiaso.net/notes/2025/anubis-works/
> https://xeiaso.net/patrons/
>
> Cheers,
>
> Mark
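
The User-Agent rule described in the quoted message can be sketched as
follows, as a way for a script to check whether it is likely to hit the
javascript challenge. This mirrors only the wording of the announcement
(a User-Agent containing "Mozilla" or "Opera" gets challenged); it is not
Anubis's actual policy engine, and the case-sensitive substring match and
the "linaro-ci-fetch/1.0" agent string are assumptions made for
illustration.

```python
import urllib.request

# Tokens named in the announcement; the real Anubis policy may differ.
CHALLENGED_TOKENS = ("Mozilla", "Opera")


def would_be_challenged(user_agent: str) -> bool:
    """Return True if this User-Agent matches the rule as announced."""
    return any(token in user_agent for token in CHALLENGED_TOKENS)


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request with an honest, script-like agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

A script that identifies itself honestly, for example with
build_request("https://patchwork.sourceware.org/", "linaro-ci-fetch/1.0"),
would not match the rule, whereas a typical browser string starting with
"Mozilla/5.0" would.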