From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-bounces~public-inbox=simark.ca@sourceware.org>
Received: from simark.ca
	by simark.ca with LMTP
	id CVpyLcll8WbxhjQAWB0awg
	(envelope-from <gdb-bounces~public-inbox=simark.ca@sourceware.org>)
	for <public-inbox@simark.ca>; Mon, 23 Sep 2024 08:57:45 -0400
Authentication-Results: simark.ca;
	dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=gNFhusFU;
	dkim-atps=neutral
Received: by simark.ca (Postfix, from userid 112)
	id A72F21E353; Mon, 23 Sep 2024 08:57:45 -0400 (EDT)
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on simark.ca
X-Spam-Level: 
X-Spam-Status: No, score=-11.8 required=5.0 tests=ARC_SIGNED,ARC_VALID,
	BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_HI,RCVD_IN_VALIDITY_CERTIFIED,RCVD_IN_VALIDITY_RPBL,
	RCVD_IN_VALIDITY_SAFE,URIBL_BLOCKED,URIBL_DBL_BLOCKED_OPENDNS
	autolearn=ham autolearn_force=no version=4.0.0
Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256)
	(No client certificate requested)
	by simark.ca (Postfix) with ESMTPS id 9C5D21E05C
	for <public-inbox@simark.ca>; Mon, 23 Sep 2024 08:57:44 -0400 (EDT)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 2D3183858D35
	for <public-inbox@simark.ca>; Mon, 23 Sep 2024 12:57:44 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2D3183858D35
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1727096264;
	bh=nrrV9yQUPsK4EAp8GqSHBm5uANebIdVfr8R3AJowAD4=;
	h=Date:Subject:To:References:In-Reply-To:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:
	 From;
	b=gNFhusFUyuKKKs1tlHxn7wrb16FqAH+w67YwcicbVB/wVqoO7enhpn3AheMyjGj0g
	 1jCSC2SKSqUVeq6YnsG9e4XObAXFqv8PINdeGfB1InomOx3QZ0VOi36UIJaPJ6rDg9
	 RhGEQAlm1OvKe/favZuSEZRZhUTAgsfYMyQbMlXs=
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 307B73858408;
 Mon, 23 Sep 2024 12:56:42 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 307B73858408
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 307B73858408
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1727096205; cv=none;
 b=Oy7Ug0hnk0PAmAq8M74QK4J3+TwfeSWB0wANHABla6gA0M2coQaN8Dg5IqXFfhBG2wG9f6ISJB0luC56WbP8y4SeosCkRfx+rJiP375cDBGZx+9RpTPqHukNlZADGKIzV/lTiwCG5VT+4og0ZCxEEen8cpyDhEg5k62zo34YZaM=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1727096205; c=relaxed/simple;
 bh=/rV2h/T3grI+Jz6XUojNRpNPh3/QV1R4wLPvcBema1s=;
 h=Message-ID:Date:MIME-Version:Subject:To:From;
 b=knXfGGxgyTbFrmNK9kE36V6+8eyqGAAN+e8FXVPsAYvsR50X8eAa6gFnLmClNMorcAFlshlnR1Q5IzYTxI5FYrjiLFWTFXCyqHstHpqbcwC1B9DnGbNFlmyQYjbq/DtDIJ5F/l+UHLVGwqMVksuaagm6A3Np1HQ6jLLruw7880o=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4C751FEC;
 Mon, 23 Sep 2024 05:57:11 -0700 (PDT)
Received: from [10.57.84.228] (unknown [10.57.84.228])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 82ADF3F64C;
 Mon, 23 Sep 2024 05:56:40 -0700 (PDT)
Message-ID: <4cea35db-1005-4341-bfea-2e5d78b3399d@arm.com>
Date: Mon, 23 Sep 2024 13:56:38 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: On pull request workflows for the GNU toolchain
To: Joseph Myers <josmyers@redhat.com>, gcc@gcc.gnu.org,
 libc-alpha@sourceware.org, binutils@sourceware.org, gdb@sourceware.org
References: <b440cf28-674b-122a-160b-0fb96bbf989b@redhat.com>
Content-Language: en-GB
In-Reply-To: <b440cf28-674b-122a-160b-0fb96bbf989b@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: gdb@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gdb mailing list <gdb.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb>,
 <mailto:gdb-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb>,
 <mailto:gdb-request@sourceware.org?subject=subscribe>
From: "Richard Earnshaw \(lists\) via Gdb" <gdb@sourceware.org>
Reply-To: "Richard Earnshaw \(lists\)" <Richard.Earnshaw@arm.com>
Errors-To: gdb-bounces~public-inbox=simark.ca@sourceware.org
Sender: "Gdb" <gdb-bounces~public-inbox=simark.ca@sourceware.org>

On 19/09/2024 16:51, Joseph Myers via Gcc wrote:
> 1. Introduction
> 
> This message expands on my remarks at the Cauldron (especially the
> patch review and maintenance BoF, and the Sourceware infrastructure
> BoF) regarding desired features for a system providing pull request
> functionality (patch submission via creating branches that are then
> proposed using some kind of web interface or API, with a central
> database then tracking the status of each pull request and review
> comments thereon automatically), for use by the GNU toolchain (or one
> or more components thereof - there is no need for each component to
> make the same decision about moving to such software and workflow, and
> indeed we have no mechanism to make such decisions for the toolchain
> as a whole).
> 
> This does not advocate a particular choice of software for such
> functionality (though much of the discussion seemed to suggest Forgejo
> as the most likely starting point), or a particular choice of where to
> host it.  Hosting would of course need to meet appropriate security
> requirements, and to achieve a passing grade on the GNU Ethical
> Repository Criteria, and the software would need to be entirely free
> software.  Where relevant features are not already supported, it's
> important that the software is receptive to the addition of such
> features (including cases where part of the functionality is provided
> by software specific to the GNU toolchain or parts thereof - such as
> for the custom checks currently implemented with git hooks - and the
> underlying software provides appropriate interfaces to allow
> integration of such external pieces).  The list of features here may
> be a good basis for reviewing what particular forge software supports
> and whether other features can be added, directly or through use of
> appropriate APIs.
> 
> Forge software may provide other pieces such as bug tracking or wikis
> that we currently handle separately from git hosting.  In such cases,
> we should be able to disable those pieces and keep using the existing
> bug tracking and wiki software (while having the option to decide
> independently to migrate those if desired).
> 
> I consider the overall benefits of such a move to be having more
> structured data about all changes proposed for inclusion and their
> status (needing review, needing changes from the author, under
> discussion, needing merge from mainline, etc.), to help all people
> involved in the patch submission and review process to track such
> information and to find patches needing review as applicable, along
> with providing a more familiar workflow for many people that avoids
> many of the problems with email (which affect experienced contributors
> working around corporate email systems, not just new contributors).
> It would not of course by itself turn people with no interest in or
> understanding of systems software development into contributors (for
> example, people without knowledge of directories and hierarchical file
> storage, or people who only understand software development as web
> development).  Nor would it prevent the accumulation of large backlogs
> of unreviewed patches, as is evident from many large and active
> projects using PR workflows with large numbers of open PRs.
> 
> As Richard noted in his BoF, email sucks.  As I noted in reply, so do
> the web and web browsers when trying to deal with large amounts of
> patch review state (when one wishes to apply one's own view, not the
> forge's, of what is resolved and what needs attention).  As I also
> noted, in the Sourceware infrastructure BoF, tools such as patchwork
> and b4 are the right answer to the wrong question: trying to get
> structured data about patch submissions when working from the axiom
> that emails on a mailing list should be the primary source of truth
> for everything needing review, rather than starting from more
> structured data and generating emails as one form of output.
> 
> Moving to a pull request system is not expected to change policies
> regarding who can approve a change for inclusion, or the technical
> limits on who can cause a change to enter mainline (e.g. all people
> with commit access would be expected to be able to use a button in a
> web interface to cause to PR to be merged, though policy might limit
> when they should do so).  We can of course choose to change policies,
> either as part of adopting a PR system or later.
> 
> 
> 2. Key features
> 
> (a) Some forges have a design that emphasises the tree you get after a
> proposed contribution, but not the sequence of commits to get there.
> For the toolchain, we care about the clean, logical sequence of
> commits on the mainline branch.  (We also use linear history on
> mainline, but that's a secondary matter - though certainly we'd want
> any forge used to support such linear history so that property doesn't
> need to change as part of adopting pull request workflow.)  Having a
> clean sequence of commits has some implications for forge support:
> 
> * Support for reviewing the proposed commit message, not just the
>    diffs, is important (and it should be clear what commit message
>    would result from merging any pull request).
> 
> * Patch series and dependencies between patches are important.  In
>    such cases, fixes for issues from review should go into the
>    appropriate logical commit in the PR (with rebasing as necessary),
>    and it should be possible at all times to see what the sequence of
>    commits on mainline would look like.  (E.g. people use some
>    workarounds on GitHub to manage PR dependencies, but those result in
>    second and subsequent PRs in a series showing the full set of diffs
>    from a PR and those it depends on, rather than just the logical
>    diffs for one PR.)
> 
>    I consider patch series and dependencies to be separate but related
>    things: a patch series may not have strictly linear dependencies
>    (and it's often useful to merge the more straightforward patches
>    from a series while still discussing and revising others), while a
>    patch may depend on other patches that it's not logically part of
>    the same series as.  They are, however, closely related, and a
>    sufficient solution for dependencies might also be adequate for many
>    cases of series.
> 
>    Note that series can sometimes have hundreds of patches; any
>    solution for patch series and dependencies needs to scale that far.
> 
>    There is of course the common case of a single-patch submission,
>    where the patch is ready for inclusion after some number of fixes.
>    In those cases, it's probably convenient if it's not necessary to
>    rebase - provided it's clear that a particular PR would be
>    squash-merged, and also what the commit message would be for the
>    final fixed commit.
> 
> * Given the need for rebasing when working with patch series, it's
>    important to have good support for rebasing.  In particular, all
>    revisions of the changes for a PR that was rebased need to remain
>    permanently available (e.g. through appropriate documented refs to
>    fetch to get all revisions of all PRs).
> 
> (b) Various people have built efficient workflows for going through
> all patch submissions and comments (or all in a particular area), even
> when only reviewing a small proportion, and have concerns about
> efficiency of a web interface when working with many patches and
> comments.  It's important to have good API support to allow people to
> build tools supporting their own workflow like this without needing to
> use the browser interface (and following their own logic, not the
> forge's, for what changes are of interest).  Good API support might,
> for example, include a straightforward way to get all changes to PR
> and comment data and metadata since some particular point, as well as
> for actions such as reviewing / commenting / approving a PR.  Such API
> support might be similar to what's needed to ensure people can readily
> get and maintain a local replica of all the key data and its history
> for all PRs.
> 
> Replication like that is also important for reliably ensuring key data
> remains available even if the forge software subsequently ceases to be
> maintained.  Consider, for example, the apparent disappearance of the
> data from www-gnats.gnu.org (we occasionally come across references to
> old bug reports from that system in glibc development, but don't have
> any way to resolve those references).
> 
> Another use of such an API would be to allow maintaining local copies
> of all PR comments etc. in a plain text form that can be searched with
> grep.
> 
> (c) Given that a transition would be from using mailing lists, it's
> important to have good plain text outward email notifications for all
> new merge requests, comments thereon and other significant actions
> (changing metadata, approvals / merges, etc.) - through whatever
> combination of built-in support in a forge and local implementation
> using the API.  Such notifications would go to the existing mailing
> lists (note that the choice of mailing lists for a patch can depend on
> which parts of the tree it changes - for example, the choice between
> binutils and gdb-patches lists, or some GCC patches going to the
> fortran or libstdc++ lists) - as well as to individuals concerned with
> / subscribed to a given PR.  Some very routine changes, such as
> reports of clean CI results, might be omitted by default from
> notifications sent to the mailing lists.
> 
> "good" includes quoting appropriate diff hunks being comments on.
> It's OK to have an HTML multipart as well, but the quality of the
> plain text part matters.  Diffs themselves should be included in
> new-PR emails (and those for changes to a PR) unless very large.
> 
> I do not however suggest trying to take incoming email (unstructured)
> and turn it into more structured comments on particular parts of a PR
> in the database.
> 
> Similarly, commit emails should continue to go to the existing mailing
> lists for those.
> 
> (d) Rather than the forge owning the mainline branch in the sense of
> every commit having to go through an approved PR, at least initially
> it should be possible for people to push directly to mainline as well,
> so the transition doesn't have to be instantaneous (and in particular,
> if a change was posted before the transition and approved after it, it
> shouldn't be necessary to turn it into a PR).  Longer term, I think it
> would be a good idea to move to everything going through the PR system
> (with people able to self-approve and merge their own PRs immediately
> where they can commit without review at present), so there is better
> structured uniform tracking of all changes and associated CI
> information etc. - however, direct commits to branches other than
> mainline (and maybe release branches) should continue to be OK long
> term (although it should also be possible to make a PR to such a
> branch if desired).
> 
> Beyond putting everything through PRs, eventually I'd hope to have
> merges to mainline normally all go through a CI system that makes sure
> there are no regressions for at least one configuration before merging
> changes.
> 
> (e) All existing pre-commit checks from hooks should be kept in some
> form, to maintain existing invariants on both tree and commit contents
> (some hook checks make sure that commits don't have commit messages
> that would cause other automated processes to fall over later, for
> example).
> 
> (f) Existing cron jobs that read from or commit to repositories should
> use normal remote git access, not filesystem access to the repository,
> including using APIs to create and self-merge PRs when pushing to the
> repository is involved.
> 

Thanks for such a comprehensive write-up Joseph.  I need to read through 
this with a lot more care, but not until my Covid fug has cleared more 
(ugh!).

One thing we must do, however, is break requirements into a number of 
categories: must haves (red lines, we can't transition without this); 
should haves (important, but we can likely find acceptable 
work-arounds); and would like (this would make things really nice, but 
they won't really block a transition).

For the first two categories we also need to think hard about WHY this 
is the case - that is, we should be able to state clearly a 
justification for the requirement.

A quick read of your email suggests you've identified a number of these 
issues, but it's not quite as clear on how hard each requirements needs 
to be, or perhaps why you think it has to be considered.

I don't expect the list to be entirely static either: issues might arise 
as we do more detailed investigation, but this list looks like a good 
starting point.

R.

PS: Forgejo can support issues in bugzilla (and you can teach it a 
regexp rule to create links from our component/id format into a bugzilla 
link); it can also support external wikis.  So if (still quite a big if) 
we went the forgejo route, those issues can be handled.