From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca by simark.ca with LMTP id CVpyLcll8WbxhjQAWB0awg (envelope-from ) for ; Mon, 23 Sep 2024 08:57:45 -0400 Authentication-Results: simark.ca; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=gNFhusFU; dkim-atps=neutral Received: by simark.ca (Postfix, from userid 112) id A72F21E353; Mon, 23 Sep 2024 08:57:45 -0400 (EDT) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on simark.ca X-Spam-Level: X-Spam-Status: No, score=-11.8 required=5.0 tests=ARC_SIGNED,ARC_VALID, BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,RCVD_IN_VALIDITY_CERTIFIED,RCVD_IN_VALIDITY_RPBL, RCVD_IN_VALIDITY_SAFE,URIBL_BLOCKED,URIBL_DBL_BLOCKED_OPENDNS autolearn=ham autolearn_force=no version=4.0.0 Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPS id 9C5D21E05C for ; Mon, 23 Sep 2024 08:57:44 -0400 (EDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2D3183858D35 for ; Mon, 23 Sep 2024 12:57:44 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2D3183858D35 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1727096264; bh=nrrV9yQUPsK4EAp8GqSHBm5uANebIdVfr8R3AJowAD4=; h=Date:Subject:To:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=gNFhusFUyuKKKs1tlHxn7wrb16FqAH+w67YwcicbVB/wVqoO7enhpn3AheMyjGj0g 1jCSC2SKSqUVeq6YnsG9e4XObAXFqv8PINdeGfB1InomOx3QZ0VOi36UIJaPJ6rDg9 RhGEQAlm1OvKe/favZuSEZRZhUTAgsfYMyQbMlXs= Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 307B73858408; Mon, 23 Sep 2024 12:56:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 307B73858408 ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 307B73858408 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1727096205; cv=none; b=Oy7Ug0hnk0PAmAq8M74QK4J3+TwfeSWB0wANHABla6gA0M2coQaN8Dg5IqXFfhBG2wG9f6ISJB0luC56WbP8y4SeosCkRfx+rJiP375cDBGZx+9RpTPqHukNlZADGKIzV/lTiwCG5VT+4og0ZCxEEen8cpyDhEg5k62zo34YZaM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1727096205; c=relaxed/simple; bh=/rV2h/T3grI+Jz6XUojNRpNPh3/QV1R4wLPvcBema1s=; h=Message-ID:Date:MIME-Version:Subject:To:From; b=knXfGGxgyTbFrmNK9kE36V6+8eyqGAAN+e8FXVPsAYvsR50X8eAa6gFnLmClNMorcAFlshlnR1Q5IzYTxI5FYrjiLFWTFXCyqHstHpqbcwC1B9DnGbNFlmyQYjbq/DtDIJ5F/l+UHLVGwqMVksuaagm6A3Np1HQ6jLLruw7880o= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4C751FEC; Mon, 23 Sep 2024 05:57:11 -0700 (PDT) Received: from [10.57.84.228] (unknown [10.57.84.228]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 82ADF3F64C; Mon, 23 Sep 2024 05:56:40 -0700 (PDT) Message-ID: <4cea35db-1005-4341-bfea-2e5d78b3399d@arm.com> Date: Mon, 23 Sep 2024 13:56:38 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: On pull request workflows for the GNU toolchain To: Joseph Myers , gcc@gcc.gnu.org, libc-alpha@sourceware.org, binutils@sourceware.org, gdb@sourceware.org References: Content-Language: en-GB In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: gdb@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gdb mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: "Richard Earnshaw \(lists\) via Gdb" Reply-To: "Richard Earnshaw \(lists\)" Errors-To: gdb-bounces~public-inbox=simark.ca@sourceware.org Sender: "Gdb" On 19/09/2024 16:51, Joseph Myers via Gcc wrote: > 1. Introduction > > This message expands on my remarks at the Cauldron (especially the > patch review and maintenance BoF, and the Sourceware infrastructure > BoF) regarding desired features for a system providing pull request > functionality (patch submission via creating branches that are then > proposed using some kind of web interface or API, with a central > database then tracking the status of each pull request and review > comments thereon automatically), for use by the GNU toolchain (or one > or more components thereof - there is no need for each component to > make the same decision about moving to such software and workflow, and > indeed we have no mechanism to make such decisions for the toolchain > as a whole). > > This does not advocate a particular choice of software for such > functionality (though much of the discussion seemed to suggest Forgejo > as the most likely starting point), or a particular choice of where to > host it. Hosting would of course need to meet appropriate security > requirements, and to achieve a passing grade on the GNU Ethical > Repository Criteria, and the software would need to be entirely free > software. Where relevant features are not already supported, it's > important that the software is receptive to the addition of such > features (including cases where part of the functionality is provided > by software specific to the GNU toolchain or parts thereof - such as > for the custom checks currently implemented with git hooks - and the > underlying software provides appropriate interfaces to allow > integration of such external pieces). The list of features here may > be a good basis for reviewing what particular forge software supports > and whether other features can be added, directly or through use of > appropriate APIs. > > Forge software may provide other pieces such as bug tracking or wikis > that we currently handle separately from git hosting. In such cases, > we should be able to disable those pieces and keep using the existing > bug tracking and wiki software (while having the option to decide > independently to migrate those if desired). > > I consider the overall benefits of such a move to be having more > structured data about all changes proposed for inclusion and their > status (needing review, needing changes from the author, under > discussion, needing merge from mainline, etc.), to help all people > involved in the patch submission and review process to track such > information and to find patches needing review as applicable, along > with providing a more familiar workflow for many people that avoids > many of the problems with email (which affect experienced contributors > working around corporate email systems, not just new contributors). > It would not of course by itself turn people with no interest in or > understanding of systems software development into contributors (for > example, people without knowledge of directories and hierarchical file > storage, or people who only understand software development as web > development). Nor would it prevent the accumulation of large backlogs > of unreviewed patches, as is evident from many large and active > projects using PR workflows with large numbers of open PRs. > > As Richard noted in his BoF, email sucks. As I noted in reply, so do > the web and web browsers when trying to deal with large amounts of > patch review state (when one wishes to apply one's own view, not the > forge's, of what is resolved and what needs attention). As I also > noted, in the Sourceware infrastructure BoF, tools such as patchwork > and b4 are the right answer to the wrong question: trying to get > structured data about patch submissions when working from the axiom > that emails on a mailing list should be the primary source of truth > for everything needing review, rather than starting from more > structured data and generating emails as one form of output. > > Moving to a pull request system is not expected to change policies > regarding who can approve a change for inclusion, or the technical > limits on who can cause a change to enter mainline (e.g. all people > with commit access would be expected to be able to use a button in a > web interface to cause to PR to be merged, though policy might limit > when they should do so). We can of course choose to change policies, > either as part of adopting a PR system or later. > > > 2. Key features > > (a) Some forges have a design that emphasises the tree you get after a > proposed contribution, but not the sequence of commits to get there. > For the toolchain, we care about the clean, logical sequence of > commits on the mainline branch. (We also use linear history on > mainline, but that's a secondary matter - though certainly we'd want > any forge used to support such linear history so that property doesn't > need to change as part of adopting pull request workflow.) Having a > clean sequence of commits has some implications for forge support: > > * Support for reviewing the proposed commit message, not just the > diffs, is important (and it should be clear what commit message > would result from merging any pull request). > > * Patch series and dependencies between patches are important. In > such cases, fixes for issues from review should go into the > appropriate logical commit in the PR (with rebasing as necessary), > and it should be possible at all times to see what the sequence of > commits on mainline would look like. (E.g. people use some > workarounds on GitHub to manage PR dependencies, but those result in > second and subsequent PRs in a series showing the full set of diffs > from a PR and those it depends on, rather than just the logical > diffs for one PR.) > > I consider patch series and dependencies to be separate but related > things: a patch series may not have strictly linear dependencies > (and it's often useful to merge the more straightforward patches > from a series while still discussing and revising others), while a > patch may depend on other patches that it's not logically part of > the same series as. They are, however, closely related, and a > sufficient solution for dependencies might also be adequate for many > cases of series. > > Note that series can sometimes have hundreds of patches; any > solution for patch series and dependencies needs to scale that far. > > There is of course the common case of a single-patch submission, > where the patch is ready for inclusion after some number of fixes. > In those cases, it's probably convenient if it's not necessary to > rebase - provided it's clear that a particular PR would be > squash-merged, and also what the commit message would be for the > final fixed commit. > > * Given the need for rebasing when working with patch series, it's > important to have good support for rebasing. In particular, all > revisions of the changes for a PR that was rebased need to remain > permanently available (e.g. through appropriate documented refs to > fetch to get all revisions of all PRs). > > (b) Various people have built efficient workflows for going through > all patch submissions and comments (or all in a particular area), even > when only reviewing a small proportion, and have concerns about > efficiency of a web interface when working with many patches and > comments. It's important to have good API support to allow people to > build tools supporting their own workflow like this without needing to > use the browser interface (and following their own logic, not the > forge's, for what changes are of interest). Good API support might, > for example, include a straightforward way to get all changes to PR > and comment data and metadata since some particular point, as well as > for actions such as reviewing / commenting / approving a PR. Such API > support might be similar to what's needed to ensure people can readily > get and maintain a local replica of all the key data and its history > for all PRs. > > Replication like that is also important for reliably ensuring key data > remains available even if the forge software subsequently ceases to be > maintained. Consider, for example, the apparent disappearance of the > data from www-gnats.gnu.org (we occasionally come across references to > old bug reports from that system in glibc development, but don't have > any way to resolve those references). > > Another use of such an API would be to allow maintaining local copies > of all PR comments etc. in a plain text form that can be searched with > grep. > > (c) Given that a transition would be from using mailing lists, it's > important to have good plain text outward email notifications for all > new merge requests, comments thereon and other significant actions > (changing metadata, approvals / merges, etc.) - through whatever > combination of built-in support in a forge and local implementation > using the API. Such notifications would go to the existing mailing > lists (note that the choice of mailing lists for a patch can depend on > which parts of the tree it changes - for example, the choice between > binutils and gdb-patches lists, or some GCC patches going to the > fortran or libstdc++ lists) - as well as to individuals concerned with > / subscribed to a given PR. Some very routine changes, such as > reports of clean CI results, might be omitted by default from > notifications sent to the mailing lists. > > "good" includes quoting appropriate diff hunks being comments on. > It's OK to have an HTML multipart as well, but the quality of the > plain text part matters. Diffs themselves should be included in > new-PR emails (and those for changes to a PR) unless very large. > > I do not however suggest trying to take incoming email (unstructured) > and turn it into more structured comments on particular parts of a PR > in the database. > > Similarly, commit emails should continue to go to the existing mailing > lists for those. > > (d) Rather than the forge owning the mainline branch in the sense of > every commit having to go through an approved PR, at least initially > it should be possible for people to push directly to mainline as well, > so the transition doesn't have to be instantaneous (and in particular, > if a change was posted before the transition and approved after it, it > shouldn't be necessary to turn it into a PR). Longer term, I think it > would be a good idea to move to everything going through the PR system > (with people able to self-approve and merge their own PRs immediately > where they can commit without review at present), so there is better > structured uniform tracking of all changes and associated CI > information etc. - however, direct commits to branches other than > mainline (and maybe release branches) should continue to be OK long > term (although it should also be possible to make a PR to such a > branch if desired). > > Beyond putting everything through PRs, eventually I'd hope to have > merges to mainline normally all go through a CI system that makes sure > there are no regressions for at least one configuration before merging > changes. > > (e) All existing pre-commit checks from hooks should be kept in some > form, to maintain existing invariants on both tree and commit contents > (some hook checks make sure that commits don't have commit messages > that would cause other automated processes to fall over later, for > example). > > (f) Existing cron jobs that read from or commit to repositories should > use normal remote git access, not filesystem access to the repository, > including using APIs to create and self-merge PRs when pushing to the > repository is involved. > Thanks for such a comprehensive write-up Joseph. I need to read through this with a lot more care, but not until my Covid fug has cleared more (ugh!). One thing we must do, however, is break requirements into a number of categories: must haves (red lines, we can't transition without this); should haves (important, but we can likely find acceptable work-arounds); and would like (this would make things really nice, but they won't really block a transition). For the first two categories we also need to think hard about WHY this is the case - that is, we should be able to state clearly a justification for the requirement. A quick read of your email suggests you've identified a number of these issues, but it's not quite as clear on how hard each requirements needs to be, or perhaps why you think it has to be considered. I don't expect the list to be entirely static either: issues might arise as we do more detailed investigation, but this list looks like a good starting point. R. PS: Forgejo can support issues in bugzilla (and you can teach it a regexp rule to create links from our component/id format into a bugzilla link); it can also support external wikis. So if (still quite a big if) we went the forgejo route, those issues can be handled.