* On pull request workflows for the GNU toolchain
From: Joseph Myers via Gdb @ 2024-09-19 15:51 UTC (permalink / raw)
To: gcc, libc-alpha, binutils, gdb
1. Introduction
This message expands on my remarks at the Cauldron (especially the
patch review and maintenance BoF, and the Sourceware infrastructure
BoF) regarding desired features for a system providing pull request
functionality (patch submission via creating branches that are then
proposed using some kind of web interface or API, with a central
database then tracking the status of each pull request and review
comments thereon automatically), for use by the GNU toolchain (or one
or more components thereof - there is no need for each component to
make the same decision about moving to such software and workflow, and
indeed we have no mechanism to make such decisions for the toolchain
as a whole).
This does not advocate a particular choice of software for such
functionality (though much of the discussion seemed to suggest Forgejo
as the most likely starting point), or a particular choice of where to
host it. Hosting would of course need to meet appropriate security
requirements, and to achieve a passing grade on the GNU Ethical
Repository Criteria, and the software would need to be entirely free
software. Where relevant features are not already supported, it's
important that the software is receptive to the addition of such
features (including cases where part of the functionality is provided
by software specific to the GNU toolchain or parts thereof - such as
for the custom checks currently implemented with git hooks - and the
underlying software provides appropriate interfaces to allow
integration of such external pieces). The list of features here may
be a good basis for reviewing what particular forge software supports
and whether other features can be added, directly or through use of
appropriate APIs.
Forge software may provide other pieces such as bug tracking or wikis
that we currently handle separately from git hosting. In such cases,
we should be able to disable those pieces and keep using the existing
bug tracking and wiki software (while having the option to decide
independently to migrate those if desired).
I consider the overall benefits of such a move to be having more
structured data about all changes proposed for inclusion and their
status (needing review, needing changes from the author, under
discussion, needing merge from mainline, etc.), to help all people
involved in the patch submission and review process to track such
information and to find patches needing review as applicable, along
with providing a more familiar workflow for many people that avoids
many of the problems with email (which affect experienced contributors
working around corporate email systems, not just new contributors).
It would not of course by itself turn people with no interest in or
understanding of systems software development into contributors (for
example, people without knowledge of directories and hierarchical file
storage, or people who only understand software development as web
development). Nor would it prevent the accumulation of large backlogs
of unreviewed patches, as is evident from many large and active
projects using PR workflows with large numbers of open PRs.
As Richard noted in his BoF, email sucks. As I noted in reply, so do
the web and web browsers when trying to deal with large amounts of
patch review state (when one wishes to apply one's own view, not the
forge's, of what is resolved and what needs attention). As I also
noted, in the Sourceware infrastructure BoF, tools such as patchwork
and b4 are the right answer to the wrong question: trying to get
structured data about patch submissions when working from the axiom
that emails on a mailing list should be the primary source of truth
for everything needing review, rather than starting from more
structured data and generating emails as one form of output.
Moving to a pull request system is not expected to change policies
regarding who can approve a change for inclusion, or the technical
limits on who can cause a change to enter mainline (e.g. all people
with commit access would be expected to be able to use a button in a
web interface to cause a PR to be merged, though policy might limit
when they should do so). We can of course choose to change policies,
either as part of adopting a PR system or later.
2. Key features
(a) Some forges have a design that emphasises the tree you get after a
proposed contribution, but not the sequence of commits to get there.
For the toolchain, we care about the clean, logical sequence of
commits on the mainline branch. (We also use linear history on
mainline, but that's a secondary matter - though certainly we'd want
any forge used to support such linear history so that property doesn't
need to change as part of adopting pull request workflow.) Having a
clean sequence of commits has some implications for forge support:
* Support for reviewing the proposed commit message, not just the
diffs, is important (and it should be clear what commit message
would result from merging any pull request).
* Patch series and dependencies between patches are important. In
such cases, fixes for issues from review should go into the
appropriate logical commit in the PR (with rebasing as necessary),
and it should be possible at all times to see what the sequence of
commits on mainline would look like. (E.g. people use some
workarounds on GitHub to manage PR dependencies, but those result in
second and subsequent PRs in a series showing the full set of diffs
from a PR and those it depends on, rather than just the logical
diffs for one PR.)
I consider patch series and dependencies to be separate but related
things: a patch series may not have strictly linear dependencies
(and it's often useful to merge the more straightforward patches
from a series while still discussing and revising others), while a
patch may depend on other patches that it's not logically part of
the same series as. They are, however, closely related, and a
sufficient solution for dependencies might also be adequate for many
cases of series.
Note that series can sometimes have hundreds of patches; any
solution for patch series and dependencies needs to scale that far.
There is of course the common case of a single-patch submission,
where the patch is ready for inclusion after some number of fixes.
In those cases, it's probably convenient if it's not necessary to
rebase - provided it's clear that a particular PR would be
squash-merged, and also what the commit message would be for the
final fixed commit.
* Given the need for rebasing when working with patch series, it's
important to have good support for rebasing. In particular, all
revisions of the changes for a PR that was rebased need to remain
permanently available (e.g. through appropriate documented refs to
fetch to get all revisions of all PRs).
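As an illustration of the property meant here, the sketch below uses
local repositories to stand in for the forge; the refs/pull/&lt;id&gt;/v&lt;n&gt;
layout is an invented example, not any particular forge's actual ref
naming.

```python
# Sketch: every revision of every PR remains fetchable via documented refs.
# ASSUMPTION: the refs/pull/<id>/v<n> layout is hypothetical; real ref
# naming would depend on the forge chosen.
import os
import subprocess
import tempfile

def git(*args, cwd):
    # Run a git command and return its stdout.
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

tmp = tempfile.mkdtemp()
forge = os.path.join(tmp, "forge.git")  # stand-in for the hosted repository
work = os.path.join(tmp, "work")        # contributor's working repository
review = os.path.join(tmp, "review")    # reviewer's local replica
git("init", "--bare", forge, cwd=tmp)
os.mkdir(work)
git("init", cwd=work)
ident = ["-c", "user.name=a", "-c", "user.email=a@example.org"]
# First revision of PR 1, then an amended (rebased) second revision.
git(*ident, "commit", "--allow-empty", "-m", "v1 of change", cwd=work)
git("push", forge, "HEAD:refs/pull/1/v1", cwd=work)
git(*ident, "commit", "--amend", "--allow-empty", "-m", "v2 of change",
    cwd=work)
git("push", forge, "HEAD:refs/pull/1/v2", cwd=work)
# A single documented refspec fetches all revisions of all PRs.
os.mkdir(review)
git("init", cwd=review)
git("fetch", forge, "+refs/pull/*:refs/remotes/pr/*", cwd=review)
# Lists pr/1/v1 and pr/1/v2: both revisions remain permanently available.
print(git("for-each-ref", "--format=%(refname:short)", "refs/remotes/pr",
          cwd=review))
```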
(b) Various people have built efficient workflows for going through
all patch submissions and comments (or all in a particular area), even
when only reviewing a small proportion, and have concerns about
efficiency of a web interface when working with many patches and
comments. It's important to have good API support to allow people to
build tools supporting their own workflow like this without needing to
use the browser interface (and following their own logic, not the
forge's, for what changes are of interest). Good API support might,
for example, include a straightforward way to get all changes to PR
and comment data and metadata since some particular point, as well as
for actions such as reviewing / commenting / approving a PR. Such API
support might be similar to what's needed to ensure people can readily
get and maintain a local replica of all the key data and its history
for all PRs.
Replication like that is also important for reliably ensuring key data
remains available even if the forge software subsequently ceases to be
maintained. Consider, for example, the apparent disappearance of the
data from www-gnats.gnu.org (we occasionally come across references to
old bug reports from that system in glibc development, but don't have
any way to resolve those references).
Another use of such an API would be to allow maintaining local copies
of all PR comments etc. in a plain text form that can be searched with
grep.
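A minimal sketch of that kind of mirroring follows; the comment
structure and field names here are made up for illustration, not any
particular forge's API schema.

```python
# Sketch: flatten structured PR comment data, as a forge API might return
# it, into plain text files searchable with grep.  The sample data and
# field names are hypothetical.
import pathlib
import tempfile

sample_comments = [
    {"pr": 101, "author": "alice", "body": "Please fix the commit message."},
    {"pr": 101, "author": "bob", "body": "Done, and rebased on mainline."},
    {"pr": 102, "author": "alice", "body": "OK for mainline."},
]

outdir = pathlib.Path(tempfile.mkdtemp())
for comment in sample_comments:
    # One append-mode text file per PR keeps its whole discussion together.
    with (outdir / f"pr-{comment['pr']}.txt").open("a") as f:
        f.write(f"From: {comment['author']}\n{comment['body']}\n\n")

# Equivalent of "grep -l rebased pr-*.txt" over the local replica.
hits = [p.name for p in sorted(outdir.glob("pr-*.txt"))
        if "rebased" in p.read_text()]
print(hits)  # ['pr-101.txt']
```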
(c) Given that a transition would be from using mailing lists, it's
important to have good plain text outward email notifications for all
new merge requests, comments thereon and other significant actions
(changing metadata, approvals / merges, etc.) - through whatever
combination of built-in support in a forge and local implementation
using the API. Such notifications would go to the existing mailing
lists (note that the choice of mailing lists for a patch can depend on
which parts of the tree it changes - for example, the choice between
binutils and gdb-patches lists, or some GCC patches going to the
fortran or libstdc++ lists) - as well as to individuals concerned with
/ subscribed to a given PR. Some very routine changes, such as
reports of clean CI results, might be omitted by default from
notifications sent to the mailing lists.
"good" includes quoting appropriate diff hunks being comments on.
It's OK to have an HTML multipart as well, but the quality of the
plain text part matters. Diffs themselves should be included in
new-PR emails (and those for changes to a PR) unless very large.
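For illustration, a notification along those lines might be assembled
as below; the addresses, subject format and diff hunk are all
invented.

```python
# Sketch: a plain text review-comment notification quoting the diff hunk
# the comment is on.  Addresses, subject format, and hunk are made up.
from email.message import EmailMessage

hunk = ("@@ -1,3 +1,3 @@\n"
        " int f (int x)\n"
        "-  return x + 1;\n"
        "+  return x + 2;\n")
comment = "Should this not preserve the old behaviour for negative x?"

msg = EmailMessage()
msg["Subject"] = "[PR 42] Review comment on foo.c"
msg["From"] = "forge@example.org"
msg["To"] = "patches@example.org"
# Quote the hunk being commented on, then the comment itself, so the
# plain text part is readable on its own.
quoted_hunk = "\n".join("> " + line for line in hunk.splitlines())
msg.set_content(quoted_hunk + "\n\n" + comment + "\n")
print(msg)
```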
I do not however suggest trying to take incoming email (unstructured)
and turn it into more structured comments on particular parts of a PR
in the database.
Similarly, commit emails should continue to go to the existing mailing
lists for those.
(d) Rather than the forge owning the mainline branch in the sense of
every commit having to go through an approved PR, at least initially
it should be possible for people to push directly to mainline as well,
so the transition doesn't have to be instantaneous (and in particular,
if a change was posted before the transition and approved after it, it
shouldn't be necessary to turn it into a PR). Longer term, I think it
would be a good idea to move to everything going through the PR system
(with people able to self-approve and merge their own PRs immediately
where they can commit without review at present), so there is better
structured uniform tracking of all changes and associated CI
information etc. - however, direct commits to branches other than
mainline (and maybe release branches) should continue to be OK long
term (although it should also be possible to make a PR to such a
branch if desired).
Beyond putting everything through PRs, eventually I'd hope to have
merges to mainline normally all go through a CI system that makes sure
there are no regressions for at least one configuration before merging
changes.
(e) All existing pre-commit checks from hooks should be kept in some
form, to maintain existing invariants on both tree and commit contents
(some hook checks make sure that commits don't have commit messages
that would cause other automated processes to fall over later, for
example).
(f) Existing cron jobs that read from or commit to repositories should
use normal remote git access, not filesystem access to the repository,
including using APIs to create and self-merge PRs when pushing to the
repository is involved.
--
Joseph S. Myers
josmyers@redhat.com
* Re: On pull request workflows for the GNU toolchain
From: Carlos O'Donell via Gdb @ 2024-09-20 18:25 UTC (permalink / raw)
To: Joseph Myers, gcc, libc-alpha, binutils, gdb
On 9/19/24 11:51 AM, Joseph Myers wrote:
> 1. Introduction
Thanks for writing this up!
> This message expands on my remarks at the Cauldron (especially the
> patch review and maintenance BoF, and the Sourceware infrastructure
> BoF) regarding desired features for a system providing pull request
> functionality (patch submission via creating branches that are then
> proposed using some kind of web interface or API, with a central
> database then tracking the status of each pull request and review
> comments thereon automatically), for use by the GNU toolchain (or one
> or more components thereof - there is no need for each component to
> make the same decision about moving to such software and workflow, and
> indeed we have no mechanism to make such decisions for the toolchain
> as a whole).
>
> This does not advocate a particular choice of software for such
> functionality (though much of the discussion seemed to suggest Forgejo
> as the most likely starting point), or a particular choice of where to
> host it. Hosting would of course need to meet appropriate security
> requirements, and to achieve a passing grade on the GNU Ethical
> Repository Criteria, and the software would need to be entirely free
> software. Where relevant features are not already supported, it's
> important that the software is receptive to the addition of such
> features (including cases where part of the functionality is provided
> by software specific to the GNU toolchain or parts thereof - such as
> for the custom checks currently implemented with git hooks - and the
> underlying software provides appropriate interfaces to allow
> integration of such external pieces). The list of features here may
> be a good basis for reviewing what particular forge software supports
> and whether other features can be added, directly or through use of
> appropriate APIs.
>
> Forge software may provide other pieces such as bug tracking or wikis
> that we currently handle separately from git hosting. In such cases,
> we should be able to disable those pieces and keep using the existing
> bug tracking and wiki software (while having the option to decide
> independently to migrate those if desired).
>
> I consider the overall benefits of such a move to be having more
> structured data about all changes proposed for inclusion and their
> status (needing review, needing changes from the author, under
> discussion, needing merge from mainline, etc.), to help all people
> involved in the patch submission and review process to track such
> information and to find patches needing review as applicable, along
> with providing a more familiar workflow for many people that avoids
> many of the problems with email (which affect experienced contributors
> working around corporate email systems, not just new contributors).
> It would not of course by itself turn people with no interest in or
> understanding of systems software development into contributors (for
> example, people without knowledge of directories and hierarchical file
> storage, or people who only understand software development as web
> development). Nor would it prevent the accumulation of large backlogs
> of unreviewed patches, as is evident from many large and active
> projects using PR workflows with large numbers of open PRs.
>
> As Richard noted in his BoF, email sucks. As I noted in reply, so do
> the web and web browsers when trying to deal with large amounts of
> patch review state (when one wishes to apply one's own view, not the
> forge's, of what is resolved and what needs attention). As I also
> noted, in the Sourceware infrastructure BoF, tools such as patchwork
> and b4 are the right answer to the wrong question: trying to get
> structured data about patch submissions when working from the axiom
> that emails on a mailing list should be the primary source of truth
> for everything needing review, rather than starting from more
> structured data and generating emails as one form of output.
>
> Moving to a pull request system is not expected to change policies
> regarding who can approve a change for inclusion, or the technical
> limits on who can cause a change to enter mainline (e.g. all people
> with commit access would be expected to be able to use a button in a
> web interface to cause a PR to be merged, though policy might limit
> when they should do so). We can of course choose to change policies,
> either as part of adopting a PR system or later.
Agreed.
>
>
> 2. Key features
>
> (a) Some forges have a design that emphasises the tree you get after a
> proposed contribution, but not the sequence of commits to get there.
This has changed a lot in recent years in GitLab and GitHub, where per-commit review
has emerged as a needed property of the web UI, so the commits (rather than the
final state of the tree) are now exposed for review purposes. I see this as a
general maturing of the tooling towards what we already knew in systems design:
that clean, logical commits are a necessary prerequisite to managing the
complexity of a change.
> For the toolchain, we care about the clean, logical sequence of
> commits on the mainline branch. (We also use linear history on
> mainline, but that's a secondary matter - though certainly we'd want
> any forge used to support such linear history so that property doesn't
> need to change as part of adopting pull request workflow.) Having a
> clean sequence of commits has some implications for forge support:
>
> * Support for reviewing the proposed commit message, not just the
> diffs, is important (and it should be clear what commit message
> would result from merging any pull request).
>
> * Patch series and dependencies between patches are important. In
> such cases, fixes for issues from review should go into the
> appropriate logical commit in the PR (with rebasing as necessary),
> and it should be possible at all times to see what the sequence of
> commits on mainline would look like. (E.g. people use some
> workarounds on GitHub to manage PR dependencies, but those result in
> second and subsequent PRs in a series showing the full set of diffs
> from a PR and those it depends on, rather than just the logical
> diffs for one PR.)
This area has seen a lot of change recently in the large forges, and warrants a
review of current offerings.
> I consider patch series and dependencies to be separate but related
> things: a patch series may not have strictly linear dependencies
> (and it's often useful to merge the more straightforward patches
> from a series while still discussing and revising others), while a
> patch may depend on other patches that it's not logically part of
> the same series as. They are, however, closely related, and a
> sufficient solution for dependencies might also be adequate for many
> cases of series.
Agreed.
> Note that series can sometimes have hundreds of patches; any
> solution for patch series and dependencies needs to scale that far.
>
> There is of course the common case of a single-patch submission,
> where the patch is ready for inclusion after some number of fixes.
> In those cases, it's probably convenient if it's not necessary to
> rebase - provided it's clear that a particular PR would be
> squash-merged, and also what the commit message would be for the
> final fixed commit.
>
> * Given the need for rebasing when working with patch series, it's
> important to have good support for rebasing. In particular, all
> revisions of the changes for a PR that was rebased need to remain
> permanently available (e.g. through appropriate documented refs to
> fetch to get all revisions of all PRs).
>
> (b) Various people have built efficient workflows for going through
> all patch submissions and comments (or all in a particular area), even
> when only reviewing a small proportion, and have concerns about
> efficiency of a web interface when working with many patches and
> comments. It's important to have good API support to allow people to
> build tools supporting their own workflow like this without needing to
> use the browser interface (and following their own logic, not the
> forge's, for what changes are of interest). Good API support might,
> for example, include a straightforward way to get all changes to PR
> and comment data and metadata since some particular point, as well as
> for actions such as reviewing / commenting / approving a PR. Such API
> support might be similar to what's needed to ensure people can readily
> get and maintain a local replica of all the key data and its history
> for all PRs.
Agreed.
> Replication like that is also important for reliably ensuring key data
> remains available even if the forge software subsequently ceases to be
> maintained. Consider, for example, the apparent disappearance of the
> data from www-gnats.gnu.org (we occasionally come across references to
> old bug reports from that system in glibc development, but don't have
> any way to resolve those references).
>
> Another use of such an API would be to allow maintaining local copies
> of all PR comments etc. in a plain text form that can be searched with
> grep.
>
> (c) Given that a transition would be from using mailing lists, it's
> important to have good plain text outward email notifications for all
> new merge requests, comments thereon and other significant actions
> (changing metadata, approvals / merges, etc.) - through whatever
> combination of built-in support in a forge and local implementation
> using the API. Such notifications would go to the existing mailing
> lists (note that the choice of mailing lists for a patch can depend on
> which parts of the tree it changes - for example, the choice between
> binutils and gdb-patches lists, or some GCC patches going to the
> fortran or libstdc++ lists) - as well as to individuals concerned with
> / subscribed to a given PR. Some very routine changes, such as
> reports of clean CI results, might be omitted by default from
> notifications sent to the mailing lists.
>
> "good" includes quoting appropriate diff hunks being comments on.
> It's OK to have an HTML multipart as well, but the quality of the
> plain text part matters. Diffs themselves should be included in
> new-PR emails (and those for changes to a PR) unless very large.
>
> I do not however suggest trying to take incoming email (unstructured)
> and turn it into more structured comments on particular parts of a PR
> in the database.
Agreed Re: incoming email.
> Similarly, commit emails should continue to go to the existing mailing
> lists for those.
>
> (d) Rather than the forge owning the mainline branch in the sense of
> every commit having to go through an approved PR, at least initially
> it should be possible for people to push directly to mainline as well,
> so the transition doesn't have to be instantaneous (and in particular,
> if a change was posted before the transition and approved after it, it
> shouldn't be necessary to turn it into a PR). Longer term, I think it
> would be a good idea to move to everything going through the PR system
> (with people able to self-approve and merge their own PRs immediately
> where they can commit without review at present), so there is better
> structured uniform tracking of all changes and associated CI
> information etc. - however, direct commits to branches other than
> mainline (and maybe release branches) should continue to be OK long
> term (although it should also be possible to make a PR to such a
> branch if desired).
>
> Beyond putting everything through PRs, eventually I'd hope to have
> merges to mainline normally all go through a CI system that makes sure
> there are no regressions for at least one configuration before merging
> changes.
Agreed.
> (e) All existing pre-commit checks from hooks should be kept in some
> form, to maintain existing invariants on both tree and commit contents
> (some hook checks make sure that commits don't have commit messages
> that would cause other automated processes to fall over later, for
> example).
These could all move to pre-commit CI checks that block merging.
> (f) Existing cron jobs that read from or commit to repositories should
> use normal remote git access, not filesystem access to the repository,
> including using APIs to create and self-merge PRs when pushing to the
> repository is involved.
Agreed.
Thanks for writing this up!
--
Cheers,
Carlos.
* Re: On pull request workflows for the GNU toolchain
From: Matt Rice via Gdb @ 2024-09-20 19:58 UTC (permalink / raw)
To: Joseph Myers; +Cc: gcc, libc-alpha, binutils, gdb
On Thu, Sep 19, 2024 at 3:52 PM Joseph Myers via Gdb <gdb@sourceware.org> wrote:
>
> 1. Introduction
>
> This message expands on my remarks at the Cauldron (especially the
> patch review and maintenance BoF, and the Sourceware infrastructure
> BoF) regarding desired features for a system providing pull request
> functionality (patch submission via creating branches that are then
> proposed using some kind of web interface or API, with a central
> database then tracking the status of each pull request and review
> comments thereon automatically),
I wasn't at the Cauldron, but I just wanted to say that a lot of the
web-based interfaces providing pull request functionality (GitLab,
GitHub, Gitea, Gogs, etc.) pre-date the git-protocol features that
also allow you to implement pull-request-type behaviour. The git
proc-receive hook
https://git-scm.com/docs/githooks#proc-receive
can be used to implement a pull-request-style workflow without the web
interface. This is somewhat going against the grain, and has a less
developed ecosystem of clients/servers/hosting resources. I haven't
kept up to date with what tools are actually available; last I looked,
the only client I knew of using it was
https://git-repo.info/en/
and I wrote a prototype implementation using the hook as a server (as
far as I know, the only other server tool using this is the
proprietary codeup.aliyun, which might be internal to Alibaba?).
To me, though, it is nice being able to edit the PR cover letter
directly in the editor, and to create the pull request using
command-line tools.
What draws me to this, however, is that it allows me to set up an
in-house review process, using the same set of tools, that eventually
feeds into upstream while mirroring the CI architecture locally. That
is the dream, anyway, and in my opinion represents the style of pull
request workflow we should be aiming for if we want something ideal.
Anyhow, IMO a big reason why all of these workflows are so centralized
is just that they go outside the git protocol. To me, the work that
has been done seems promising as a proof of concept of a distributed
pull request system; what it is probably lacking is an actual
web-based interface on top of the git-protocol-based design, and a lot
of spit and polish.
At least to me, the whole Sourceware ecosystem feels like an ideal
candidate for this, since it is much more common here to have ports in
progress to unfinished or unreleased instruction sets than with almost
any other piece of software. So there is some amount of justification
for going against the grain of a centralized system, IMO.
* Re: On pull request workflows for the GNU toolchain
From: Joseph Myers via Gdb @ 2024-09-20 20:41 UTC (permalink / raw)
To: Carlos O'Donell; +Cc: gcc, libc-alpha, binutils, gdb
On Fri, 20 Sep 2024, Carlos O'Donell via Gcc wrote:
> > (e) All existing pre-commit checks from hooks should be kept in some
> > form, to maintain existing invariants on both tree and commit contents
> > (some hook checks make sure that commits don't have commit messages
> > that would cause other automated processes to fall over later, for
> > example).
>
> These could all move to pre-commit CI checks that block merging.
Checks are supposed to apply to direct pushes as well as to merging
through the PR system. (Direct pushes should I hope end up being rare on
mainline, not necessarily rare on other branches. Some invariants of
commit history should apply to all commits on all refs, e.g. that no-one
accidentally copies an old GCC commit message in a way that confuses
tooling searching for From-SVN: lines. On mainline, we might eventually
want an *additional* check on direct pushes, e.g. that they have a
Reason-for-direct-push: line that explains why the commit can't go through
the PR system.)
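A sketch of the kind of message invariant check meant here, written so
the same function could back both a push-time hook and a CI check on a
PR; the two rules follow the examples above, but the function itself
and its return convention are hypothetical.

```python
# Sketch: commit message invariants checkable both at push time and in CI.
# The two rules follow the examples in the text; the names and return
# convention are hypothetical.
import re

def check_commit_message(message, is_direct_push=False):
    """Return a list of problems; an empty list means the message is OK."""
    problems = []
    # Copying an old converted GCC commit message would reintroduce a
    # From-SVN: line and confuse tooling that searches for such lines.
    if re.search(r"^From-SVN: ", message, re.MULTILINE):
        problems.append("commit message contains a From-SVN: line")
    # A possible additional mainline rule: direct pushes must explain
    # why the change cannot go through the PR system.
    if is_direct_push and not re.search(r"^Reason-for-direct-push: \S",
                                        message, re.MULTILINE):
        problems.append("direct push lacks a Reason-for-direct-push: line")
    return problems

print(check_commit_message("Fix foo\n\nFrom-SVN: r123456\n"))
print(check_commit_message("Fix foo\n\nReason-for-direct-push: CI outage\n",
                           is_direct_push=True))  # []
```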
Obviously this does not dictate what technology should be used to implement
such checks on permitted ref updates in the repository (though hopefully
as little code as possible is executed in the repository context, as much
as possible in some separate isolated context). And arranging the
implementation so as much code as possible can be used both when checking
the final ref update, and in CI to check a PR that would result in such a
ref update if accepted, is desirable (just as e.g. GCC shares code between
the cron job that does ChangeLog updates, and the hook that checks if
commits have properly formatted ChangeLog entries - though there are
unfortunately still some cases that only result in the cron job falling
over, not in the commits being rejected at push time).
--
Joseph S. Myers
josmyers@redhat.com
* Re: On pull request workflows for the GNU toolchain
From: Joseph Myers via Gdb @ 2024-09-20 20:47 UTC (permalink / raw)
To: Matt Rice; +Cc: gcc, libc-alpha, binutils, gdb
On Fri, 20 Sep 2024, Matt Rice via Gcc wrote:
> To me though it is nice being able to edit the PR cover letter
> directly in the editor, and do the pull-request using command line
> tools.
In the common case of a single-commit PR without dependencies, it seems
reasonable to follow the practice that the commit message for the commit
pushed (to a branch from which the PR is created) is the same as the cover
letter / PR description (modulo any lines after "---" only being cover
letter text not intended to go in the final commit on mainline if the PR
is merged).
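That convention might look like the following; the "---" separator
handling follows patch-email practice, and the helper itself is
hypothetical.

```python
# Sketch of the convention above: the pushed commit message doubles as the
# PR description, with anything after a "---" line treated as cover-letter
# text that would not go into the final commit.  The helper is hypothetical.
def split_cover_letter(message):
    # partition() leaves the cover-only part empty if there is no "---".
    commit_part, _, cover_only = message.partition("\n---\n")
    return commit_part.rstrip("\n"), cover_only.strip()

pushed = ("component: fix frobnication\n"
          "\n"
          "Longer explanation that belongs in the final commit.\n"
          "---\n"
          "v2: addressed review comments (cover letter only).\n")
commit_msg, notes = split_cover_letter(pushed)
print(commit_msg.splitlines()[0])  # component: fix frobnication
print(notes)  # v2: addressed review comments (cover letter only).
```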
Having the ability to create a PR from the command line is desirable -
it's one part of having a sufficient API (indeed, if cron jobs that commit
do so via PRs that they self-merge rather than by direct pushes, they'll
need such an API). For people not wanting to install extra tools to
contribute, the web interface for creating a PR is also important.
For people without write access to the main repository to make PRs, the
"fork the repository and push to a branch in your fork" functionality of
forges is necessary, just as people can send a patch to the mailing lists
without having write access. (And quite likely such forks would be the
norm for people creating PRs even when they do have write access.)
--
Joseph S. Myers
josmyers@redhat.com
* Re: On pull request workflows for the GNU toolchain
From: Sam James via Gdb @ 2024-09-20 21:43 UTC (permalink / raw)
To: Carlos O'Donell; +Cc: Joseph Myers, gcc, libc-alpha, binutils, gdb
Carlos O'Donell <carlos@redhat.com> writes:
> On 9/19/24 11:51 AM, Joseph Myers wrote:
>> 1. Introduction
>
> Thanks for writing this up!
>
> [...]
> Agreed.
>
I just want to say the same; my sentiments match Carlos's.
* Re: On pull request workflows for the GNU toolchain
2024-09-19 15:51 On pull request workflows for the GNU toolchain Joseph Myers via Gdb
2024-09-20 18:25 ` Carlos O'Donell via Gdb
2024-09-20 19:58 ` Matt Rice via Gdb
@ 2024-09-23 12:07 ` Thomas Koenig via Gdb
2024-09-23 12:53 ` Jeffrey Walton via Gdb
` (3 more replies)
2024-09-23 12:56 ` Richard Earnshaw (lists) via Gdb
2024-09-24 3:43 ` Jiang, Haochen via Gdb
4 siblings, 4 replies; 25+ messages in thread
From: Thomas Koenig via Gdb @ 2024-09-23 12:07 UTC (permalink / raw)
To: Joseph Myers, gcc, libc-alpha, binutils, gdb, fortran
[For the fortran people: Discussion on gcc@]
Just a general remark.
There are people, such as myself, who regularly mess up
their git repositories because they have no mental model
of what git is doing (case in point: The Fortran unsigned
branch, which I managed to put into an unrepairable state
despite considerable help from people who tried to help me
fix it). This is especially true of volunteer maintainers,
who are still the mainstay of gfortran.
Whatever you end up doing, consider such maintainers, and
whether they can still contribute or would simply give up.
If what you end up doing is too complicated, it may end up
severely impacting the gfortran project (and possibly others).
Best regards
Thomas
* Re: On pull request workflows for the GNU toolchain
2024-09-23 12:07 ` Thomas Koenig via Gdb
@ 2024-09-23 12:53 ` Jeffrey Walton via Gdb
2024-09-23 13:23 ` Jonathan Wakely via Gdb
` (2 subsequent siblings)
3 siblings, 0 replies; 25+ messages in thread
From: Jeffrey Walton via Gdb @ 2024-09-23 12:53 UTC (permalink / raw)
To: Thomas Koenig; +Cc: Joseph Myers, gcc, libc-alpha, binutils, gdb, fortran
On Mon, Sep 23, 2024 at 8:08 AM Thomas Koenig via Gdb
<gdb@sourceware.org> wrote:
>
> [For the fortran people: Discussion on gcc@]
>
> Just a general remark.
>
> There are people, such as myself, who regularly mess up
> their git repositories because they have no mental model
> of what git is doing (case in point: The Fortran unsigned
> branch, which I managed to put into an unrepairable state
> despite considerable help from people who tried to help me
> fix it). This is especially true of volunteer maintainers,
> who are still the mainstay of gfortran.
++.
Thousands of upvotes on Stack Overflow questions about simple Git
tasks point to usability problems in Git itself, not merely to
well-asked questions on social media. Confer
<https://meta.stackoverflow.com/q/351700>.
> Whatever you end up doing, consider such maintainers, and
> if they still can contribute or would simply give up.
> If what you end up doing is too complicated, it may end up
> severely impacting the gfortran project (and possibly others).
Jeff
* Re: On pull request workflows for the GNU toolchain
2024-09-19 15:51 On pull request workflows for the GNU toolchain Joseph Myers via Gdb
` (2 preceding siblings ...)
2024-09-23 12:07 ` Thomas Koenig via Gdb
@ 2024-09-23 12:56 ` Richard Earnshaw (lists) via Gdb
2024-09-23 14:59 ` Joseph Myers via Gdb
2024-09-24 3:43 ` Jiang, Haochen via Gdb
4 siblings, 1 reply; 25+ messages in thread
From: Richard Earnshaw (lists) via Gdb @ 2024-09-23 12:56 UTC (permalink / raw)
To: Joseph Myers, gcc, libc-alpha, binutils, gdb
On 19/09/2024 16:51, Joseph Myers via Gcc wrote:
> 1. Introduction
>
> This message expands on my remarks at the Cauldron (especially the
> patch review and maintenance BoF, and the Sourceware infrastructure
> BoF) regarding desired features for a system providing pull request
> functionality (patch submission via creating branches that are then
> proposed using some kind of web interface or API, with a central
> database then tracking the status of each pull request and review
> comments thereon automatically), for use by the GNU toolchain (or one
> or more components thereof - there is no need for each component to
> make the same decision about moving to such software and workflow, and
> indeed we have no mechanism to make such decisions for the toolchain
> as a whole).
>
> This does not advocate a particular choice of software for such
> functionality (though much of the discussion seemed to suggest Forgejo
> as the most likely starting point), or a particular choice of where to
> host it. Hosting would of course need to meet appropriate security
> requirements, and to achieve a passing grade on the GNU Ethical
> Repository Criteria, and the software would need to be entirely free
> software. Where relevant features are not already supported, it's
> important that the software is receptive to the addition of such
> features (including cases where part of the functionality is provided
> by software specific to the GNU toolchain or parts thereof - such as
> for the custom checks currently implemented with git hooks - and the
> underlying software provides appropriate interfaces to allow
> integration of such external pieces). The list of features here may
> be a good basis for reviewing what particular forge software supports
> and whether other features can be added, directly or through use of
> appropriate APIs.
>
> Forge software may provide other pieces such as bug tracking or wikis
> that we currently handle separately from git hosting. In such cases,
> we should be able to disable those pieces and keep using the existing
> bug tracking and wiki software (while having the option to decide
> independently to migrate those if desired).
>
> I consider the overall benefits of such a move to be having more
> structured data about all changes proposed for inclusion and their
> status (needing review, needing changes from the author, under
> discussion, needing merge from mainline, etc.), to help all people
> involved in the patch submission and review process to track such
> information and to find patches needing review as applicable, along
> with providing a more familiar workflow for many people that avoids
> many of the problems with email (which affect experienced contributors
> working around corporate email systems, not just new contributors).
> It would not of course by itself turn people with no interest in or
> understanding of systems software development into contributors (for
> example, people without knowledge of directories and hierarchical file
> storage, or people who only understand software development as web
> development). Nor would it prevent the accumulation of large backlogs
> of unreviewed patches, as is evident from many large and active
> projects using PR workflows with large numbers of open PRs.
>
> As Richard noted in his BoF, email sucks. As I noted in reply, so do
> the web and web browsers when trying to deal with large amounts of
> patch review state (when one wishes to apply one's own view, not the
> forge's, of what is resolved and what needs attention). As I also
> noted, in the Sourceware infrastructure BoF, tools such as patchwork
> and b4 are the right answer to the wrong question: trying to get
> structured data about patch submissions when working from the axiom
> that emails on a mailing list should be the primary source of truth
> for everything needing review, rather than starting from more
> structured data and generating emails as one form of output.
>
> Moving to a pull request system is not expected to change policies
> regarding who can approve a change for inclusion, or the technical
> limits on who can cause a change to enter mainline (e.g. all people
> with commit access would be expected to be able to use a button in a
> web interface to cause a PR to be merged, though policy might limit
> when they should do so). We can of course choose to change policies,
> either as part of adopting a PR system or later.
>
>
> 2. Key features
>
> (a) Some forges have a design that emphasises the tree you get after a
> proposed contribution, but not the sequence of commits to get there.
> For the toolchain, we care about the clean, logical sequence of
> commits on the mainline branch. (We also use linear history on
> mainline, but that's a secondary matter - though certainly we'd want
> any forge used to support such linear history so that property doesn't
> need to change as part of adopting pull request workflow.) Having a
> clean sequence of commits has some implications for forge support:
>
> * Support for reviewing the proposed commit message, not just the
> diffs, is important (and it should be clear what commit message
> would result from merging any pull request).
>
> * Patch series and dependencies between patches are important. In
> such cases, fixes for issues from review should go into the
> appropriate logical commit in the PR (with rebasing as necessary),
> and it should be possible at all times to see what the sequence of
> commits on mainline would look like. (E.g. people use some
> workarounds on GitHub to manage PR dependencies, but those result in
> second and subsequent PRs in a series showing the full set of diffs
> from a PR and those it depends on, rather than just the logical
> diffs for one PR.)
>
> I consider patch series and dependencies to be separate but related
> things: a patch series may not have strictly linear dependencies
> (and it's often useful to merge the more straightforward patches
> from a series while still discussing and revising others), while a
> patch may depend on other patches that it's not logically part of
> the same series as. They are, however, closely related, and a
> sufficient solution for dependencies might also be adequate for many
> cases of series.
>
> Note that series can sometimes have hundreds of patches; any
> solution for patch series and dependencies needs to scale that far.
>
> There is of course the common case of a single-patch submission,
> where the patch is ready for inclusion after some number of fixes.
> In those cases, it's probably convenient if it's not necessary to
> rebase - provided it's clear that a particular PR would be
> squash-merged, and also what the commit message would be for the
> final fixed commit.
>
> * Given the need for rebasing when working with patch series, it's
> important to have good support for rebasing. In particular, all
> revisions of the changes for a PR that was rebased need to remain
> permanently available (e.g. through appropriate documented refs to
> fetch to get all revisions of all PRs).
>
> (b) Various people have built efficient workflows for going through
> all patch submissions and comments (or all in a particular area), even
> when only reviewing a small proportion, and have concerns about
> efficiency of a web interface when working with many patches and
> comments. It's important to have good API support to allow people to
> build tools supporting their own workflow like this without needing to
> use the browser interface (and following their own logic, not the
> forge's, for what changes are of interest). Good API support might,
> for example, include a straightforward way to get all changes to PR
> and comment data and metadata since some particular point, as well as
> for actions such as reviewing / commenting / approving a PR. Such API
> support might be similar to what's needed to ensure people can readily
> get and maintain a local replica of all the key data and its history
> for all PRs.
>
> Replication like that is also important for reliably ensuring key data
> remains available even if the forge software subsequently ceases to be
> maintained. Consider, for example, the apparent disappearance of the
> data from www-gnats.gnu.org (we occasionally come across references to
> old bug reports from that system in glibc development, but don't have
> any way to resolve those references).
>
> Another use of such an API would be to allow maintaining local copies
> of all PR comments etc. in a plain text form that can be searched with
> grep.
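A minimal sketch of that idea, assuming the API can hand back per-PR comment data; the dict shape and field names below are invented for illustration:

```python
import tempfile
from pathlib import Path

def dump_pr_comments(pr, outdir):
    """Flatten one PR's comment data into a plain-text file that
    ordinary grep can search.  The data shape here is invented; a
    real tool would build it from whatever the forge's API returns."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    lines = [f"PR #{pr['number']}: {pr['title']}"]
    for c in pr["comments"]:
        lines.append(f"--- {c['author']} {c['date']}")
        lines.append(c["text"])
    path = outdir / f"pr-{pr['number']}.txt"
    path.write_text("\n".join(lines) + "\n")
    return path

pr = {"number": 42, "title": "Fix strtod rounding", "comments": [
    {"author": "reviewer", "date": "2024-09-23",
     "text": "Please add a testcase for negative inputs."}]}
path = dump_pr_comments(pr, tempfile.mkdtemp())
print(path.read_text())
```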
>
> (c) Given that a transition would be from using mailing lists, it's
> important to have good plain text outward email notifications for all
> new merge requests, comments thereon and other significant actions
> (changing metadata, approvals / merges, etc.) - through whatever
> combination of built-in support in a forge and local implementation
> using the API. Such notifications would go to the existing mailing
> lists (note that the choice of mailing lists for a patch can depend on
> which parts of the tree it changes - for example, the choice between
> binutils and gdb-patches lists, or some GCC patches going to the
> fortran or libstdc++ lists) - as well as to individuals concerned with
> / subscribed to a given PR. Some very routine changes, such as
> reports of clean CI results, might be omitted by default from
> notifications sent to the mailing lists.
>
> "good" includes quoting appropriate diff hunks being comments on.
> It's OK to have an HTML multipart as well, but the quality of the
> plain text part matters. Diffs themselves should be included in
> new-PR emails (and those for changes to a PR) unless very large.
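As an illustration of what such a plain-text notification might contain, quoting the hunk being commented on; the addresses and field names are placeholders, not a proposed implementation:

```python
from email.message import EmailMessage

def review_comment_mail(pr_no, title, author, hunk, comment):
    """Build a plain-text notification for a review comment, quoting
    the diff hunk being commented on.  Addresses and field names are
    placeholders for whatever the forge/API would provide."""
    msg = EmailMessage()
    msg["Subject"] = f"Re: [PR {pr_no}] {title}"
    msg["From"] = "forge@example.org"
    msg["To"] = "libc-alpha@sourceware.org"
    quoted = "\n".join("> " + line for line in hunk.splitlines())
    msg.set_content(f"{author} commented:\n\n{quoted}\n\n{comment}\n")
    return msg

msg = review_comment_mail(7, "Fix strtod rounding", "reviewer",
                          "@@ -1,3 +1,4 @@\n+#include <math.h>",
                          "Is this include actually needed?")
print(msg.get_content())
```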
>
> I do not however suggest trying to take incoming email (unstructured)
> and turn it into more structured comments on particular parts of a PR
> in the database.
>
> Similarly, commit emails should continue to go to the existing mailing
> lists for those.
>
> (d) Rather than the forge owning the mainline branch in the sense of
> every commit having to go through an approved PR, at least initially
> it should be possible for people to push directly to mainline as well,
> so the transition doesn't have to be instantaneous (and in particular,
> if a change was posted before the transition and approved after it, it
> shouldn't be necessary to turn it into a PR). Longer term, I think it
> would be a good idea to move to everything going through the PR system
> (with people able to self-approve and merge their own PRs immediately
> where they can commit without review at present), so there is better
> structured uniform tracking of all changes and associated CI
> information etc. - however, direct commits to branches other than
> mainline (and maybe release branches) should continue to be OK long
> term (although it should also be possible to make a PR to such a
> branch if desired).
>
> Beyond putting everything through PRs, eventually I'd hope to have
> merges to mainline normally all go through a CI system that makes sure
> there are no regressions for at least one configuration before merging
> changes.
>
> (e) All existing pre-commit checks from hooks should be kept in some
> form, to maintain existing invariants on both tree and commit contents
> (some hook checks make sure that commits don't have commit messages
> that would cause other automated processes to fall over later, for
> example).
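For illustration, a hook-style check on commit messages might look like the following; the concrete rules here are invented examples, not the actual Sourceware hook rules:

```python
def check_commit_message(msg):
    """Illustrative commit-message check of the kind a server-side git
    hook performs before accepting a push.  The concrete rules below
    (non-empty subject, bounded subject length, blank second line) are
    invented examples, not the actual Sourceware hook rules."""
    errors = []
    lines = msg.splitlines()
    if not lines or not lines[0].strip():
        errors.append("empty subject line")
    elif len(lines[0]) > 75:
        errors.append("subject line longer than 75 characters")
    if len(lines) > 1 and lines[1].strip():
        errors.append("second line must be blank")
    return errors

print(check_commit_message("Fix rounding in strtod\n\nDetails here."))
```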
>
> (f) Existing cron jobs that read from or commit to repositories should
> use normal remote git access, not filesystem access to the repository,
> including using APIs to create and self-merge PRs when pushing to the
> repository is involved.
>
Thanks for such a comprehensive write-up Joseph. I need to read through
this with a lot more care, but not until my Covid fug has cleared more
(ugh!).
One thing we must do, however, is break requirements into a number of
categories: must haves (red lines, we can't transition without this);
should haves (important, but we can likely find acceptable
work-arounds); and would like (this would make things really nice, but
they won't really block a transition).
For the first two categories we also need to think hard about WHY this
is the case - that is, we should be able to state clearly a
justification for the requirement.
A quick read of your email suggests you've identified a number of these
issues, but it's not quite as clear how hard each requirement needs
to be, or perhaps why you think it has to be considered.
I don't expect the list to be entirely static either: issues might arise
as we do more detailed investigation, but this list looks like a good
starting point.
R.
PS: Forgejo can support issues in bugzilla (and you can teach it a
regexp rule to create links from our component/id format into a bugzilla
link); it can also support external wikis. So if (still quite a big if)
we went the forgejo route, those issues can be handled.
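To illustrate the kind of regexp rule mentioned above, here is a sketch of the linkification logic in Python; the pattern and URL template are illustrative, not a tested Forgejo configuration:

```python
import re

# "PR nnnnn" or "PR component/nnnnn" references in commit messages and
# comments become Bugzilla links; the pattern and URL template are a
# sketch, not a tested Forgejo configuration.
BUG_RE = re.compile(r"\bPR\s+(?:[a-z+.-]+/)?(\d+)\b")
BUGZILLA = "https://gcc.gnu.org/bugzilla/show_bug.cgi?id="

def linkify(text):
    """Replace bug references with markdown-style links."""
    return BUG_RE.sub(lambda m: f"[PR {m.group(1)}]({BUGZILLA}{m.group(1)})",
                      text)

print(linkify("Fixes PR target/12345."))
```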
* Re: On pull request workflows for the GNU toolchain
2024-09-23 12:07 ` Thomas Koenig via Gdb
2024-09-23 12:53 ` Jeffrey Walton via Gdb
@ 2024-09-23 13:23 ` Jonathan Wakely via Gdb
2024-09-23 13:36 ` enh via Gdb
2024-09-23 15:20 ` Florian Weimer via Gdb
2024-09-23 17:57 ` Eric Gallager via Gdb
2024-09-23 18:30 ` Arsen Arsenović via Gdb
3 siblings, 2 replies; 25+ messages in thread
From: Jonathan Wakely via Gdb @ 2024-09-23 13:23 UTC (permalink / raw)
To: Thomas Koenig; +Cc: Joseph Myers, gcc, libc-alpha, binutils, gdb, fortran
On Mon, 23 Sept 2024 at 13:09, Thomas Koenig via Gcc <gcc@gcc.gnu.org> wrote:
>
> [For the fortran people: Discussion on gcc@]
>
> Just a general remark.
>
> There are people, such as myself, who regularly mess up
> their git repositories because they have no mental model
> of what git is doing
I highly recommend https://www.youtube.com/watch?v=1ffBJ4sVUb4 which
gives an excellent mental model for the basics (and everything else
follows from those basic rules).
> (case in point: The Fortran unsigned
> branch, which I managed to put into an unrepairable state
> despite considerable help from people who tried to help me
> fix it). This is especially true of volunteer maintainers,
> who are still the mainstay of gfortran.
>
> Whatever you end up doing, consider such maintainers, and
> if they still can contribute or would simply give up.
> If what you end up doing is too complicated, it may end up
> severely impacting the gfortran project (and possibly others).
We already use Git in all the toolchain projects and there's no
suggestion to change that.
The discussion is about how we do patch submission and patch review.
The GitHub pull request workflow is widely seen as simpler than our
current email-based workflow (not everybody agrees, of course). The
idea is to *lower* the barrier of entry for contributors, not raise
it.
* Re: On pull request workflows for the GNU toolchain
2024-09-23 13:23 ` Jonathan Wakely via Gdb
@ 2024-09-23 13:36 ` enh via Gdb
2024-09-23 14:33 ` Jonathan Wakely via Gdb
2024-09-23 15:03 ` Joseph Myers via Gdb
2024-09-23 15:20 ` Florian Weimer via Gdb
1 sibling, 2 replies; 25+ messages in thread
From: enh via Gdb @ 2024-09-23 13:36 UTC (permalink / raw)
To: Jonathan Wakely
Cc: Thomas Koenig, Joseph Myers, gcc, libc-alpha, binutils, gdb, fortran
it doesn't make the patch _management_ problem better ("now i have two
problems"), but https://github.com/landley/toybox takes the "why not both?"
approach --- you can use pull requests if you grew up with/adapted to
git/github, or you can use the mailing list otherwise ... taking into
account that what the "barriers" are depends on whose eyes you're looking
through.
somewhat related, Android's NDK uses github as their issue tracker [while
still having Google's usual "buganizer" issue tracker available] and we get
orders of magnitude more interaction with our users on github --- like it
or not, it's where the users are. anecdotally i notice people report
bugs/send patches to github _mirrors_ of AOSP projects, and have no idea
that's not the actual upstream.
On Mon, Sep 23, 2024 at 9:23 AM Jonathan Wakely <jwakely.gcc@gmail.com>
wrote:
> On Mon, 23 Sept 2024 at 13:09, Thomas Koenig via Gcc <gcc@gcc.gnu.org>
> wrote:
> >
> > [For the fortran people: Discussion on gcc@]
> >
> > Just a general remark.
> >
> > There are people, such as myself, who regularly mess up
> > their git repositories because they have no mental model
> > of what git is doing
>
> I highly recommend https://www.youtube.com/watch?v=1ffBJ4sVUb4 which
> gives an excellent mental model for the basics (and everything else
> follows from those basic rules).
>
> > (case in point: The Fortran unsigned
> > branch, which I managed to put into an unrepairable state
> > despite considerable help from people who tried to help me
> > fix it). This is especially true of volunteer maintainers,
> > who are still the mainstay of gfortran.
> >
> > Whatever you end up doing, consider such maintainers, and
> > if they still can contribute or would simply give up.
> > If what you end up doing is too complicated, it may end up
> > severely impacting the gfortran project (and possibly others).
>
> We already use Git in all the toolchain projects and there's no
> suggestion to change that.
>
> The discussion is about how we do patch submission and patch review.
> The GitHub pull request workflow is widely seen as simpler than our
> current email-based workflow (not everybody agrees, of course). The
> idea is to *lower* the barrier of entry for contributors, not raise
> it.
>
* Re: On pull request workflows for the GNU toolchain
2024-09-23 13:36 ` enh via Gdb
@ 2024-09-23 14:33 ` Jonathan Wakely via Gdb
2024-09-23 15:56 ` Iain Sandoe
2024-09-23 15:03 ` Joseph Myers via Gdb
1 sibling, 1 reply; 25+ messages in thread
From: Jonathan Wakely via Gdb @ 2024-09-23 14:33 UTC (permalink / raw)
To: enh; +Cc: Thomas Koenig, Joseph Myers, gcc, libc-alpha, binutils, gdb, fortran
On Mon, 23 Sept 2024 at 14:36, enh wrote:
>
> it doesn't make the patch _management_ problem better ("now i have two problems"), but https://github.com/landley/toybox takes the "why not both?" approach --- you can use pull requests if you grew up with/adapted to git/github, or you can use the mailing list otherwise ... taking into account that what the "barriers" are depend on whose eye's you're looking through.
>
> somewhat related, Android's NDK uses github as their issue tracker [while still having Google's usual "buganizer" issue tracker available] and we get orders of magnitude more interaction with our users on github --- like it or not, it's where the users are. anecdotally i notice people report bugs/send patches to github _mirrors_ of AOSP projects, and have no idea that's not the actual upstream.
We have the same problem with github's gcc-mirror/gcc repo which we
don't even own and so can't do anything with.
* Re: On pull request workflows for the GNU toolchain
2024-09-23 12:56 ` Richard Earnshaw (lists) via Gdb
@ 2024-09-23 14:59 ` Joseph Myers via Gdb
0 siblings, 0 replies; 25+ messages in thread
From: Joseph Myers via Gdb @ 2024-09-23 14:59 UTC (permalink / raw)
To: Richard Earnshaw (lists); +Cc: gcc, libc-alpha, binutils, gdb
On Mon, 23 Sep 2024, Richard Earnshaw (lists) via Gcc wrote:
> One thing we must do, however, is break requirements into a number of
> categories: must haves (red lines, we can't transition without this); should
> haves (important, but we can likely find acceptable work-arounds); and would
> like (this would make things really nice, but they won't really block a
> transition).
The assessment of a forge against the criteria isn't expected to be simple
yes/no; it's expected to involve more of an analysis/discussion of how
criteria / underlying goals relate to the facilities provided by a
particular forge. And there isn't a very sharp line between "work-around"
and "doing this in some external software hooked up to the forge with an
API is the expected way of doing such things with the forge". (Plenty of
projects make extensive use of APIs to implement their own workflows on
GitHub, for example. That sort of thing works much better for e.g. CI or
actions that are supposed to happen post-commit, than for something like
support for dependencies / patch series which is more of a core feature
needing to be present in the underlying software.)
--
Joseph S. Myers
josmyers@redhat.com
* Re: On pull request workflows for the GNU toolchain
2024-09-23 13:36 ` enh via Gdb
2024-09-23 14:33 ` Jonathan Wakely via Gdb
@ 2024-09-23 15:03 ` Joseph Myers via Gdb
1 sibling, 0 replies; 25+ messages in thread
From: Joseph Myers via Gdb @ 2024-09-23 15:03 UTC (permalink / raw)
To: enh
Cc: Jonathan Wakely, Thomas Koenig, gcc, libc-alpha, binutils, gdb, fortran
On Mon, 23 Sep 2024, enh via Gcc wrote:
> it doesn't make the patch _management_ problem better ("now i have two
> problems"), but https://github.com/landley/toybox takes the "why not both?"
> approach --- you can use pull requests if you grew up with/adapted to
> git/github, or you can use the mailing list otherwise ... taking into
> account that what the "barriers" are depend on whose eye's you're looking
> through.
My expectation is that such a split would need to work for an initial
transitional period at least (for reviews of patches posted before the
move to the forge software without requiring all such under-review patches
to go into PRs if people want review, if nothing else). While I think
there are advantages in terms of structured data if everything ends up
using PRs (including people doing PRs that are immediately self-merged of
changes in areas they maintain), it would be possible to do otherwise (at
least until you get to wanting all merges to mainline to be done by a CI
system that maintains a regression-free state for at least one
configuration).
--
Joseph S. Myers
josmyers@redhat.com
* Re: On pull request workflows for the GNU toolchain
2024-09-23 13:23 ` Jonathan Wakely via Gdb
2024-09-23 13:36 ` enh via Gdb
@ 2024-09-23 15:20 ` Florian Weimer via Gdb
2024-09-23 15:44 ` Jonathan Wakely via Gdb
1 sibling, 1 reply; 25+ messages in thread
From: Florian Weimer via Gdb @ 2024-09-23 15:20 UTC (permalink / raw)
To: Jonathan Wakely
Cc: Thomas Koenig, Joseph Myers, gcc, libc-alpha, binutils, gdb, fortran
* Jonathan Wakely:
> The discussion is about how we do patch submission and patch review.
> The GitHub pull request workflow is widely seen as simpler than our
> current email-based workflow (not everybody agrees, of course). The
> idea is to *lower* the barrier of entry for contributors, not raise
> it.
It should also help those who feel unsure about their use of Git because
the reviewers see exactly what ends up getting merged (at least with the
default workflow). With the email-based workflow, that isn't the case,
and I know it makes some people feel nervous.
Thanks,
Florian
* Re: On pull request workflows for the GNU toolchain
2024-09-23 15:20 ` Florian Weimer via Gdb
@ 2024-09-23 15:44 ` Jonathan Wakely via Gdb
0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Wakely via Gdb @ 2024-09-23 15:44 UTC (permalink / raw)
To: Florian Weimer
Cc: Thomas Koenig, Joseph Myers, gcc, libc-alpha, binutils, gdb, fortran
On Mon, 23 Sept 2024 at 16:20, Florian Weimer wrote:
>
> * Jonathan Wakely:
>
> > The discussion is about how we do patch submission and patch review.
> > The GitHub pull request workflow is widely seen as simpler than our
> > current email-based workflow (not everybody agrees, of course). The
> > idea is to *lower* the barrier of entry for contributors, not raise
> > it.
>
> It should also help those who feel unsure about their use of Git because
> the reviewers see exactly what ends up getting merged (at least with the
> default workflow). With the email-based workflow, that isn't the case,
> and I know it makes some people feel nervous.
Yes, our current workflow is "the patch in your email looks good,
please push what you have in your tree", and what gets pushed could be
different from what was reviewed (probably unintentionally). I don't
think issues caused by this happen often, but a pull-based workflow
can avoid the problem.
* Re: On pull request workflows for the GNU toolchain
2024-09-23 14:33 ` Jonathan Wakely via Gdb
@ 2024-09-23 15:56 ` Iain Sandoe
0 siblings, 0 replies; 25+ messages in thread
From: Iain Sandoe @ 2024-09-23 15:56 UTC (permalink / raw)
To: Jonathan Wakely
Cc: enh, Thomas Koenig, Joseph Myers, GCC Development, libc-alpha,
GCC Binutils, gdb, GCC Fortran
> On 23 Sep 2024, at 15:33, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>
> On Mon, 23 Sept 2024 at 14:36, enh wrote:
>>
>> it doesn't make the patch _management_ problem better ("now i have two problems"), but https://github.com/landley/toybox takes the "why not both?" approach --- you can use pull requests if you grew up with/adapted to git/github, or you can use the mailing list otherwise ... taking into account that what the "barriers" are depend on whose eye's you're looking through.
>>
>> somewhat related, Android's NDK uses github as their issue tracker [while still having Google's usual "buganizer" issue tracker available] and we get orders of magnitude more interaction with our users on github --- like it or not, it's where the users are. anecdotally i notice people report bugs/send patches to github _mirrors_ of AOSP projects, and have no idea that's not the actual upstream.
>
> We have the same problem with github's gcc-mirror/gcc repo which we
> don't even own and so can't do anything with.
Similarly, I get way more Darwin bug reports via the development branches on GH than I do directly (even when the bug applies totally to ‘upstream’).
* Re: On pull request workflows for the GNU toolchain
2024-09-23 12:07 ` Thomas Koenig via Gdb
2024-09-23 12:53 ` Jeffrey Walton via Gdb
2024-09-23 13:23 ` Jonathan Wakely via Gdb
@ 2024-09-23 17:57 ` Eric Gallager via Gdb
2024-09-23 18:39 ` Jonathan Wakely via Gdb
2024-09-23 18:30 ` Arsen Arsenović via Gdb
3 siblings, 1 reply; 25+ messages in thread
From: Eric Gallager via Gdb @ 2024-09-23 17:57 UTC (permalink / raw)
To: Thomas Koenig; +Cc: Joseph Myers, gcc, libc-alpha, binutils, gdb, fortran
On Mon, Sep 23, 2024 at 8:09 AM Thomas Koenig via Gcc <gcc@gcc.gnu.org> wrote:
>
> [For the fortran people: Discussion on gcc@]
>
> Just a general remark.
>
> There are people, such as myself, who regularly mess up
> their git repositories because they have no mental model
> of what git is doing (case in point: The Fortran unsigned
> branch, which I managed to put into an unrepairable state
> despite considerable help from people who tried to help me
> fix it).
As one such person who has messed up his fork of GCC, I'd just like to
note that in my particular case at least, I messed it up because I was
trying to apply GitHub's model for git usage, while the GCC project
has a very different model for git usage, and the two don't exactly
play very well with one another. I see switching to a pull request
model as reducing the chances of people getting their forks into
unusable states, rather than increasing it.
> This is especially true of volunteer maintainers,
> who are still the mainstay of gfortran.
>
> Whatever you end up doing, consider such maintainers, and
> if they still can contribute or would simply give up.
> If what you end up doing is too complicated, it may end up
> severely impacting the gfortran project (and possibly others).
>
> Best regards
>
> Thomas
* Re: On pull request workflows for the GNU toolchain
2024-09-23 12:07 ` Thomas Koenig via Gdb
` (2 preceding siblings ...)
2024-09-23 17:57 ` Eric Gallager via Gdb
@ 2024-09-23 18:30 ` Arsen Arsenović via Gdb
3 siblings, 0 replies; 25+ messages in thread
From: Arsen Arsenović via Gdb @ 2024-09-23 18:30 UTC (permalink / raw)
To: Thomas Koenig; +Cc: Joseph Myers, gcc, libc-alpha, binutils, gdb, fortran
Thomas Koenig <tkoenig@netcologne.de> writes:
> [For the fortran people: Discussion on gcc@]
>
> Just a general remark.
>
> There are people, such as myself, who regularly mess up
> their git repositories because they have no mental model
> of what git is doing (case in point: The Fortran unsigned
> branch, which I managed to put into an unrepairable state
> despite considerable help from people who tried to help me
> fix it). This is especially true of volunteer maintainers,
> who are still the mainstay of gfortran.
>
> Whatever you end up doing, consider such maintainers, and
> if they still can contribute or would simply give up.
> If what you end up doing is too complicated, it may end up
> severely impacting the gfortran project (and possibly others).
Git is extremely helpful if one learns to wield it rather than fight it.
It is also based on a very simple model. I strongly encourage going
over https://git-scm.com/book/en/v2 and/or
https://eagain.net/articles/git-for-computer-scientists/ as it might
help you even independently of collaboration (for instance, I scarcely
do any work _without_ git nowadays: it helps me find old revisions and
temporary unfinished work, synchronize work across machines, triage
bugs, context switch, ...).
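For the curious, that "very simple model" can be poked at directly with
git's plumbing commands. The following sketch runs entirely in a
throwaway temporary repository (all names in it are made up), showing
that a commit is just a small object naming a tree snapshot and a parent:

```shell
# Throwaway repository; the whole experiment lives in a temp directory.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email dev@example.com
git config user.name Dev
echo hello > file
git add file
git commit -qm "first"
echo world >> file
git commit -qam "second"
# A commit is a small text object: a tree (snapshot), parent, author, message.
git cat-file -p HEAD
# Object types confirm the model: commits point at trees, trees at blobs.
git cat-file -t HEAD            # prints: commit
git cat-file -t 'HEAD^{tree}'   # prints: tree
```

Everything else in git (branches, tags, HEAD) is just a named pointer
into this graph of commit objects.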
The proposed workflow change would reduce sending and reviewing patches
to a push plus interaction via some different (perhaps web-based)
means. It should not be more complex than email.
If the gfortran project finds that such a workflow hinders it, I suggest
a hybrid approach, where maintainers still can send patches via current
means.
--
Arsen Arsenović
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: On pull request workflows for the GNU toolchain
2024-09-23 17:57 ` Eric Gallager via Gdb
@ 2024-09-23 18:39 ` Jonathan Wakely via Gdb
0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Wakely via Gdb @ 2024-09-23 18:39 UTC (permalink / raw)
To: Eric Gallager
Cc: Thomas Koenig, Joseph Myers, gcc, libc-alpha, binutils, gdb, fortran
On Mon, 23 Sept 2024 at 19:00, Eric Gallager via Gcc <gcc@gcc.gnu.org> wrote:
>
> On Mon, Sep 23, 2024 at 8:09 AM Thomas Koenig via Gcc <gcc@gcc.gnu.org> wrote:
> >
> > [For the fortran people: Discussion on gcc@]
> >
> > Just a general remark.
> >
> > There are people, such as myself, who regularly mess up
> > their git repositories because they have no mental model
> > of what git is doing (case in point: The Fortran unsigned
> > branch, which I managed to put into an unrepairable state
> > despite considerable help from people who tried to help me
> > fix it).
>
> As one such person who has messed up his fork of GCC, I'd just like to
> note that in my particular case at least, I messed it up because I was
> trying to apply GitHub's model for git usage, while the GCC project
> has a very different model for git usage, and the two don't exactly
> play very well with one another.
Only in the sense that the GCC project wants a linear history without
merge commits, and GitHub will happily let you create whatever mess
of merges and jumbled commits you choose to. Using pull requests won't
change that - you will need to sort your branch out before it will get
merged.
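For illustration, the usual way to sort such a branch out is to rebase
it onto the target branch rather than merging the target into it. This
sketch runs in a scratch repository, and the branch names in it are
made up:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email dev@example.com
git config user.name Dev
git checkout -qb trunk
echo base > base.txt && git add base.txt && git commit -qm "base"
git checkout -qb my-feature
echo feature > f.txt && git add f.txt && git commit -qm "feature work"
git checkout -q trunk
echo more > more.txt && git add more.txt && git commit -qm "trunk moves on"
git checkout -q my-feature
# 'git merge trunk' here would create the kind of merge commit GitHub
# happily accepts but a linear-history policy rejects; rebasing replays
# the branch cleanly on top of trunk instead:
git rebase -q trunk
git log --oneline --merges | wc -l   # 0: no merge commits, history is linear
```

The same cleanup works whether the messy branch lives in a GitHub fork
or anywhere else; the forge only sees the tidied result when pushed.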
> I see switching to a pull request
> model as reducing the chances of people getting their forks into
> unusable states, rather than increasing it.
I don't think it will change that at all: you'll still be able to do
anything to your fork; it just won't get approved for merging if it's
a mess.
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: On pull request workflows for the GNU toolchain
2024-09-19 15:51 On pull request workflows for the GNU toolchain Joseph Myers via Gdb
` (3 preceding siblings ...)
2024-09-23 12:56 ` Richard Earnshaw (lists) via Gdb
@ 2024-09-24 3:43 ` Jiang, Haochen via Gdb
2024-09-24 16:42 ` Joseph Myers via Gdb
4 siblings, 1 reply; 25+ messages in thread
From: Jiang, Haochen via Gdb @ 2024-09-24 3:43 UTC (permalink / raw)
To: Joseph Myers, gcc, libc-alpha, binutils, gdb
> From: Joseph Myers <josmyers@redhat.com>
> Sent: Thursday, September 19, 2024 11:51 PM
>
Hi Joseph,
Thanks for your overall introduction to the scope of the future PR
system. It will help keep patches from getting lost in the flood of
mail, and help ensure that what we have approved in PRs gets merged to
the right branch, avoiding accidents.
I have several comments or maybe questions on that.
>
> Forge software may provide other pieces such as bug tracking or wikis
> that we currently handle separately from git hosting. In such cases,
> we should be able to disable those pieces and keep using the existing
> bug tracking and wiki software (while having the option to decide
> independently to migrate those if desired).
>
> Similarly, commit emails should continue to go to the existing mailing
> lists for those.
>
Regarding this, it seems that we will still stick to Bugzilla,
gcc-wwwdocs, etc., and also keep most of the current mailing lists.
I am running regression tests on x86_64 and sending the regressions to
the gcc-regression mailing list; will I need to send them somewhere
else, or use another API to do that?
Note: I am OK with changing that if needed, but please let me know in
advance so that I have some time to test and avoid bugs in my scripts.
>
> Beyond putting everything through PRs, eventually I'd hope to have
> merges to mainline normally all go through a CI system that makes sure
> there are no regressions for at least one configuration before merging
> changes.
>
From my current experience, CI might take a long time if it is not just
building but also running regression tests; that might cause some
inconvenience for someone who only edits something like the MAINTAINERS
file.
Another question: should we also open a PR for backport commits?
I suppose we would need that under the proposed PR infrastructure to
get them merged, which might be somewhat redundant. But it would
definitely be a good thing, since I once backported a patch a month
earlier than I intended because I did not notice I was on the wrong
branch.
Also, for commits under the current obvious-fix policy, will that
change?
Overall, I expect the PR system will be better than the current one;
thanks for all the efforts!
Thx,
Haochen
>
> --
> Joseph S. Myers
> josmyers@redhat.com
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: On pull request workflows for the GNU toolchain
2024-09-24 3:43 ` Jiang, Haochen via Gdb
@ 2024-09-24 16:42 ` Joseph Myers via Gdb
2024-09-25 3:01 ` Jiang, Haochen via Gdb
0 siblings, 1 reply; 25+ messages in thread
From: Joseph Myers via Gdb @ 2024-09-24 16:42 UTC (permalink / raw)
To: Jiang, Haochen; +Cc: gcc, libc-alpha, binutils, gdb
On Tue, 24 Sep 2024, Jiang, Haochen via Gcc wrote:
> I am running regression tests on x86_64 and sending the regressions to
> the gcc-regression mailing list; will I need to send them somewhere
> else, or use another API to do that?
I don't expect use of pull requests to change anything about existing
regression processing that uses mailing lists. (Systems such as Linaro's
that identify a specific commit responsible for a regression might wish to
update the PR accordingly using whatever API is available for posting
comments on PRs.)
Note the recent discussion of making contrib/test_summary able to submit
test results to bunsen.
Where projects have existing pre-commit CI that puts CI results in
patchwork based on patches processed there, I hope such CI would be
updated to work with pull requests and update the pull requests with such
CI results.
> > Beyond putting everything through PRs, eventually I'd hope to have
> > merges to mainline normally all go through a CI system that makes sure
> > there are no regressions for at least one configuration before merging
> > changes.
> >
>
> From my current experience, CI might take a long time if it is not just
> building but also running regression tests; that might cause some
> inconvenience for someone who only edits something like the MAINTAINERS
> file.
MAINTAINERS list updates are not urgent, it should be fine for them to
wait for a batch of approved PRs to pass CI. (The commit rate in GCC is
sufficient, and testing sufficiently slow, that I expect a CI system
managing merges would need to test potential commits to mainline as a
batch rather than testing every individual commit before it goes in
mainline.)
> Another question: should we also open a PR for backport commits?
I've left that question open. I'd hope mainline would eventually move to
everything going via PRs; other development branches people might freely
continue to commit directly to, though with the option of using PRs to
such a branch if the branch maintainers find it useful; and for release
branches we'd need to consider the best process separately.
> Also, for commits under the current obvious-fix policy, will that
> change?
I wouldn't expect such a policy to change. With a system of everything
going through PRs, that would mean anyone with write access could
approve and merge their own PRs that meet the obvious rule.
--
Joseph S. Myers
josmyers@redhat.com
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: On pull request workflows for the GNU toolchain
2024-09-24 16:42 ` Joseph Myers via Gdb
@ 2024-09-25 3:01 ` Jiang, Haochen via Gdb
2024-09-25 14:46 ` Joseph Myers via Gdb
0 siblings, 1 reply; 25+ messages in thread
From: Jiang, Haochen via Gdb @ 2024-09-25 3:01 UTC (permalink / raw)
To: Joseph Myers; +Cc: gcc, libc-alpha, binutils, gdb
> From: Joseph Myers <josmyers@redhat.com>
> Sent: Wednesday, September 25, 2024 12:43 AM
>
> On Tue, 24 Sep 2024, Jiang, Haochen via Gcc wrote:
>
> Where projects have existing pre-commit CI that puts CI results in
> patchwork based on patches processed there, I hope such CI would be
> updated to work with pull requests and update the pull requests with such
> CI results.
>
> > > Beyond putting everything through PRs, eventually I'd hope to have
> > > merges to mainline normally all go through a CI system that makes sure
> > > there are no regressions for at least one configuration before merging
> > > changes.
> > >
> >
> > From my current experience, CI might take a long time if it is not just
> > building but also running regression tests; that might cause some
> > inconvenience for someone who only edits something like the MAINTAINERS
> > file.
>
> MAINTAINERS list updates are not urgent, it should be fine for them to
> wait for a batch of approved PRs to pass CI. (The commit rate in GCC is
> sufficient, and testing sufficiently slow, that I expect a CI system
> managing merges would need to test potential commits to mainline as a
> batch rather than testing every individual commit before it goes in
> mainline.)
Aha, that is what Arm and we are currently doing, so it seems this
won't change most of the workflow. Actually, I would be happy to send
the regression report directly to the exact PR; that would be much
clearer.
The potential issue is that the PR will be closed after merging, so the
regression might get lost in history if it is not fixed and nobody
remembers to reopen the PR. I am not sure whether reopening could be
done automatically. If it could, that would be great; otherwise I will
still also send a mail to the mailing list to record regressions
explicitly, just as we do currently on the gcc-patches and
gcc-regression mailing lists.
Thx,
Haochen
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: On pull request workflows for the GNU toolchain
2024-09-25 3:01 ` Jiang, Haochen via Gdb
@ 2024-09-25 14:46 ` Joseph Myers via Gdb
2024-09-26 1:42 ` Jiang, Haochen via Gdb
0 siblings, 1 reply; 25+ messages in thread
From: Joseph Myers via Gdb @ 2024-09-25 14:46 UTC (permalink / raw)
To: Jiang, Haochen; +Cc: gcc, libc-alpha, binutils, gdb
On Wed, 25 Sep 2024, Jiang, Haochen via Gcc wrote:
> The potential issue is that the PR will be closed after merging, so the
> regression might get lost in history if it is not fixed and nobody
> remembers to reopen the PR. I am not sure whether reopening could be
> done automatically. If it could,
I think reopening is for when a PR was reverted after being merged; not
for when a regression was found but is going to be addressed through
subsequent fix commits rather than through reverting. You might want to
file a bug in Bugzilla for the regression (automatically or otherwise),
however.
--
Joseph S. Myers
josmyers@redhat.com
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: On pull request workflows for the GNU toolchain
2024-09-25 14:46 ` Joseph Myers via Gdb
@ 2024-09-26 1:42 ` Jiang, Haochen via Gdb
0 siblings, 0 replies; 25+ messages in thread
From: Jiang, Haochen via Gdb @ 2024-09-26 1:42 UTC (permalink / raw)
To: Joseph Myers; +Cc: gcc, libc-alpha, binutils, gdb
> From: Joseph Myers <josmyers@redhat.com>
> Sent: Wednesday, September 25, 2024 10:46 PM
>
> On Wed, 25 Sep 2024, Jiang, Haochen via Gcc wrote:
>
> > The potential issue is that the PR will be closed after merging, so the
> > regression might get lost in history if it is not fixed and nobody
> > remembers to reopen the PR. I am not sure whether reopening could be
> > done automatically. If it could,
>
> I think reopening is for when a PR was reverted after being merged; not
> for when a regression was found but is going to be addressed through
> subsequent fix commits rather than through reverting. You might want to
> file a bug in Bugzilla for the regression (automatically or otherwise),
> however.
I will give that a try when I have some time, probably in GCC stage 3,
since at that point I plan to go through the current regressions to see
which are fixed, and through the regression framework to see whether it
can be improved.
Thx,
Haochen
>
> --
> Joseph S. Myers
> josmyers@redhat.com
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2024-09-26 1:43 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-19 15:51 On pull request workflows for the GNU toolchain Joseph Myers via Gdb
2024-09-20 18:25 ` Carlos O'Donell via Gdb
2024-09-20 20:41 ` Joseph Myers via Gdb
2024-09-20 21:43 ` Sam James via Gdb
2024-09-20 19:58 ` Matt Rice via Gdb
2024-09-20 20:47 ` Joseph Myers via Gdb
2024-09-23 12:07 ` Thomas Koenig via Gdb
2024-09-23 12:53 ` Jeffrey Walton via Gdb
2024-09-23 13:23 ` Jonathan Wakely via Gdb
2024-09-23 13:36 ` enh via Gdb
2024-09-23 14:33 ` Jonathan Wakely via Gdb
2024-09-23 15:56 ` Iain Sandoe
2024-09-23 15:03 ` Joseph Myers via Gdb
2024-09-23 15:20 ` Florian Weimer via Gdb
2024-09-23 15:44 ` Jonathan Wakely via Gdb
2024-09-23 17:57 ` Eric Gallager via Gdb
2024-09-23 18:39 ` Jonathan Wakely via Gdb
2024-09-23 18:30 ` Arsen Arsenović via Gdb
2024-09-23 12:56 ` Richard Earnshaw (lists) via Gdb
2024-09-23 14:59 ` Joseph Myers via Gdb
2024-09-24 3:43 ` Jiang, Haochen via Gdb
2024-09-24 16:42 ` Joseph Myers via Gdb
2024-09-25 3:01 ` Jiang, Haochen via Gdb
2024-09-25 14:46 ` Joseph Myers via Gdb
2024-09-26 1:42 ` Jiang, Haochen via Gdb