Mirror of the gdb mailing list
From: Dave Martin <Dave.Martin@arm.com>
To: Torvald Riegel <triegel@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>,
	libc-alpha@sourceware.org,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Marc Zyngier <Marc.Zyngier@arm.com>,
	gdb@sourceware.org,	Yao Qi <qiyaoltc@gmail.com>,
	Christoffer Dall <christoffer.dall@linaro.org>,
	Alan Hayward <alan.hayward@arm.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support
Date: Tue, 06 Dec 2016 14:46:00 -0000	[thread overview]
Message-ID: <20161206144642.GV1574@e103592.cambridge.arm.com> (raw)
In-Reply-To: <1480977739.14990.250.camel@redhat.com>

On Mon, Dec 05, 2016 at 11:42:19PM +0100, Torvald Riegel wrote:

Hi there,

> On Wed, 2016-11-30 at 12:06 +0000, Dave Martin wrote:
> > So, my key goal is to support _per-process_ vector length control.
> > 
> > From the kernel perspective, it is easiest to achieve this by providing
> > per-thread control since that is the unit that context switching acts
> > on.
> > 
> > How useful it really is to have threads with different VLs in the same
> > process is an open question.  It's theoretically useful for runtime
> > environments, which may want to dispatch code optimised for different
> > VLs
> 
> What would be the primary use case(s)?  Vectorization of short vectors
> (eg, if having an array of structs or sth like that)?

I'm not sure exactly what you're asking here.

SVE supports a regular SIMD-type computational model, along with
scalable vectors and features for speculative vectorisation of loops
whose iteration count is not statically known (or possibly not even
known at loop entry at runtime).  It's intended as a compiler target, so
any algorithm that involves iterative computation may get some benefit
-- though the amount of benefit, and how the benefit scales with vector
length, will depend on the algorithm in question.

So some algorithms may benefit more from large VLs than others.  For
jobs whose performance tends to saturate at a shorter VL, it may make
sense to have the compiler target that shorter VL -- this can allow the
same binary code to perform well on a wider range of hardware, but it
may also mean you want to run that job at the VL it was compiled for
instead of at whatever the hardware supports.

In high-assurance scenarios, you might also want to restrict a
particular job to run at the VL that you validated for.

> > -- changing the VL on-the-fly within a single thread is not
> > something I want to encourage, due to overhead and ABI issues, but
> > switching between threads of different VLs would be more manageable.
> 
> So if on-the-fly switching is probably not useful, that would mean we
> need special threads for the use cases.  Is that a realistic assumption
> for the use cases?  Or do you primarily want to keep it possible to do
> this, regardless of whether there are real use cases now?
> I suppose allowing for a per-thread setting of VL could also be added as
> a feature in the future without breaking existing code.

Per-thread VL use cases are hypothetical for now.

It's easy to support per-thread VLs in the kernel, but we could deny it
initially and wait for someone to come along with a concrete use case.

> > For setcontext/setjmp, we don't save/restore any SVE state due to the
> > caller-save status of SVE, and I would not consider it necessary to
> > save/restore VL itself because of the no-change-on-the-fly policy for
> > this.
> 
> Thus, you would basically consider VL changes or per-thread VL as in the
> realm of compilation internals?  So, the specific size for a particular
> piece of code would not be part of an ABI?

Basically yes.  For most people, this would be hidden in libc/ld.so/some
framework.  This goes for most prctl()s -- random user code shouldn't
normally touch them unless it knows what it's doing.

> > I'm not familiar with resumable functions/executors -- are these in
> > the C++ standards yet (not that that would cause me to be familiar
> > with them... ;)  Any implementation of coroutines (i.e.,
> > cooperative switching) is likely to fall under the "setcontext"
> > argument above.
> 
> These are not part of the C++ standard yet, but will appear in TSes.
> There are various features for which implementations would be assumed to
> use one OS thread for several tasks, coroutines, etc.  Some of them
> switch between these tasks or coroutines while these are running,

Is the switching ever preemptive?  If not, then these features are
unlikely to be a concern for SVE.  It's preemptive switching that would
require saving extra SVE state (which is why we need to care about
signals).

> whereas the ones that will be in C++17 only run more than one parallel
> task on the same OS thread, but one after the other (like in a thread
> pool).

If jobs are only run to completion before yielding, that again isn't a
concern for SVE.

> However, if we are careful not to expose VL or make promises about it,
> this may just end up being a detail similar to, say, register
> allocation, which isn't exposed beyond the internals of a particular
> compiler either.
> Exposing it as a feature the user can set without messing with the
> implementation would introduce additional thread-specific state, as
> Florian said.  This might not be a show-stopper by itself, but the more
> thread-specific state we have the more an implementation has to take
> care of or switch, and the higher the runtime costs are.  C++17 already
> makes weaker promises for TLS for parallel tasks, so that
> implementations don't have to run TLS constructors or destructors just
> because a small parallel task was executed.

There's a difference between a feature that is exposed by the kernel
and a feature endorsed by the language / runtime.

For example, random code can enable seccomp via prctl(PR_SET_SECCOMP)
-- this may make most of libc unsafe to use, because under strict
seccomp most syscalls simply kill the thread.  libc doesn't pretend to
support this out of the box, but this feature is also not needlessly
denied to user code that knows what it's doing.

I tend to put setting the VL into this category: it is safe, and
useful or even necessary to change the VL in some situations, but
userspace is responsible for managing this for itself.  The kernel
doesn't have enough information to make these decisions unilaterally.

Cheers
---Dave


Thread overview: 30+ messages
2016-11-25 19:39 Dave Martin
2016-11-25 19:41 ` [RFC PATCH 17/29] arm64/sve: signal: Dump Scalable Vector Extension registers to user stack Dave Martin
2016-11-25 19:41 ` [RFC PATCH 18/29] arm64/sve: signal: Restore FPSIMD/SVE state in rt_sigreturn Dave Martin
2016-11-25 19:41 ` [RFC PATCH 24/29] arm64/sve: Discard SVE state on system call Dave Martin
2016-11-25 19:41 ` [RFC PATCH 16/29] arm64/sve: signal: Add SVE state record to sigcontext Dave Martin
2016-11-25 19:42 ` [RFC PATCH 27/29] arm64/sve: ptrace support Dave Martin
2016-11-30  9:56 ` [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Yao Qi
2016-11-30 12:07   ` Dave Martin
2016-11-30 12:22     ` Szabolcs Nagy
2016-11-30 14:10       ` Dave Martin
2016-11-30 12:38     ` Florian Weimer
2016-11-30 13:56       ` Dave Martin
2016-12-01  9:21         ` Florian Weimer
2016-12-01 10:30           ` Dave Martin
2016-12-01 12:19             ` Dave Martin
2016-12-05 10:44             ` Florian Weimer
2016-12-05 11:07               ` Szabolcs Nagy
2016-12-05 15:05               ` Dave Martin
2016-12-02 11:49       ` Dave Martin
2016-12-02 16:34         ` Florian Weimer
2016-12-02 16:59           ` Joseph Myers
2016-12-02 18:21             ` Dave Martin
2016-12-02 21:57               ` Joseph Myers
2016-12-02 21:56     ` Yao Qi
2016-12-05 15:12       ` Dave Martin
2016-12-05 22:42     ` Torvald Riegel
2016-12-06 14:46       ` Dave Martin [this message]
2016-11-30 10:08 ` Florian Weimer
2016-11-30 11:06   ` Szabolcs Nagy
2016-11-30 14:06     ` Dave Martin
