* [RFC PATCH 00/29] arm64: Scalable Vector Extension core support
@ 2016-11-25 19:39 Dave Martin
2016-11-25 19:41 ` [RFC PATCH 16/29] arm64/sve: signal: Add SVE state record to sigcontext Dave Martin
` (6 more replies)
0 siblings, 7 replies; 30+ messages in thread
From: Dave Martin @ 2016-11-25 19:39 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Christoffer Dall, Florian Weimer, Ard Biesheuvel, Marc Zyngier,
Alan Hayward, libc-alpha, gdb
The Scalable Vector Extension (SVE) [1] is an extension to AArch64 which
adds extra SIMD functionality and supports much larger vectors.
This series implements core Linux support for SVE.
Recipients not copied on the whole series can find the rest of the
patches in the linux-arm-kernel archives [2].
The first 5 patches "arm64: signal: ..." factor out the allocation and
placement of state information in the signal frame. The first three
are prerequisites for the SVE support patches.
Patches 04-05 implement expansion of the signal frame, and may remain
controversial due to ABI break issues:
* Discussion is needed on how userspace should detect/negotiate signal
frame size in order for this expansion mechanism to be workable.
The remaining patches implement initial SVE support for Linux, with the
following limitations:
* No KVM/virtualisation support for guests.
* No independent SVE vector length configuration per thread. This is
planned, but will follow as a separate add-on series.
* As a temporary workaround for the signal frame size issue, vector
length is software-limited to 512 bits (see patch 29), with a
build-time kernel configuration option to relax this.
Discussion is needed on how to smoothly address the signal ABI issues
so that this workaround can be removed.
* A fair number of development BUG_ON()s are still present, which
will be demoted or removed for merge.
* There is a context-switch race condition lurking somewhere which
fires in certain situations with my development KVM hacks (not part
of this posting) -- the underlying bug might or might not be in this
series.
Review and comments welcome.
Cheers
---Dave
[1] https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/thread.html
Alan Hayward (1):
arm64/sve: ptrace support
Dave Martin (28):
arm64: signal: Refactor sigcontext parsing in rt_sigreturn
arm64: signal: factor frame layout and population into separate passes
arm64: signal: factor out signal frame record allocation
arm64: signal: Allocate extra sigcontext space as needed
arm64: signal: Parse extra_context during sigreturn
arm64: efi: Add missing Kconfig dependency on KERNEL_MODE_NEON
arm64/sve: Allow kernel-mode NEON to be disabled in Kconfig
arm64/sve: Low-level save/restore code
arm64/sve: Boot-time feature detection and reporting
arm64/sve: Boot-time feature enablement
arm64/sve: Expand task_struct for Scalable Vector Extension state
arm64/sve: Save/restore SVE state on context switch paths
arm64/sve: Basic support for KERNEL_MODE_NEON
Revert "arm64/sve: Allow kernel-mode NEON to be disabled in Kconfig"
arm64/sve: Restore working FPSIMD save/restore around signals
arm64/sve: signal: Add SVE state record to sigcontext
arm64/sve: signal: Dump Scalable Vector Extension registers to user
stack
arm64/sve: signal: Restore FPSIMD/SVE state in rt_sigreturn
arm64/sve: Avoid corruption when replacing the SVE state
arm64/sve: traps: Add descriptive string for SVE exceptions
arm64/sve: Enable SVE on demand for userspace
arm64/sve: Implement FPSIMD-only context for tasks not using SVE
arm64/sve: Move ZEN handling to the common task_fpsimd_load() path
arm64/sve: Discard SVE state on system call
arm64/sve: Avoid preempt_disable() during sigreturn
arm64/sve: Avoid stale user register state after SVE access exception
arm64: KVM: Treat SVE use by guests as undefined instruction execution
arm64/sve: Limit vector length to 512 bits by default
arch/arm64/Kconfig | 48 +++
arch/arm64/include/asm/esr.h | 3 +-
arch/arm64/include/asm/fpsimd.h | 37 +++
arch/arm64/include/asm/fpsimdmacros.h | 145 +++++++++
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/sysreg.h | 11 +
arch/arm64/include/asm/thread_info.h | 2 +
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 125 ++++++++
arch/arm64/include/uapi/asm/sigcontext.h | 117 ++++++++
arch/arm64/kernel/cpufeature.c | 3 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-fpsimd.S | 17 ++
arch/arm64/kernel/entry.S | 18 +-
arch/arm64/kernel/fpsimd.c | 301 ++++++++++++++++++-
arch/arm64/kernel/head.S | 16 +-
arch/arm64/kernel/process.c | 2 +-
arch/arm64/kernel/ptrace.c | 254 +++++++++++++++-
arch/arm64/kernel/setup.c | 3 +
arch/arm64/kernel/signal.c | 497 +++++++++++++++++++++++++++++--
arch/arm64/kernel/signal32.c | 2 +-
arch/arm64/kernel/traps.c | 1 +
arch/arm64/kvm/handle_exit.c | 9 +
arch/arm64/mm/proc.S | 27 +-
include/uapi/linux/elf.h | 1 +
25 files changed, 1583 insertions(+), 59 deletions(-)
--
2.1.4
^ permalink raw reply [flat|nested] 30+ messages in thread* [RFC PATCH 16/29] arm64/sve: signal: Add SVE state record to sigcontext 2016-11-25 19:39 [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Dave Martin @ 2016-11-25 19:41 ` Dave Martin 2016-11-25 19:41 ` [RFC PATCH 24/29] arm64/sve: Discard SVE state on system call Dave Martin ` (5 subsequent siblings) 6 siblings, 0 replies; 30+ messages in thread From: Dave Martin @ 2016-11-25 19:41 UTC (permalink / raw) To: linux-arm-kernel; +Cc: Florian Weimer, libc-alpha, gdb This patch adds a record to sigcontext that will contain the SVE state. Subsequent patches will implement the actual register dumping. Signed-off-by: Dave Martin <Dave.Martin@arm.com> --- arch/arm64/include/uapi/asm/sigcontext.h | 86 ++++++++++++++++++++++++++++++++ arch/arm64/kernel/signal.c | 62 +++++++++++++++++++++++ 2 files changed, 148 insertions(+) diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h index 1af8437..11c915d 100644 --- a/arch/arm64/include/uapi/asm/sigcontext.h +++ b/arch/arm64/include/uapi/asm/sigcontext.h @@ -88,4 +88,90 @@ struct extra_context { __u32 size; /* size in bytes of the extra space */ }; +#define SVE_MAGIC 0x53564501 + +struct sve_context { + struct _aarch64_ctx head; + __u16 vl; + __u16 __reserved[3]; +}; + +/* + * The SVE architecture leaves space for future expansion of the + * vector length beyond its initial architectural limit of 2048 bits + * (16 quadwords). + */ +#define SVE_VQ_MIN 1 +#define SVE_VQ_MAX 0x200 + +#define SVE_VL_MIN (SVE_VQ_MIN * 0x10) +#define SVE_VL_MAX (SVE_VQ_MAX * 0x10) + +#define SVE_NUM_ZREGS 32 +#define SVE_NUM_PREGS 16 + +#define sve_vl_valid(vl) \ + ((vl) % 0x10 == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX) +#define sve_vq_from_vl(vl) ((vl) / 0x10) + +/* + * The total size of meaningful data in the SVE context in bytes, + * including the header, is given by SVE_SIG_CONTEXT_SIZE(vq). 
+ * + * Note: for all these macros, the "vq" argument denotes the SVE + * vector length in quadwords (i.e., units of 128 bits). + * + * The correct way to obtain vq is to use sve_vq_from_vl(vl). The + * result is valid if and only if sve_vl_valid(vl) is true. This is + * guaranteed for a struct sve_context written by the kernel. + * + * + * Additional macros describe the contents and layout of the payload. + * For each, SVE_SIG_x_OFFSET(args) is the start offset relative to + * the start of struct sve_context, and SVE_SIG_x_SIZE(args) is the + * size in bytes: + * + * x type description + * - ---- ----------- + * REGS the entire SVE context + * + * ZREGS __uint128_t[SVE_NUM_ZREGS][vq] all Z-registers + * ZREG __uint128_t[vq] individual Z-register Zn + * + * PREGS uint16_t[SVE_NUM_PREGS][vq] all P-registers + * PREG uint16_t[vq] individual P-register Pn + * + * FFR uint16_t[vq] first-fault status register + * + * Additional data might be appended in the future. + */ + +#define SVE_SIG_ZREG_SIZE(vq) ((__u32)(vq) * 16) +#define SVE_SIG_PREG_SIZE(vq) ((__u32)(vq) * 2) +#define SVE_SIG_FFR_SIZE(vq) SVE_SIG_PREG_SIZE(vq) + +#define SVE_SIG_REGS_OFFSET ((sizeof(struct sve_context) + 15) / 16 * 16) + +#define SVE_SIG_ZREGS_OFFSET SVE_SIG_REGS_OFFSET +#define SVE_SIG_ZREG_OFFSET(vq, n) \ + (SVE_SIG_ZREGS_OFFSET + SVE_SIG_ZREG_SIZE(vq) * (n)) +#define SVE_SIG_ZREGS_SIZE(vq) \ + (SVE_SIG_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_SIG_ZREGS_OFFSET) + +#define SVE_SIG_PREGS_OFFSET(vq) \ + (SVE_SIG_ZREGS_OFFSET + SVE_SIG_ZREGS_SIZE(vq)) +#define SVE_SIG_PREG_OFFSET(vq, n) \ + (SVE_SIG_PREGS_OFFSET(vq) + SVE_SIG_PREG_SIZE(vq) * (n)) +#define SVE_SIG_PREGS_SIZE(vq) \ + (SVE_SIG_PREG_OFFSET(vq, SVE_NUM_PREGS) - SVE_SIG_PREGS_OFFSET(vq)) + +#define SVE_SIG_FFR_OFFSET(vq) \ + (SVE_SIG_PREGS_OFFSET(vq) + SVE_SIG_PREGS_SIZE(vq)) + +#define SVE_SIG_REGS_SIZE(vq) \ + (SVE_SIG_FFR_OFFSET(vq) + SVE_SIG_FFR_SIZE(vq) - SVE_SIG_REGS_OFFSET) + +#define SVE_SIG_CONTEXT_SIZE(vq) (SVE_SIG_REGS_OFFSET + 
SVE_SIG_REGS_SIZE(vq)) + + #endif /* _UAPI__ASM_SIGCONTEXT_H */ diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 1e430b4..7418237 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -57,6 +57,7 @@ struct rt_sigframe_user_layout { unsigned long fpsimd_offset; unsigned long esr_offset; + unsigned long sve_offset; unsigned long extra_offset; unsigned long end_offset; }; @@ -209,8 +210,39 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx) return err ? -EFAULT : 0; } + +#ifdef CONFIG_ARM64_SVE + +static int preserve_sve_context(struct sve_context __user *ctx) +{ + int err = 0; + u16 reserved[ARRAY_SIZE(ctx->__reserved)]; + unsigned int vl = sve_get_vl(); + unsigned int vq = sve_vq_from_vl(vl); + + memset(reserved, 0, sizeof(reserved)); + + __put_user_error(SVE_MAGIC, &ctx->head.magic, err); + __put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16), + &ctx->head.size, err); + __put_user_error(vl, &ctx->vl, err); + BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved)); + err |= copy_to_user(&ctx->__reserved, reserved, sizeof(reserved)); + + return err ? -EFAULT : 0; +} + +#else /* ! CONFIG_ARM64_SVE */ + +/* Turn any non-optimised out attempt to use this into a link error: */ +extern int preserve_sve_context(void __user *ctx); + +#endif /* ! 
CONFIG_ARM64_SVE */ + + struct user_ctxs { struct fpsimd_context __user *fpsimd; + struct sve_context __user *sve; }; static int parse_user_sigframe(struct user_ctxs *user, @@ -224,6 +256,7 @@ static int parse_user_sigframe(struct user_ctxs *user, bool have_extra_context = false; user->fpsimd = NULL; + user->sve = NULL; if (!IS_ALIGNED((unsigned long)base, 16)) goto invalid; @@ -271,6 +304,19 @@ static int parse_user_sigframe(struct user_ctxs *user, /* ignore */ break; + case SVE_MAGIC: + if (!IS_ENABLED(CONFIG_ARM64_SVE)) + goto invalid; + + if (user->sve) + goto invalid; + + if (size < sizeof(*user->sve)) + goto invalid; + + user->sve = (struct sve_context __user *)head; + break; + case EXTRA_MAGIC: if (have_extra_context) goto invalid; @@ -417,6 +463,15 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user) return err; } + if (IS_ENABLED(CONFIG_ARM64_SVE) && (elf_hwcap & HWCAP_SVE)) { + unsigned int vq = sve_vq_from_vl(sve_get_vl()); + + err = sigframe_alloc(user, &user->sve_offset, + SVE_SIG_CONTEXT_SIZE(vq)); + if (err) + return err; + } + return sigframe_alloc_end(user); } @@ -458,6 +513,13 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user, __put_user_error(current->thread.fault_code, &esr_ctx->esr, err); } + /* Scalable Vector Extension state, if present */ + if (IS_ENABLED(CONFIG_ARM64_SVE) && err == 0 && user->sve_offset) { + struct sve_context __user *sve_ctx = + apply_user_offset(user, user->sve_offset); + err |= preserve_sve_context(sve_ctx); + } + if (err == 0 && user->extra_offset) { struct extra_context __user *extra = apply_user_offset(user, user->extra_offset); -- 2.1.4 ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 24/29] arm64/sve: Discard SVE state on system call
2016-11-25 19:39 [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Dave Martin
2016-11-25 19:41 ` [RFC PATCH 16/29] arm64/sve: signal: Add SVE state record to sigcontext Dave Martin
@ 2016-11-25 19:41 ` Dave Martin
2016-11-25 19:41 ` [RFC PATCH 18/29] arm64/sve: signal: Restore FPSIMD/SVE state in rt_sigreturn Dave Martin
` (4 subsequent siblings)
6 siblings, 0 replies; 30+ messages in thread
From: Dave Martin @ 2016-11-25 19:41 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: libc-alpha, gdb

The base procedure call standard for the Scalable Vector Extension
defines all of the SVE programmer's model state (Z0-31, P0-15, FFR) as
caller-save, except for that subset of the state that aliases FPSIMD
state.

System calls from userspace will almost always be made through C
library wrappers -- as a consequence of the PCS there will thus rarely
if ever be any live SVE state at syscall entry in practice.

This gives us an opportunity to make SVE explicitly caller-save around
SVC and so stop carrying around the SVE state for tasks that use SVE
only occasionally (say, by calling a library).

Note that FPSIMD state will still be preserved around SVC.

As a crude heuristic to avoid pathological cases where a thread that
uses SVE frequently has to fault back into the kernel again to
re-enable SVE after a syscall, we switch the thread back to FPSIMD-only
context tracking only if the context is actually switched out before
returning to userspace.
Signed-off-by: Dave Martin <Dave.Martin@arm.com> --- arch/arm64/kernel/fpsimd.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 5834f81..2e1056e 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -203,6 +203,23 @@ static void task_fpsimd_load(struct task_struct *task) static void task_fpsimd_save(struct task_struct *task) { if (IS_ENABLED(CONFIG_ARM64_SVE) && + task_pt_regs(task)->syscallno != ~0UL && + test_tsk_thread_flag(task, TIF_SVE)) { + unsigned long tmp; + + clear_tsk_thread_flag(task, TIF_SVE); + + /* Trap if the task tries to use SVE again: */ + asm volatile ( + "mrs %[tmp], cpacr_el1\n\t" + "bic %[tmp], %[tmp], %[mask]\n\t" + "msr cpacr_el1, %[tmp]" + : [tmp] "=r" (tmp) + : [mask] "i" (CPACR_EL1_ZEN_EL0EN) + ); + } + + if (IS_ENABLED(CONFIG_ARM64_SVE) && test_tsk_thread_flag(task, TIF_SVE)) sve_save_state(__task_pffr(task), &task->thread.fpsimd_state.fpsr); -- 2.1.4 ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 18/29] arm64/sve: signal: Restore FPSIMD/SVE state in rt_sigreturn
2016-11-25 19:39 [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Dave Martin
2016-11-25 19:41 ` [RFC PATCH 16/29] arm64/sve: signal: Add SVE state record to sigcontext Dave Martin
2016-11-25 19:41 ` [RFC PATCH 24/29] arm64/sve: Discard SVE state on system call Dave Martin
@ 2016-11-25 19:41 ` Dave Martin
2016-11-25 19:41 ` [RFC PATCH 17/29] arm64/sve: signal: Dump Scalable Vector Extension registers to user stack Dave Martin
` (3 subsequent siblings)
6 siblings, 0 replies; 30+ messages in thread
From: Dave Martin @ 2016-11-25 19:41 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: Florian Weimer, libc-alpha, gdb

This patch adds the missing logic to restore the SVE state in
rt_sigreturn.

Because the FPSIMD and SVE state alias, this code replaces the
existing fpsimd restore code when there is SVE state to restore.
For Zn[127:0], the saved FPSIMD state in Vn takes precedence.

Since __task_fpsimd_to_sve() is used to merge the FPSIMD and SVE state
back together, and only for this purpose, we don't want it to zero out
the SVE state -- hence delete the memset() from there.
Signed-off-by: Dave Martin <Dave.Martin@arm.com> --- arch/arm64/kernel/fpsimd.c | 4 --- arch/arm64/kernel/signal.c | 87 ++++++++++++++++++++++++++++++++++++++++------ 2 files changed, 76 insertions(+), 15 deletions(-) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 4ef2e37..b1a8d3e 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -266,9 +266,6 @@ static void task_sve_to_fpsimd(struct task_struct *task __always_unused) { } void fpsimd_signal_preserve_current_state(void) { - WARN_ONCE(elf_hwcap & HWCAP_SVE, - "SVE state save/restore around signals doesn't work properly, expect userspace corruption!\n"); - fpsimd_preserve_current_state(); task_sve_to_fpsimd(current); } @@ -301,7 +298,6 @@ static void __task_fpsimd_to_sve(struct task_struct *task, unsigned int vq) struct fpsimd_state *fst = &task->thread.fpsimd_state; unsigned int i; - memset(sst, 0, sizeof(*sst)); for (i = 0; i < 32; ++i) sst->zregs[i][0] = fst->vregs[i]; } diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 038e7338..2697d09 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -211,6 +211,11 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx) } +struct user_ctxs { + struct fpsimd_context __user *fpsimd; + struct sve_context __user *sve; +}; + #ifdef CONFIG_ARM64_SVE static int preserve_sve_context(struct sve_context __user *ctx) @@ -240,19 +245,68 @@ static int preserve_sve_context(struct sve_context __user *ctx) return err ? 
-EFAULT : 0; } +static int __restore_sve_fpsimd_context(struct user_ctxs *user, + unsigned int vl, unsigned int vq) +{ + int err; + struct fpsimd_sve_state(vq) *task_sve_regs = + __task_sve_state(current); + struct fpsimd_state fpsimd; + + if (vl != sve_get_vl()) + return -EINVAL; + + BUG_ON(SVE_SIG_REGS_SIZE(vq) > sizeof(*task_sve_regs)); + BUG_ON(round_up(SVE_SIG_REGS_SIZE(vq), 16) < sizeof(*task_sve_regs)); + BUG_ON(SVE_SIG_FFR_OFFSET(vq) - SVE_SIG_REGS_OFFSET != + (char *)&task_sve_regs->ffr - (char *)task_sve_regs); + err = __copy_from_user(task_sve_regs, + (char __user const *)user->sve + + SVE_SIG_REGS_OFFSET, + SVE_SIG_REGS_SIZE(vq)); + if (err) + return err; + + /* copy the FP and status/control registers */ + /* restore_sigframe() already checked that user->fpsimd != NULL. */ + err = __copy_from_user(fpsimd.vregs, user->fpsimd->vregs, + sizeof(fpsimd.vregs)); + __get_user_error(fpsimd.fpsr, &user->fpsimd->fpsr, err); + __get_user_error(fpsimd.fpcr, &user->fpsimd->fpcr, err); + + /* load the hardware registers from the fpsimd_state structure */ + if (!err) + fpsimd_update_current_state(&fpsimd); + + return err; +} + +static int restore_sve_fpsimd_context(struct user_ctxs *user) +{ + int err; + u16 vl, vq; + + err = __get_user(vl, &user->sve->vl); + if (err) + return err; + + if (!sve_vl_valid(vl)) + return -EINVAL; + + vq = sve_vq_from_vl(vl); + + return __restore_sve_fpsimd_context(user, vl, vq); +} + #else /* ! CONFIG_ARM64_SVE */ -/* Turn any non-optimised out attempt to use this into a link error: */ +/* Turn any non-optimised out attempts to use these into a link error: */ extern int preserve_sve_context(void __user *ctx); +extern int restore_sve_fpsimd_context(struct user_ctxs *user); #endif /* ! 
CONFIG_ARM64_SVE */ -struct user_ctxs { - struct fpsimd_context __user *fpsimd; - struct sve_context __user *sve; -}; - static int parse_user_sigframe(struct user_ctxs *user, struct rt_sigframe __user *sf) { @@ -316,6 +370,9 @@ static int parse_user_sigframe(struct user_ctxs *user, if (!IS_ENABLED(CONFIG_ARM64_SVE)) goto invalid; + if (!(elf_hwcap & HWCAP_SVE)) + goto invalid; + if (user->sve) goto invalid; @@ -375,9 +432,6 @@ static int parse_user_sigframe(struct user_ctxs *user, } done: - if (!user->fpsimd) - goto invalid; - return 0; invalid: @@ -411,8 +465,19 @@ static int restore_sigframe(struct pt_regs *regs, if (err == 0) err = parse_user_sigframe(&user, sf); - if (err == 0) - err = restore_fpsimd_context(user.fpsimd); + if (err == 0) { + if (!user.fpsimd) + return -EINVAL; + + if (user.sve) { + if (!IS_ENABLED(CONFIG_ARM64_SVE) || + !(elf_hwcap & HWCAP_SVE)) + return -EINVAL; + + err = restore_sve_fpsimd_context(&user); + } else + err = restore_fpsimd_context(user.fpsimd); + } return err; } -- 2.1.4 ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 17/29] arm64/sve: signal: Dump Scalable Vector Extension registers to user stack 2016-11-25 19:39 [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Dave Martin ` (2 preceding siblings ...) 2016-11-25 19:41 ` [RFC PATCH 18/29] arm64/sve: signal: Restore FPSIMD/SVE state in rt_sigreturn Dave Martin @ 2016-11-25 19:41 ` Dave Martin 2016-11-25 19:42 ` [RFC PATCH 27/29] arm64/sve: ptrace support Dave Martin ` (2 subsequent siblings) 6 siblings, 0 replies; 30+ messages in thread From: Dave Martin @ 2016-11-25 19:41 UTC (permalink / raw) To: linux-arm-kernel; +Cc: Florian Weimer, libc-alpha, gdb This patch populates the sve_regs() area reserved on the user stack with the actual register context. Signed-off-by: Dave Martin <Dave.Martin@arm.com> --- arch/arm64/include/asm/fpsimd.h | 1 + arch/arm64/kernel/fpsimd.c | 5 ++--- arch/arm64/kernel/signal.c | 8 ++++++++ 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index aa82b38..e39066a 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -93,6 +93,7 @@ extern void fpsimd_load_partial_state(struct fpsimd_partial_state *state); extern void __init fpsimd_init_task_struct_size(void); +extern void *__task_sve_state(struct task_struct *task); extern void sve_save_state(void *state, u32 *pfpsr); extern void sve_load_state(void const *state, u32 const *pfpsr); extern unsigned int sve_get_vl(void); diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 9a90921..4ef2e37 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -128,7 +128,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) #ifdef CONFIG_ARM64_SVE -static void *__task_sve_state(struct task_struct *task) +void *__task_sve_state(struct task_struct *task) { return (char *)task + ALIGN(sizeof(*task), 16); } @@ -143,8 +143,7 @@ static void *__task_pffr(struct task_struct *task) #else /* 
!CONFIG_ARM64_SVE */ -/* Turn any non-optimised out attempts to use these into a link error: */ -extern void *__task_sve_state(struct task_struct *task); +/* Turn any non-optimised out attempts to use this into a link error: */ extern void *__task_pffr(struct task_struct *task); #endif /* !CONFIG_ARM64_SVE */ diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 7418237..038e7338 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -229,6 +229,14 @@ static int preserve_sve_context(struct sve_context __user *ctx) BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved)); err |= copy_to_user(&ctx->__reserved, reserved, sizeof(reserved)); + /* + * This assumes that the SVE state has already been saved to + * the task struct by calling preserve_fpsimd_context(). + */ + err |= copy_to_user((char __user *)ctx + SVE_SIG_REGS_OFFSET, + __task_sve_state(current), + SVE_SIG_REGS_SIZE(vq)); + return err ? -EFAULT : 0; } -- 2.1.4 ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 27/29] arm64/sve: ptrace support 2016-11-25 19:39 [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Dave Martin ` (3 preceding siblings ...) 2016-11-25 19:41 ` [RFC PATCH 17/29] arm64/sve: signal: Dump Scalable Vector Extension registers to user stack Dave Martin @ 2016-11-25 19:42 ` Dave Martin 2016-11-30 9:56 ` [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Yao Qi 2016-11-30 10:08 ` Florian Weimer 6 siblings, 0 replies; 30+ messages in thread From: Dave Martin @ 2016-11-25 19:42 UTC (permalink / raw) To: linux-arm-kernel; +Cc: Alan Hayward, gdb From: Alan Hayward <alan.hayward@arm.com> This patch adds support for accessing a task's SVE registers via ptrace. Some additional helpers are added in order to support the SVE/ FPSIMD register view synchronisation operations that are required in order to make the NT_PRFPREG and NT_ARM_SVE regsets interact correctly. fpr_set()/fpr_get() are refactored into backend/frontend functions, so that the core can be reused by sve_set()/sve_get() for the case where no SVE registers are stored for a thread. Signed-off-by: Alan Hayward <alan.hayward@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> --- arch/arm64/include/asm/fpsimd.h | 20 +++ arch/arm64/include/uapi/asm/ptrace.h | 125 +++++++++++++++ arch/arm64/include/uapi/asm/sigcontext.h | 4 + arch/arm64/kernel/fpsimd.c | 42 +++++ arch/arm64/kernel/ptrace.c | 254 ++++++++++++++++++++++++++++++- include/uapi/linux/elf.h | 1 + 6 files changed, 440 insertions(+), 6 deletions(-) diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index e39066a..88bcf69 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -35,6 +35,10 @@ struct fpsimd_state { __uint128_t vregs[32]; u32 fpsr; u32 fpcr; + /* + * For ptrace compatibility, pad to next 128-bit + * boundary here if extending this struct. 
+ */ }; }; /* the id of the last cpu to have restored this state */ @@ -98,6 +102,22 @@ extern void sve_save_state(void *state, u32 *pfpsr); extern void sve_load_state(void const *state, u32 const *pfpsr); extern unsigned int sve_get_vl(void); +/* + * FPSIMD/SVE synchronisation helpers for ptrace: + * For use on stopped tasks only + */ + +extern void fpsimd_sync_to_sve(struct task_struct *task); + +#ifdef CONFIG_ARM64_SVE +extern void fpsimd_sync_to_fpsimd(struct task_struct *task); +extern void fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task); +#else /* !CONFIG_ARM64_SVE */ +static void __maybe_unused fpsimd_sync_to_fpsimd(struct task_struct *task) { } +static void __maybe_unused fpsimd_sync_from_fpsimd_zeropad( + struct task_struct *task) { } +#endif /* !CONFIG_ARM64_SVE */ + #endif #endif diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h index b5c3933..48b57a0 100644 --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -22,6 +22,7 @@ #include <linux/types.h> #include <asm/hwcap.h> +#include <asm/sigcontext.h> /* @@ -77,6 +78,7 @@ struct user_fpsimd_state { __uint128_t vregs[32]; __u32 fpsr; __u32 fpcr; + /* Pad to next 128-bit boundary here if extending this struct */ }; struct user_hwdebug_state { @@ -89,6 +91,129 @@ struct user_hwdebug_state { } dbg_regs[16]; }; +/* SVE/FP/SIMD state (NT_ARM_SVE) */ + +struct user_sve_header { + __u32 size; /* total meaningful regset content in bytes */ + __u32 max_size; /* maxmium possible size for this thread */ + __u16 vl; /* current vector length */ + __u16 max_vl; /* maximum possible vector length */ + __u16 flags; + __u16 __reserved; +}; + +/* Definitions for user_sve_header.flags: */ +#define SVE_PT_REGS_MASK (1 << 0) + +#define SVE_PT_REGS_FPSIMD 0 +#define SVE_PT_REGS_SVE SVE_PT_REGS_MASK + + +/* + * The remainder of the SVE state follows struct user_sve_header. 
The + * total size of the SVE state (including header) depends on the + * metadata in the header: SVE_PT_SIZE(vq, flags) gives the total size + * of the state in bytes, including the header. + * + * Refer to <asm/sigcontext.h> for details of how to pass the correct + * "vq" argument to these macros. + */ + +/* Offset from the start of struct user_sve_header to the register data */ +#define SVE_PT_REGS_OFFSET ((sizeof(struct sve_context) + 15) / 16 * 16) + +/* + * The register data content and layout depends on the value of the + * flags field. + */ + +/* + * (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD case: + * + * The payload starts at offset SVE_PT_FPSIMD_OFFSET, and is of type + * struct user_fpsimd_state. Additional data might be appended in the + * future: use SVE_PT_FPSIMD_SIZE(vq, flags) to compute the total size. + * SVE_PT_FPSIMD_SIZE(vq, flags) will never be less than + * sizeof(struct user_fpsimd_state). + */ + +#define SVE_PT_FPSIMD_OFFSET SVE_PT_REGS_OFFSET + +#define SVE_PT_FPSIMD_SIZE(vq, flags) (sizeof(struct user_fpsimd_state)) + +/* + * (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE case: + * + * The payload starts at offset SVE_PT_SVE_OFFSET, and is of size + * SVE_PT_SVE_SIZE(vq, flags). + * + * Additional macros describe the contents and layout of the payload. + * For each, SVE_PT_SVE_x_OFFSET(args) is the start offset relative to + * the start of struct user_sve_header, and SVE_PT_SVE_x_SIZE(args) is + * the size in bytes: + * + * x type description + * - ---- ----------- + * ZREGS \ + * ZREG | + * PREGS | refer to <asm/sigcontext.h> + * PREG | + * FFR / + * + * FPSR uint32_t FPSR + * FPCR uint32_t FPCR + * + * Additional data might be appended in the future. 
+ */ + +#define SVE_PT_SVE_ZREG_SIZE(vq) SVE_SIG_ZREG_SIZE(vq) +#define SVE_PT_SVE_PREG_SIZE(vq) SVE_SIG_PREG_SIZE(vq) +#define SVE_PT_SVE_FFR_SIZE(vq) SVE_SIG_FFR_SIZE(vq) +#define SVE_PT_SVE_FPSR_SIZE sizeof(__u32) +#define SVE_PT_SVE_FPCR_SIZE sizeof(__u32) + +#define __SVE_SIG_TO_PT(offset) \ + ((offset) - SVE_SIG_REGS_OFFSET + SVE_PT_REGS_OFFSET) + +#define SVE_PT_SVE_OFFSET SVE_PT_REGS_OFFSET + +#define SVE_PT_SVE_ZREGS_OFFSET \ + __SVE_SIG_TO_PT(SVE_SIG_ZREGS_OFFSET) +#define SVE_PT_SVE_ZREG_OFFSET(vq, n) \ + __SVE_SIG_TO_PT(SVE_SIG_ZREG_OFFSET(vq, n)) +#define SVE_PT_SVE_ZREGS_SIZE(vq) \ + (SVE_PT_SVE_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_PT_SVE_ZREGS_OFFSET) + +#define SVE_PT_SVE_PREGS_OFFSET(vq) \ + __SVE_SIG_TO_PT(SVE_SIG_PREGS_OFFSET(vq)) +#define SVE_PT_SVE_PREG_OFFSET(vq, n) \ + __SVE_SIG_TO_PT(SVE_SIG_PREG_OFFSET(vq, n)) +#define SVE_PT_SVE_PREGS_SIZE(vq) \ + (SVE_PT_SVE_PREG_OFFSET(vq, SVE_NUM_PREGS) - \ + SVE_PT_SVE_PREGS_OFFSET(vq)) + +#define SVE_PT_SVE_FFR_OFFSET(vq) \ + __SVE_SIG_TO_PT(SVE_SIG_FFR_OFFSET(vq)) + +#define SVE_PT_SVE_FPSR_OFFSET(vq) \ + ((SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq) + 15) / 16 * 16) +#define SVE_PT_SVE_FPCR_OFFSET(vq) \ + (SVE_PT_SVE_FPSR_OFFSET(vq) + SVE_PT_SVE_FPSR_SIZE) + +/* + * Any future extension appended after FPCR must be aligned to the next + * 128-bit boundary. + */ + +#define SVE_PT_SVE_SIZE(vq, flags) \ + ((SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE - \ + SVE_PT_SVE_OFFSET + 15) / 16 * 16) + +#define SVE_PT_SIZE(vq, flags) \ + (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? 
\ + SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \ + : SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags)) + #endif /* __ASSEMBLY__ */ #endif /* _UAPI__ASM_PTRACE_H */ diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h index 11c915d..91e55de 100644 --- a/arch/arm64/include/uapi/asm/sigcontext.h +++ b/arch/arm64/include/uapi/asm/sigcontext.h @@ -16,6 +16,8 @@ #ifndef _UAPI__ASM_SIGCONTEXT_H #define _UAPI__ASM_SIGCONTEXT_H +#ifndef __ASSEMBLY__ + #include <linux/types.h> /* @@ -96,6 +98,8 @@ struct sve_context { __u16 __reserved[3]; }; +#endif /* !__ASSEMBLY__ */ + /* * The SVE architecture leaves space for future expansion of the * vector length beyond its initial architectural limit of 2048 bits diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 1750301..6a5e725 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -417,6 +417,48 @@ void fpsimd_flush_task_state(struct task_struct *t) t->thread.fpsimd_state.cpu = NR_CPUS; } +#ifdef CONFIG_ARM64_SVE + +/* FPSIMD/SVE synchronisation helpers for ptrace */ + +void fpsimd_sync_to_sve(struct task_struct *task) +{ + if (!test_tsk_thread_flag(task, TIF_SVE)) + task_fpsimd_to_sve(task); +} + +void fpsimd_sync_to_fpsimd(struct task_struct *task) +{ + if (test_tsk_thread_flag(task, TIF_SVE)) + task_sve_to_fpsimd(task); +} + +static void __fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task, + unsigned int vq) +{ + struct sve_struct fpsimd_sve_state(vq) *sst = + __task_sve_state(task); + struct fpsimd_state *fst = &task->thread.fpsimd_state; + unsigned int i; + + if (!test_tsk_thread_flag(task, TIF_SVE)) + return; + + memset(sst->zregs, 0, sizeof(sst->zregs)); + + for (i = 0; i < 32; ++i) + sst->zregs[i][0] = fst->vregs[i]; +} + +void fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task) +{ + unsigned int vl = sve_get_vl(); + + __fpsimd_sync_from_fpsimd_zeropad(task, sve_vq_from_vl(vl)); +} + +#endif /* CONFIG_ARM64_SVE */ + 
#ifdef CONFIG_KERNEL_MODE_NEON static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate); diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index e0c81da..bdd2ad3 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -30,7 +30,9 @@ #include <linux/seccomp.h> #include <linux/security.h> #include <linux/init.h> +#include <linux/sched.h> #include <linux/signal.h> +#include <linux/string.h> #include <linux/uaccess.h> #include <linux/perf_event.h> #include <linux/hw_breakpoint.h> @@ -611,13 +613,46 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset, /* * TODO: update fp accessors for lazy context switching (sync/flush hwstate) */ +static int __fpr_get(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + void *kbuf, void __user *ubuf, unsigned int start_pos) +{ + struct user_fpsimd_state *uregs; + + fpsimd_sync_to_fpsimd(target); + + uregs = &target->thread.fpsimd_state.user_fpsimd; + return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, + start_pos, start_pos + sizeof(*uregs)); +} + static int fpr_get(struct task_struct *target, const struct user_regset *regset, unsigned int pos, unsigned int count, void *kbuf, void __user *ubuf) { - struct user_fpsimd_state *uregs; - uregs = &target->thread.fpsimd_state.user_fpsimd; - return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, 0, -1); + return __fpr_get(target, regset, pos, count, kbuf, ubuf, 0); +} + +static int __fpr_set(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf, + unsigned int start_pos) +{ + int ret; + struct user_fpsimd_state newstate; + + fpsimd_sync_to_fpsimd(target); + + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, + start_pos, start_pos + sizeof(newstate)); + if (ret) + return ret; + + target->thread.fpsimd_state.user_fpsimd = newstate; + + 
return ret; } static int fpr_set(struct task_struct *target, const struct user_regset *regset, @@ -625,14 +660,14 @@ static int fpr_set(struct task_struct *target, const struct user_regset *regset, const void *kbuf, const void __user *ubuf) { int ret; - struct user_fpsimd_state newstate; - ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0, -1); + ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, 0); if (ret) return ret; - target->thread.fpsimd_state.user_fpsimd = newstate; + fpsimd_sync_from_fpsimd_zeropad(target); fpsimd_flush_task_state(target); + return ret; } @@ -685,6 +720,204 @@ static int system_call_set(struct task_struct *target, return ret; } +#ifdef CONFIG_ARM64_SVE + +static int sve_get(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + void *kbuf, void __user *ubuf) +{ + int ret; + struct user_sve_header header; + unsigned int vq; + unsigned long start, end; + + /* Header */ + memset(&header, 0, sizeof(header)); + + header.vl = sve_get_vl(); + + BUG_ON(!sve_vl_valid(header.vl)); + vq = sve_vq_from_vl(header.vl); + + /* Until runtime or per-task vector length changing is supported: */ + header.max_vl = header.vl; + + header.flags = test_tsk_thread_flag(target, TIF_SVE) ? 
+ SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD; + + header.size = SVE_PT_SIZE(vq, header.flags); + header.max_size = SVE_PT_SIZE(vq, SVE_PT_REGS_SVE); + + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &header, + 0, sizeof(header)); + if (ret) + return ret; + + /* Registers: FPSIMD-only case */ + + BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header)); + + if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) + return __fpr_get(target, regset, pos, count, kbuf, ubuf, + SVE_PT_FPSIMD_OFFSET); + + /* Otherwise: full SVE case */ + + BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header)); + + start = SVE_PT_SVE_OFFSET; + end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq); + + BUG_ON((char *)__task_sve_state(target) < (char *)target); + BUG_ON(end < start); + BUG_ON(arch_task_struct_size < end - start); + BUG_ON((char *)__task_sve_state(target) - (char *)target > + arch_task_struct_size - (end - start)); + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, + __task_sve_state(target), + start, end); + if (ret) + return ret; + + start = end; + end = SVE_PT_SVE_FPSR_OFFSET(vq); + + BUG_ON(end < start); + ret = user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf, + start, end); + if (ret) + return ret; + + start = end; + end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE; + + BUG_ON((char *)(&target->thread.fpsimd_state.fpcr + 1) < + (char *)&target->thread.fpsimd_state.fpsr); + BUG_ON(end < start); + BUG_ON((char *)(&target->thread.fpsimd_state.fpcr + 1) - + (char *)&target->thread.fpsimd_state.fpsr != + end - start); + + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, + &target->thread.fpsimd_state.fpsr, + start, end); + if (ret) + return ret; + + start = end; + end = (SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE) + 15) / 16 * 16; + BUG_ON(end < start); + + return user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf, + start, end); +} + +static int sve_set(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int 
count, + const void *kbuf, const void __user *ubuf) +{ + int ret; + struct user_sve_header header; + unsigned int vq; + unsigned long start, end; + + /* Header */ + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &header, + 0, sizeof(header)); + if (ret) + goto out; + + if (header.vl != sve_get_vl()) + return -EINVAL; + + BUG_ON(!sve_vl_valid(header.vl)); + vq = sve_vq_from_vl(header.vl); + + if (header.flags & ~SVE_PT_REGS_MASK) + return -EINVAL; + + /* Registers: FPSIMD-only case */ + + BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header)); + + if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) { + ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, + SVE_PT_FPSIMD_OFFSET); + clear_tsk_thread_flag(target, TIF_SVE); + goto out; + } + + /* Otherwise: full SVE case */ + + fpsimd_sync_to_sve(target); + set_tsk_thread_flag(target, TIF_SVE); + + BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header)); + + start = SVE_PT_SVE_OFFSET; + end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq); + + BUG_ON((char *)__task_sve_state(target) < (char *)target); + BUG_ON(end < start); + BUG_ON(arch_task_struct_size < end - start); + BUG_ON((char *)__task_sve_state(target) - (char *)target > + arch_task_struct_size - (end - start)); + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + __task_sve_state(target), + start, end); + if (ret) + goto out; + + start = end; + end = SVE_PT_SVE_FPSR_OFFSET(vq); + + BUG_ON(end < start); + ret = user_regset_copyin_ignore(&pos, &count, &kbuf, &ubuf, + start, end); + if (ret) + goto out; + + start = end; + end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE; + + BUG_ON((char *)(&target->thread.fpsimd_state.fpcr + 1) < + (char *)&target->thread.fpsimd_state.fpsr); + BUG_ON(end < start); + BUG_ON((char *)(&target->thread.fpsimd_state.fpcr + 1) - + (char *)&target->thread.fpsimd_state.fpsr != + end - start); + + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + &target->thread.fpsimd_state.fpsr, + start, end); + +out: + 
fpsimd_flush_task_state(target); + return ret; +} + +#else /* !CONFIG_ARM64_SVE */ + +static int sve_get(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + void *kbuf, void __user *ubuf) +{ + return -EINVAL; +} + +static int sve_set(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + return -EINVAL; +} + +#endif /* !CONFIG_ARM64_SVE */ + enum aarch64_regset { REGSET_GPR, REGSET_FPR, @@ -694,6 +927,7 @@ enum aarch64_regset { REGSET_HW_WATCH, #endif REGSET_SYSTEM_CALL, + REGSET_SVE, }; static const struct user_regset aarch64_regsets[] = { @@ -751,6 +985,14 @@ static const struct user_regset aarch64_regsets[] = { .get = system_call_get, .set = system_call_set, }, + [REGSET_SVE] = { /* Scalable Vector Extension */ + .core_note_type = NT_ARM_SVE, + .n = (SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE) + 15) / 16, + .size = 16, + .align = 16, + .get = sve_get, + .set = sve_set, + }, }; static const struct user_regset_view user_aarch64_view = { diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index b59ee07..23c6585 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -414,6 +414,7 @@ typedef struct elf64_shdr { #define NT_ARM_HW_BREAK 0x402 /* ARM hardware breakpoint registers */ #define NT_ARM_HW_WATCH 0x403 /* ARM hardware watchpoint registers */ #define NT_ARM_SYSTEM_CALL 0x404 /* ARM system call number */ +#define NT_ARM_SVE 0x405 /* ARM Scalable Vector Extension registers */ #define NT_METAG_CBUF 0x500 /* Metag catch buffer registers */ #define NT_METAG_RPIPE 0x501 /* Metag read pipeline state */ #define NT_METAG_TLS 0x502 /* Metag TLS pointer */ -- 2.1.4 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-25 19:39 [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Dave Martin ` (4 preceding siblings ...) 2016-11-25 19:42 ` [RFC PATCH 27/29] arm64/sve: ptrace support Dave Martin @ 2016-11-30 9:56 ` Yao Qi 2016-11-30 12:07 ` Dave Martin 2016-11-30 10:08 ` Florian Weimer 6 siblings, 1 reply; 30+ messages in thread From: Yao Qi @ 2016-11-30 9:56 UTC (permalink / raw) To: Dave Martin Cc: linux-arm-kernel, Christoffer Dall, Florian Weimer, Ard Biesheuvel, Marc Zyngier, Alan Hayward, libc-alpha, GDB Hi, Dave, On Fri, Nov 25, 2016 at 7:38 PM, Dave Martin <Dave.Martin@arm.com> wrote: > * No independent SVE vector length configuration per thread. This is > planned, but will follow as a separate add-on series. If I read "independent SVE vector length configuration per thread" correctly, SVE vector length can be different in each thread, so the size of vector registers is different too. In GDB, we describe registers by "target description" which is per process, not per thread. -- Yao (齐尧) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-30 9:56 ` [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Yao Qi @ 2016-11-30 12:07 ` Dave Martin 2016-11-30 12:22 ` Szabolcs Nagy ` (3 more replies) 0 siblings, 4 replies; 30+ messages in thread From: Dave Martin @ 2016-11-30 12:07 UTC (permalink / raw) To: Florian Weimer, Yao Qi Cc: linux-arm-kernel, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Alan Hayward, Torvald Riegel, Christoffer Dall On Wed, Nov 30, 2016 at 11:08:50AM +0100, Florian Weimer wrote: > On 11/25/2016 08:38 PM, Dave Martin wrote: > >The Scalable Vector Extension (SVE) [1] is an extension to AArch64 which > >adds extra SIMD functionality and supports much larger vectors. > > > >This series implements core Linux support for SVE. > > > >Recipents not copied on the whole series can find the rest of the > >patches in the linux-arm-kernel archives [2]. > > > > > >The first 5 patches "arm64: signal: ..." factor out the allocation and > >placement of state information in the signal frame. The first three > >are prerequisites for the SVE support patches. > > > >Patches 04-05 implement expansion of the signal frame, and may remain > >controversial due to ABI break issues: > > > > * Discussion is needed on how userspace should detect/negotiate signal > > frame size in order for this expansion mechanism to be workable. > > I'm leaning towards a simple increase in the glibc headers (despite the ABI > risk), plus a personality flag to disable really wide vector registers in > case this causes problems with old binaries. I'm concerned here that there may be no sensible fixed size for the signal frame. We would make it ridiculously large in order to minimise the chance of hitting this problem again -- but then it would be ridiculously large, which is a potential problem for massively threaded workloads. Or we could be more conservative, but risk a re-run of similar ABI breaks in the future. 
A personality flag may also discourage use of larger vectors, even though the vast majority of software will work fine with them. > A more elaborate mechanism will likely introduce more bugs than it makes > existing applications working, due to its complexity. Yes, I was a bit concerned about this when I tried to sketch something out. [...] > > * No independent SVE vector length configuration per thread. This is > > planned, but will follow as a separate add-on series. > > Per-thread register widths will likely make coroutine switching (setcontext) > and C++ resumable functions/executors quite challenging. > > Can you detail your plans in this area? > > Thanks, > Florian I'll also respond to Yao's question here, since it's closely related: On Wed, Nov 30, 2016 at 09:56:14AM +0000, Yao Qi wrote: [...] > If I read "independent SVE vector length configuration per thread" > correctly, SVE vector length can be different in each thread, so the > size of vector registers is different too. In GDB, we describe registers > by "target description" which is per process, not per thread. > > -- > Yao (齐尧) So, my key goal is to support _per-process_ vector length control. From the kernel perspective, it is easiest to achieve this by providing per-thread control since that is the unit that context switching acts on. How useful it really is to have threads with different VLs in the same process is an open question. It's theoretically useful for runtime environments, which may want to dispatch code optimised for different VLs -- changing the VL on-the-fly within a single thread is not something I want to encourage, due to overhead and ABI issues, but switching between threads of different VLs would be more manageable. However, I expect mixing different VLs within a single process to be very much a special case -- it's not something I'd expect to work with general-purpose code. 
Since the need for independent VLs per thread is not proven, we could * forbid it -- i.e., only a thread-group leader with no children is permitted to change the VL, which is then inherited by any child threads that are subsequently created * permit it only if a special flag is specified when requesting the VL change * permit it and rely on userspace to be sensible -- easiest option for the kernel. For setcontext/setjmp, we don't save/restore any SVE state due to the caller-save status of SVE, and I would not consider it necessary to save/restore VL itself because of the no-change-on-the-fly policy for this. I'm not familiar with resumable functions/executors -- are these in the C++ standards yet (not that that would cause me to be familiar with them... ;) Any implementation of coroutines (i.e., cooperative switching) is likely to fall under the "setcontext" argument above. Thoughts? ---Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-30 12:07 ` Dave Martin @ 2016-11-30 12:22 ` Szabolcs Nagy 2016-11-30 14:10 ` Dave Martin 2016-11-30 12:38 ` Florian Weimer ` (2 subsequent siblings) 3 siblings, 1 reply; 30+ messages in thread From: Szabolcs Nagy @ 2016-11-30 12:22 UTC (permalink / raw) To: Dave Martin, Florian Weimer, Yao Qi Cc: nd, linux-arm-kernel, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Alan Hayward, Torvald Riegel, Christoffer Dall On 30/11/16 12:06, Dave Martin wrote: > For setcontext/setjmp, we don't save/restore any SVE state due to the > caller-save status of SVE, and I would not consider it necessary to > save/restore VL itself because of the no-change-on-the-fly policy for > this. the problem is not changing VL within a thread, but that setcontext can resume a context of a different thread which had different VL and there might be SVE regs spilled on the stack according to that. (i consider this usage undefined, but at least the gccgo runtime does this) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-30 12:22 ` Szabolcs Nagy @ 2016-11-30 14:10 ` Dave Martin 0 siblings, 0 replies; 30+ messages in thread From: Dave Martin @ 2016-11-30 14:10 UTC (permalink / raw) To: Szabolcs Nagy Cc: Florian Weimer, Yao Qi, Torvald Riegel, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Christoffer Dall, Alan Hayward, nd, linux-arm-kernel On Wed, Nov 30, 2016 at 12:22:32PM +0000, Szabolcs Nagy wrote: > On 30/11/16 12:06, Dave Martin wrote: > > For setcontext/setjmp, we don't save/restore any SVE state due to the > > caller-save status of SVE, and I would not consider it necessary to > > save/restore VL itself because of the no-change-on-the-fly policy for > > this. > > the problem is not changing VL within a thread, > but that setcontext can resume a context of a > different thread which had different VL and there > might be SVE regs spilled on the stack according > to that. > > (i consider this usage undefined, but at least > the gccgo runtime does this) Understood -- which is part of the reason for the argument that although the kernel may permit different threads to have different VLs, whether this actually works usefully also depends on your userspace runtime environment. This again leads me to the conclusion that the request to create threads with different VLs within a single process should be explicit, in order to avoid accidents. Cheers ---Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-30 12:07 ` Dave Martin 2016-11-30 12:22 ` Szabolcs Nagy @ 2016-11-30 12:38 ` Florian Weimer 2016-11-30 13:56 ` Dave Martin 2016-12-02 11:49 ` Dave Martin 2016-12-02 21:56 ` Yao Qi 2016-12-05 22:42 ` Torvald Riegel 3 siblings, 2 replies; 30+ messages in thread From: Florian Weimer @ 2016-11-30 12:38 UTC (permalink / raw) To: Dave Martin, Yao Qi Cc: linux-arm-kernel, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Alan Hayward, Torvald Riegel, Christoffer Dall On 11/30/2016 01:06 PM, Dave Martin wrote: > I'm concerned here that there may be no sensible fixed size for the > signal frame. We would make it ridiculously large in order to minimise > the chance of hitting this problem again -- but then it would be > ridiculously large, which is a potential problem for massively threaded > workloads. What's ridiculously large? We could add a system call to get the right stack size. But as it depends on VL, I'm not sure what it looks like. Particularly if you need determine the stack size before creating a thread that uses a specific VL setting. > For setcontext/setjmp, we don't save/restore any SVE state due to the > caller-save status of SVE, and I would not consider it necessary to > save/restore VL itself because of the no-change-on-the-fly policy for > this. Okay, so we'd potentially set it on thread creation only? That might not be too bad. I really want to avoid a repeat of the setxid fiasco, where we need to run code on all threads to get something that approximates the POSIX-mandated behavior (process attribute) from what the kernel provides (thread/task attribute). > I'm not familiar with resumable functions/executors -- are these in > the C++ standards yet (not that that would cause me to be familiar > with them... ;) Any implementation of coroutines (i.e., > cooperative switching) is likely to fall under the "setcontext" > argument above. There are different ways to implement coroutines. 
Stack switching (like setcontext) is obviously impacted by non-uniform register sizes. But even the most conservative variant, rather similar to switch-based emulation you sometimes see in C coroutine implementations, might have trouble restoring the state if it just cannot restore the saved state due to register size reductions. Thanks, Florian ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-30 12:38 ` Florian Weimer @ 2016-11-30 13:56 ` Dave Martin 2016-12-01 9:21 ` Florian Weimer 2016-12-02 11:49 ` Dave Martin 1 sibling, 1 reply; 30+ messages in thread From: Dave Martin @ 2016-11-30 13:56 UTC (permalink / raw) To: Florian Weimer Cc: Yao Qi, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Christoffer Dall, Alan Hayward, Torvald Riegel, linux-arm-kernel On Wed, Nov 30, 2016 at 01:38:28PM +0100, Florian Weimer wrote: > On 11/30/2016 01:06 PM, Dave Martin wrote: > > >I'm concerned here that there may be no sensible fixed size for the > >signal frame. We would make it ridiculously large in order to minimise > >the chance of hitting this problem again -- but then it would be > >ridiculously large, which is a potential problem for massively threaded > >workloads. > > What's ridiculously large? The SVE architecture permits VLs up to 2048 bits per vector initially -- but it makes space for future architecture revisions to expand up to 65536 bits per vector, which would result in a signal frame > 270 KB. It's far from certain we'll ever see such large vectors, but it's hard to know where to draw the line. > We could add a system call to get the right stack size. But as it depends > on VL, I'm not sure what it looks like. Particularly if you need determine > the stack size before creating a thread that uses a specific VL setting. I think that the most likely time to set the VL is libc startup or ld.so startup -- so really a process considers the VL fixed, and a hypothetical getsigstksz() function would return a constant value depending on the VL that was set. I'd expect that only specialised code such as libc/ld.so itself or fancy runtimes would need to cope with the need to synchronise stack allocation with VL setting. The initial stack after exec is determined by RLIMIT_STACK -- we can expect that to be easily large enough for the initial thread, under any remotely normal scenario. 
> >For setcontext/setjmp, we don't save/restore any SVE state due to the > >caller-save status of SVE, and I would not consider it necessary to > >save/restore VL itself because of the no-change-on-the-fly policy for > >this. > > Okay, so we'd potentially set it on thread creation only? That might not be > too bad. Basically, yes. A runtime _could_ set it at other times, and my view is that the kernel shouldn't arbitrarily forbid this -- but it's up to userspace to determine when it's safe to do it, ensure that there's no VL-dependent data live in memory, and to arrange to reallocate stacks or pre-arrange that allocations were already big enough etc. > I really want to avoid a repeat of the setxid fiasco, where we need to run > code on all threads to get something that approximates the POSIX-mandated > behavior (process attribute) from what the kernel provides (thread/task > attribute). Yeah, that would suck. However, for the proposed ABI there is no illusion to preserve here, since the VL is proposed as a per-thread property everywhere, and this is outside the scope of POSIX. If we do have distinct "set process VL" and "set thread VL" interfaces, then my view is that the former should fail if there are already multiple threads, rather than just setting the VL of a single thread or (worse) asynchronously changing the VL of threads other than the caller... > >I'm not familiar with resumable functions/executors -- are these in > >the C++ standards yet (not that that would cause me to be familiar > >with them... ;) Any implementation of coroutines (i.e., > >cooperative switching) is likely to fall under the "setcontext" > >argument above. > > There are different ways to implement coroutines. Stack switching (like > setcontext) is obviously impacted by non-uniform register sizes. 
But even > the most conservative variant, rather similar to switch-based emulation you > sometimes see in C coroutine implementations, might have trouble restoring > the state if it just cannot restore the saved state due to register size > reductions. Which is not a problem if the variably-sized state is not part of the switched context? Because the SVE procedure call standard determines that the SVE registers are caller-save, they are not live at any external function boundary -- so in cooperative switching it is useless to save/restore this state unless the coroutine framework is defined to have a special procedure call standard. Similarly, my view is that we don't attempt to magically save and restore VL itself either. Code that changes VL after startup would be expected to be aware of and deal with the consequences itself. Cheers ---Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-30 13:56 ` Dave Martin @ 2016-12-01 9:21 ` Florian Weimer 2016-12-01 10:30 ` Dave Martin 0 siblings, 1 reply; 30+ messages in thread From: Florian Weimer @ 2016-12-01 9:21 UTC (permalink / raw) To: Dave Martin Cc: Yao Qi, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Christoffer Dall, Alan Hayward, Torvald Riegel, linux-arm-kernel On 11/30/2016 02:56 PM, Dave Martin wrote: > If we do have distinct "set process VL" and "set thread VL" interfaces, > then my view is that the former should fail if there are already > multiple threads, rather than just setting the VL of a single thread or > (worse) asynchronously changing the VL of threads other than the > caller... Yes, looks feasible to me. >>> I'm not familiar with resumable functions/executors -- are these in >>> the C++ standards yet (not that that would cause me to be familiar >>> with them... ;) Any implementation of coroutines (i.e., >>> cooperative switching) is likely to fall under the "setcontext" >>> argument above. >> >> There are different ways to implement coroutines. Stack switching (like >> setcontext) is obviously impacted by non-uniform register sizes. But even >> the most conservative variant, rather similar to switch-based emulation you >> sometimes see in C coroutine implementations, might have trouble restoring >> the state if it just cannot restore the saved state due to register size >> reductions. > > Which is not a problem if the variably-sized state is not part of the > switched context? The VL value is implicitly thread-local data, and the encoded state may have an implicit dependency on it, although it does not contain vector registers as such. > Because the SVE procedure call standard determines that the SVE > registers are caller-save, By the way, how is this implemented? Some of them overlap existing callee-saved registers. 
> they are not live at any external function > boundary -- so in cooperative switching it is useless to save/restore > this state unless the coroutine framework is defined to have a special > procedure call standard. It can use the standard calling convention, but it may have selected a particular implementation based on the VL value before suspension. Florian ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-01 9:21 ` Florian Weimer @ 2016-12-01 10:30 ` Dave Martin 2016-12-01 12:19 ` Dave Martin 2016-12-05 10:44 ` Florian Weimer 0 siblings, 2 replies; 30+ messages in thread From: Dave Martin @ 2016-12-01 10:30 UTC (permalink / raw) To: Florian Weimer Cc: libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Yao Qi, linux-arm-kernel, Alan Hayward, Torvald Riegel, Christoffer Dall On Thu, Dec 01, 2016 at 10:21:03AM +0100, Florian Weimer wrote: > On 11/30/2016 02:56 PM, Dave Martin wrote: > > >If we do have distinct "set process VL" and "set thread VL" interfaces, > >then my view is that the former should fail if there are already > >multiple threads, rather than just setting the VL of a single thread or > >(worse) asynchronously changing the VL of threads other than the > >caller... > > Yes, looks feasible to me. OK, I'll try to hack up something along these lines. > >>>I'm not familiar with resumable functions/executors -- are these in > >>>the C++ standards yet (not that that would cause me to be familiar > >>>with them... ;) Any implementation of coroutines (i.e., > >>>cooperative switching) is likely to fall under the "setcontext" > >>>argument above. > >> > >>There are different ways to implement coroutines. Stack switching (like > >>setcontext) is obviously impacted by non-uniform register sizes. But even > >>the most conservative variant, rather similar to switch-based emulation you > >>sometimes see in C coroutine implementations, might have trouble restoring > >>the state if it just cannot restore the saved state due to register size > >>reductions. > > > >Which is not a problem if the variably-sized state is not part of the > >switched context? > > The VL value is implicitly thread-local data, and the encoded state may have > an implicit dependency on it, although it does not contain vector registers > as such. This doesn't sound like an absolute requirement to me. 
If we presume that the SVE registers never need to get saved or restored, what stops the context data format being VL-independent? The setcontext()/getcontext() implementation for example will not change at all for SVE. > >Because the SVE procedure call standard determines that the SVE > >registers are caller-save, > > By the way, how is this implemented? Some of them overlap existing > callee-saved registers. Basically, all the *new* state is caller-save. The Neon/FPSIMD regs V8-V15 are callee-save, so in the SVE view Zn[bits 127:0] is callee-save for all n = 8..15. > >they are not live at any external function > >boundary -- so in cooperative switching it is useless to save/restore > >this state unless the coroutine framework is defined to have a special > >procedure call standard. > > It can use the standard calling convention, but it may have selected a > particular implementation based on the VL value before suspension. If the save/restore logic doesn't touch SVE, why would its implementation be VL-dependent? Cheers ---Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-01 10:30 ` Dave Martin @ 2016-12-01 12:19 ` Dave Martin 2016-12-05 10:44 ` Florian Weimer 1 sibling, 0 replies; 30+ messages in thread From: Dave Martin @ 2016-12-01 12:19 UTC (permalink / raw) To: Florian Weimer Cc: libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Yao Qi, Christoffer Dall, Alan Hayward, Torvald Riegel, linux-arm-kernel On Thu, Dec 01, 2016 at 10:30:51AM +0000, Dave Martin wrote: [...] > Basically, all the *new* state is caller-save. > > The Neon/FPSIMD regs V8-V15 are callee-save, so in the SVE view > Zn[bits 127:0] is callee-save for all n = 8..15. Ramana is right -- the current procedure call standard (ARM IHI 0055B) only requires the bottom _64_ bits of V8-V15 to be preserved (not all 128 bits as I stated). [...] Cheers ---Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-01 10:30 ` Dave Martin 2016-12-01 12:19 ` Dave Martin @ 2016-12-05 10:44 ` Florian Weimer 2016-12-05 11:07 ` Szabolcs Nagy 2016-12-05 15:05 ` Dave Martin 1 sibling, 2 replies; 30+ messages in thread From: Florian Weimer @ 2016-12-05 10:44 UTC (permalink / raw) To: Dave Martin Cc: libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Yao Qi, linux-arm-kernel, Alan Hayward, Torvald Riegel, Christoffer Dall On 12/01/2016 11:30 AM, Dave Martin wrote: >> The VL value is implicitly thread-local data, and the encoded state may have >> an implicit dependency on it, although it does not contain vector registers >> as such. > > This doesn't sound like an absolute requirement to me. > > If we presume that the SVE registers never need to get saved or > restored, what stops the context data format being VL-independent? I'm concerned the suspended computation has code which has been selected to fit a particular VL value. > If the save/restore logic doesn't touch SVE, why would its > implementation be VL-dependent? Because it has been optimized for a certain vector length? >>> Because the SVE procedure call standard determines that the SVE >>> registers are caller-save, >> >> By the way, how is this implemented? Some of them overlap existing >> callee-saved registers. > > Basically, all the *new* state is caller-save. > > The Neon/FPSIMD regs V8-V15 are callee-save, so in the SVE view > Zn[bits 127:0] is callee-save for all n = 8..15. Are the extension parts of registers v8 to v15 used for argument passing? If not, we should be able to use the existing dynamic linker trampoline. Thanks, Florian ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-05 10:44 ` Florian Weimer @ 2016-12-05 11:07 ` Szabolcs Nagy 2016-12-05 15:05 ` Dave Martin 1 sibling, 0 replies; 30+ messages in thread From: Szabolcs Nagy @ 2016-12-05 11:07 UTC (permalink / raw) To: Florian Weimer, Dave Martin Cc: nd, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Yao Qi, linux-arm-kernel, Alan Hayward, Torvald Riegel, Christoffer Dall On 05/12/16 10:44, Florian Weimer wrote: >>> By the way, how is this implemented? Some of them overlap existing >>> callee-saved registers. >> >> Basically, all the *new* state is caller-save. >> >> The Neon/FPSIMD regs V8-V15 are callee-save, so in the SVE view >> Zn[bits 127:0] is callee-save for all n = 8..15. > > Are the extension parts of registers v8 to v15 used for argument passing? > > If not, we should be able to use the existing dynamic linker trampoline. > if sve arguments are passed to a function then it has special call abi (which is probably not yet documented), this call abi requires that such a call does not go through plt to avoid requiring sve aware libc. same for tls access: the top part of sve regs have to be saved by the caller before accessing tls so the tlsdesc entry does not have to save them. so current trampolines should be fine. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-05 10:44 ` Florian Weimer 2016-12-05 11:07 ` Szabolcs Nagy @ 2016-12-05 15:05 ` Dave Martin 1 sibling, 0 replies; 30+ messages in thread From: Dave Martin @ 2016-12-05 15:05 UTC (permalink / raw) To: Florian Weimer Cc: libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Yao Qi, Christoffer Dall, Alan Hayward, Torvald Riegel, linux-arm-kernel On Mon, Dec 05, 2016 at 11:44:38AM +0100, Florian Weimer wrote: > On 12/01/2016 11:30 AM, Dave Martin wrote: > > >>The VL value is implicitly thread-local data, and the encoded state may have > >>an implicit dependency on it, although it does not contain vector registers > >>as such. > > > >This doesn't sound like an absolute requirement to me. > > > >If we presume that the SVE registers never need to get saved or > >restored, what stops the context data format being VL-independent? > > I'm concerned the suspended computation has code which has been selected to > fit a particular VL value. > > > If the save/restore logic doesn't touch SVE, why would its > > implementation be VL-dependent? > > Because it has been optimized for a certain vector length? I'll respond to these via Szabolcs' reply. > >>>Because the SVE procedure call standard determines that the SVE > >>>registers are caller-save, > >> > >>By the way, how is this implemented? Some of them overlap existing > >>callee-saved registers. > > > >Basically, all the *new* state is caller-save. > > > >The Neon/FPSIMD regs V8-V15 are callee-save, so in the SVE view > >Zn[bits 127:0] is callee-save for all n = 8..15. > > Are the extension parts of registers v8 to v15 used for argument passing? No -- the idea is to be directly compatible with the existing PCS. > If not, we should be able to use the existing dynamic linker trampoline. Yes, I believe so. Cheers ---Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-30 12:38 ` Florian Weimer 2016-11-30 13:56 ` Dave Martin @ 2016-12-02 11:49 ` Dave Martin 2016-12-02 16:34 ` Florian Weimer 1 sibling, 1 reply; 30+ messages in thread From: Dave Martin @ 2016-12-02 11:49 UTC (permalink / raw) To: Florian Weimer Cc: Yao Qi, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Christoffer Dall, Alan Hayward, Torvald Riegel, linux-arm-kernel On Wed, Nov 30, 2016 at 01:38:28PM +0100, Florian Weimer wrote: [...] > We could add a system call to get the right stack size. But as it depends > on VL, I'm not sure what it looks like. Particularly if you need to determine > the stack size before creating a thread that uses a specific VL setting. I missed this point previously -- apologies for that. What would you think of: set_vl(vl_for_new_thread); minsigstksz = get_minsigstksz(); set_vl(my_vl); This avoids get_minsigstksz() requiring parameters -- which is mainly a concern because the parameters tomorrow might be different from the parameters today. If it is possible to create the new thread without any SVE-dependent code, then we could set_vl(vl_for_new_thread); new_thread_stack = malloc(get_minsigstksz()); new_thread = create_thread(..., new_thread_stack); set_vl(my_vl); which has the nice property that the new thread directly inherits the configuration that was used for get_minsigstksz(). However, it would be necessary to prevent GCC from moving any code across these statements -- in particular, SVE code that accesses VL-dependent data spilled on the stack is liable to go wrong if reordered with the above. So the sequence would need to go in an external function (or a single asm...) Failing that, we could maybe pass some extensible struct to get_minsigstksz(). Thoughts? Cheers ---Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-02 11:49 ` Dave Martin @ 2016-12-02 16:34 ` Florian Weimer 2016-12-02 16:59 ` Joseph Myers 0 siblings, 1 reply; 30+ messages in thread From: Florian Weimer @ 2016-12-02 16:34 UTC (permalink / raw) To: Dave Martin Cc: Yao Qi, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Christoffer Dall, Alan Hayward, Torvald Riegel, linux-arm-kernel On 12/02/2016 12:48 PM, Dave Martin wrote: > On Wed, Nov 30, 2016 at 01:38:28PM +0100, Florian Weimer wrote: > > [...] > >> We could add a system call to get the right stack size. But as it depends >> on VL, I'm not sure what it looks like. Particularly if you need to determine >> the stack size before creating a thread that uses a specific VL setting. > > I missed this point previously -- apologies for that. > > What would you think of: > > set_vl(vl_for_new_thread); > minsigstksz = get_minsigstksz(); > set_vl(my_vl); > > This avoids get_minsigstksz() requiring parameters -- which is mainly a > concern because the parameters tomorrow might be different from the > parameters today. > > If it is possible to create the new thread without any SVE-dependent code, > then we could > > set_vl(vl_for_new_thread); > new_thread_stack = malloc(get_minsigstksz()); > new_thread = create_thread(..., new_thread_stack); > set_vl(my_vl); > > which has the nice property that the new thread directly inherits the > configuration that was used for get_minsigstksz(). Because all SVE registers are caller-saved, it's acceptable to temporarily reduce the VL value, I think. So this should work. One complication is that both the kernel and the libc need to reserve stack space, so the kernel-returned value and the one which has to be used in reality will be different. 
> However, it would be necessary to prevent GCC from moving any code > across these statements -- in particular, SVE code that accesses > VL-dependent data spilled on the stack is liable to go wrong if reordered > with the above. So the sequence would need to go in an external > function (or a single asm...) I would talk to GCC folks—we have similar issues with changing the FPU rounding mode, I assume. Thanks, Florian ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-02 16:34 ` Florian Weimer @ 2016-12-02 16:59 ` Joseph Myers 2016-12-02 18:21 ` Dave Martin 0 siblings, 1 reply; 30+ messages in thread From: Joseph Myers @ 2016-12-02 16:59 UTC (permalink / raw) To: Florian Weimer Cc: Dave Martin, Yao Qi, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Christoffer Dall, Alan Hayward, Torvald Riegel, linux-arm-kernel On Fri, 2 Dec 2016, Florian Weimer wrote: > > However, it would be necessary to prevent GCC from moving any code > > across these statements -- in particular, SVE code that accesses > > VL-dependent data spilled on the stack is liable to go wrong if reordered > > with the above. So the sequence would need to go in an external > > function (or a single asm...) > > I would talk to GCC folks—we have similar issues with changing the FPU > rounding mode, I assume. In general, GCC doesn't track the implicit uses of thread-local state involved in floating-point exceptions and rounding modes, and so doesn't avoid moving code across manipulations of such state; there are various open bugs in this area (though many of the open bugs are for local rather than global issues with code generation or local optimizations not respecting exceptions and rounding modes, which are easier to fix). Hence glibc using various macros such as math_opt_barrier and math_force_eval which use asms to prevent such motion. I'm not familiar enough with the optimizers to judge the right way to address such issues with implicit use of thread-local state. And I haven't thought much yet about how to implement TS 18661-1 constant rounding modes, which would involve the compiler implicitly inserting rounding modes changes, though I think it would be fairly straightforward given underlying support for avoiding inappropriate code motion. -- Joseph S. 
Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-02 16:59 ` Joseph Myers @ 2016-12-02 18:21 ` Dave Martin 2016-12-02 21:57 ` Joseph Myers 0 siblings, 1 reply; 30+ messages in thread From: Dave Martin @ 2016-12-02 18:21 UTC (permalink / raw) To: Joseph Myers Cc: Florian Weimer, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Yao Qi, linux-arm-kernel, Alan Hayward, Torvald Riegel, Christoffer Dall On Fri, Dec 02, 2016 at 04:59:27PM +0000, Joseph Myers wrote: > On Fri, 2 Dec 2016, Florian Weimer wrote: > > > > However, it would be necessary to prevent GCC from moving any code > > > across these statements -- in particular, SVE code that accesses > > > VL-dependent data spilled on the stack is liable to go wrong if reordered > > > with the above. So the sequence would need to go in an external > > > function (or a single asm...) > > > > I would talk to GCC folks—we have similar issues with changing the FPU > > rounding mode, I assume. > > In general, GCC doesn't track the implicit uses of thread-local state > involved in floating-point exceptions and rounding modes, and so doesn't > avoid moving code across manipulations of such state; there are various > open bugs in this area (though many of the open bugs are for local rather > than global issues with code generation or local optimizations not > respecting exceptions and rounding modes, which are easier to fix). Hence > glibc using various macros such as math_opt_barrier and math_force_eval > which use asms to prevent such motion. Presumably the C language specs specify that fenv manipulations cannot be reordered with respect to evaluation of floating-point expressions? Sanity would seem to require this, though I've not dug into the specs myself yet. This doesn't get us off the hook for prctl() -- the C specs can only define constraints on reordering for things that appear in the C spec. prctl() is just an external function call in this context, and doesn't enjoy the same guarantees. 
> I'm not familiar enough with the optimizers to judge the right way to > address such issues with implicit use of thread-local state. And I > haven't thought much yet about how to implement TS 18661-1 constant > rounding modes, which would involve the compiler implicitly inserting > rounding modes changes, though I think it would be fairly straightforward > given underlying support for avoiding inappropriate code motion. My concern is that the compiler has no clue about what code motions are appropriate or not with respect to a system call, beyond what applies to a system call in general (i.e., asm volatile ( ::: "memory" ) for GCC). Cheers ---Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-02 18:21 ` Dave Martin @ 2016-12-02 21:57 ` Joseph Myers 0 siblings, 0 replies; 30+ messages in thread From: Joseph Myers @ 2016-12-02 21:57 UTC (permalink / raw) To: Dave Martin Cc: Florian Weimer, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Yao Qi, linux-arm-kernel, Alan Hayward, Torvald Riegel, Christoffer Dall On Fri, 2 Dec 2016, Dave Martin wrote: > Presumably the C language specs specify that fenv manipulations cannot > be reordered with respect to evaluation of floating-point expressions? Yes (in the context of #pragma STDC FENV_ACCESS ON). And you need to presume that an arbitrary function call might manipulate the environment unless you know it doesn't. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-30 12:07 ` Dave Martin 2016-11-30 12:22 ` Szabolcs Nagy 2016-11-30 12:38 ` Florian Weimer @ 2016-12-02 21:56 ` Yao Qi 2016-12-05 15:12 ` Dave Martin 2016-12-05 22:42 ` Torvald Riegel 3 siblings, 1 reply; 30+ messages in thread From: Yao Qi @ 2016-12-02 21:56 UTC (permalink / raw) To: Dave Martin Cc: Florian Weimer, linux-arm-kernel, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Alan Hayward, Torvald Riegel, Christoffer Dall On 16-11-30 12:06:54, Dave Martin wrote: > So, my key goal is to support _per-process_ vector length control. > > From the kernel perspective, it is easiest to achieve this by providing > per-thread control since that is the unit that context switching acts > on. > Hi, Dave, Thanks for the explanation. > How useful it really is to have threads with different VLs in the same > process is an open question. It's theoretically useful for runtime > environments, which may want to dispatch code optimised for different > VLs -- changing the VL on-the-fly within a single thread is not > something I want to encourage, due to overhead and ABI issues, but > switching between threads of different VLs would be more manageable. This is a weird programming model. > However, I expect mixing different VLs within a single process to be > very much a special case -- it's not something I'd expect to work with > general-purpose code. > > Since the need for independent VLs per thread is not proven, we could > > * forbid it -- i.e., only a thread-group leader with no children is > permitted to change the VL, which is then inherited by any child threads > that are subsequently created > > * permit it only if a special flag is specified when requesting the VL > change > > * permit it and rely on userspace to be sensible -- easiest option for > the kernel. Both the first and the third one are reasonable to me, but the first one fits well in the existing GDB design. 
I don't know how useful it is to have per-thread VL; there may be some workloads that can be implemented that way. GDB needs some changes to support "per-thread" target description. -- Yao ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-02 21:56 ` Yao Qi @ 2016-12-05 15:12 ` Dave Martin 0 siblings, 0 replies; 30+ messages in thread From: Dave Martin @ 2016-12-05 15:12 UTC (permalink / raw) To: Yao Qi Cc: Florian Weimer, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Christoffer Dall, Alan Hayward, Torvald Riegel, linux-arm-kernel On Fri, Dec 02, 2016 at 09:56:46PM +0000, Yao Qi wrote: > On 16-11-30 12:06:54, Dave Martin wrote: > > So, my key goal is to support _per-process_ vector length control. > > > > From the kernel perspective, it is easiest to achieve this by providing > > per-thread control since that is the unit that context switching acts > > on. > > > > Hi, Dave, > Thanks for the explanation. > > > How useful it really is to have threads with different VLs in the same > > process is an open question. It's theoretically useful for runtime > > environments, which may want to dispatch code optimised for different > > VLs -- changing the VL on-the-fly within a single thread is not > > something I want to encourage, due to overhead and ABI issues, but > > switching between threads of different VLs would be more manageable. > > This is a weird programming model. I may not have explained that very well. What I meant is, you have two threads communicating with one another, say. Providing that they don't exchange data using a VL-dependent representation, it should not matter that the two threads are running with different VLs. This may make sense if a particular piece of work was optimised for a particular VL: you can pick a worker thread with the correct VL and dispatch the job there for best performance. I wouldn't expect this ability to be exploited except by specialised frameworks. > > However, I expect mixing different VLs within a single process to be > > very much a special case -- it's not something I'd expect to work with > > general-purpose code. 
> > > > Since the need for independent VLs per thread is not proven, we could > > > > * forbid it -- i.e., only a thread-group leader with no children is > > permitted to change the VL, which is then inherited by any child threads > > that are subsequently created > > > > * permit it only if a special flag is specified when requesting the VL > > change > > > > * permit it and rely on userspace to be sensible -- easiest option for > > the kernel. > > Both the first and the third one are reasonable to me, but the first one > fits well in the existing GDB design. I don't know how useful it is to have > per-thread VL; there may be some workloads that can be implemented that way. > GDB needs some changes to support "per-thread" target description. OK -- I'll implement per-thread VL for now, but this can be clarified later. Cheers ---Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-30 12:07 ` Dave Martin ` (2 preceding siblings ...) 2016-12-02 21:56 ` Yao Qi @ 2016-12-05 22:42 ` Torvald Riegel 2016-12-06 14:46 ` Dave Martin 3 siblings, 1 reply; 30+ messages in thread From: Torvald Riegel @ 2016-12-05 22:42 UTC (permalink / raw) To: Dave Martin Cc: Florian Weimer, Yao Qi, linux-arm-kernel, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Alan Hayward, Christoffer Dall On Wed, 2016-11-30 at 12:06 +0000, Dave Martin wrote: > So, my key goal is to support _per-process_ vector length control. > > From the kernel perspective, it is easiest to achieve this by providing > per-thread control since that is the unit that context switching acts > on. > > How useful it really is to have threads with different VLs in the same > process is an open question. It's theoretically useful for runtime > environments, which may want to dispatch code optimised for different > VLs What would be the primary use case(s)? Vectorization of short vectors (eg, if having an array of structs or sth like that)? > -- changing the VL on-the-fly within a single thread is not > something I want to encourage, due to overhead and ABI issues, but > switching between threads of different VLs would be more manageable. So if on-the-fly switching is probably not useful, that would mean we need special threads for the use cases. Is that a realistic assumption for the use cases? Or do you primarily want to keep it possible to do this, regardless of whether there are real use cases now? I suppose allowing for a per-thread setting of VL could also be added as a feature in the future without breaking existing code. > For setcontext/setjmp, we don't save/restore any SVE state due to the > caller-save status of SVE, and I would not consider it necessary to > save/restore VL itself because of the no-change-on-the-fly policy for > this. 
Thus, you would basically consider VL changes or per-thread VL as in the realm of compilation internals? So, the specific size for a particular piece of code would not be part of an ABI? > I'm not familiar with resumable functions/executors -- are these in > the C++ standards yet (not that that would cause me to be familiar > with them... ;) Any implementation of coroutines (i.e., > cooperative switching) is likely to fall under the "setcontext" > argument above. These are not part of the C++ standard yet, but will appear in TSes. There are various features for which implementations would be assumed to use one OS thread for several tasks, coroutines, etc. Some of them switch between these tasks or coroutines while these are running, whereas the ones that will be in C++17 only run more than one parallel task on the same OS thread, but one after the other (like in a thread pool). However, if we are careful not to expose VL or make promises about it, this may just end up being a detail similar to, say, register allocation, which isn't exposed beyond the internals of a particular compiler either. Exposing it as a feature the user can set without messing with the implementation would introduce additional thread-specific state, as Florian said. This might not be a show-stopper by itself, but the more thread-specific state we have the more an implementation has to take care of or switch, and the higher the runtime costs are. C++17 already makes weaker promises for TLS for parallel tasks, so that implementations don't have to run TLS constructors or destructors just because a small parallel task was executed. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-12-05 22:42 ` Torvald Riegel @ 2016-12-06 14:46 ` Dave Martin 0 siblings, 0 replies; 30+ messages in thread From: Dave Martin @ 2016-12-06 14:46 UTC (permalink / raw) To: Torvald Riegel Cc: Florian Weimer, libc-alpha, Ard Biesheuvel, Marc Zyngier, gdb, Yao Qi, Christoffer Dall, Alan Hayward, linux-arm-kernel On Mon, Dec 05, 2016 at 11:42:19PM +0100, Torvald Riegel wrote: Hi there, > On Wed, 2016-11-30 at 12:06 +0000, Dave Martin wrote: > > So, my key goal is to support _per-process_ vector length control. > > > > From the kernel perspective, it is easiest to achieve this by providing > > per-thread control since that is the unit that context switching acts > > on. > > > > How useful it really is to have threads with different VLs in the same > > process is an open question. It's theoretically useful for runtime > > environments, which may want to dispatch code optimised for different > > VLs > > What would be the primary use case(s)? Vectorization of short vectors > (eg, if having an array of structs or sth like that)? I'm not sure exactly what you're asking here. SVE supports a regular SIMD-type computational model, along with scalable vectors and features for speculative vectorisation of loops whose iteration count is not statically known (or, possibly not known even at loop entry at runtime). It's intended as a compiler target, so any algorithm that involves iterative computation may get some benefit -- though the amount of benefit, and how the benefit scales with vector length, will depend on the algorithm in question. So some algorithms may get more benefit from large VLs than others. 
For jobs where performance tends to saturate at a shorter VL, it may make sense to get the compiler to compile for the shorter VL -- this may enable the same binary code to perform more optimally on a wider range of hardware, but that may also mean you want to run that job with the VL it was compiled for instead of what the hardware supports. In high-assurance scenarios, you might also want to restrict a particular job to run at the VL that you validated for. > > -- changing the VL on-the-fly within a single thread is not > > something I want to encourage, due to overhead and ABI issues, but > > switching between threads of different VLs would be more manageable. > > So if on-the-fly switching is probably not useful, that would mean we > need special threads for the use cases. Is that a realistic assumption > for the use cases? Or do you primarily want to keep it possible to do > this, regardless of whether there are real use cases now? > I suppose allowing for a per-thread setting of VL could also be added as > a feature in the future without breaking existing code. Per-thread VL use cases are hypothetical for now. It's easy to support per-thread VLs in the kernel, but we could deny it initially and wait for someone to come along with a concrete use case. > > For setcontext/setjmp, we don't save/restore any SVE state due to the > > caller-save status of SVE, and I would not consider it necessary to > > save/restore VL itself because of the no-change-on-the-fly policy for > > this. > > Thus, you would basically consider VL changes or per-thread VL as in the > realm of compilation internals? So, the specific size for a particular > piece of code would not be part of an ABI? Basically yes. For most people, this would be hidden in libc/ld.so/some framework. This goes for most prctl()s -- random user code shouldn't normally touch them unless it knows what it's doing. 
> > I'm not familiar with resumable functions/executors -- are these in > > the C++ standards yet (not that that would cause me to be familiar > > with them... ;) Any implementation of coroutines (i.e., > > cooperative switching) is likely to fall under the "setcontext" > > argument above. > > These are not part of the C++ standard yet, but will appear in TSes. > There are various features for which implementations would be assumed to > use one OS thread for several tasks, coroutines, etc. Some of them > switch between these tasks or coroutines while these are running, Is the switching ever preemptive? If not, then these features are unlikely to be a concern for SVE. It's preemptive switching that would require the saving of extra SVE state (which is why we need to care for signals). > whereas the ones that will be in C++17 only run more than one parallel task > on the same OS thread, but one after the other (like in a thread pool). If jobs are only run to completion before yielding, that again isn't a concern for SVE. > However, if we are careful not to expose VL or make promises about it, > this may just end up being a detail similar to, say, register > allocation, which isn't exposed beyond the internals of a particular > compiler either. > Exposing it as a feature the user can set without messing with the > implementation would introduce additional thread-specific state, as > Florian said. This might not be a show-stopper by itself, but the more > thread-specific state we have the more an implementation has to take > care of or switch, and the higher the runtime costs are. C++17 already > makes weaker promises for TLS for parallel tasks, so that > implementations don't have to run TLS constructors or destructors just > because a small parallel task was executed. There's a difference between a feature that is exposed by the kernel, and a feature endorsed by the language / runtime. 
For example, random code can enable seccomp via prctl(PR_SET_SECCOMP) -- this may make most of libc unsafe to use, because under strict seccomp most syscalls simply kill the thread. libc doesn't pretend to support this out of the box, but this feature is also not needlessly denied to user code that knows what it's doing. I tend to put setting the VL into this category: it is safe, and useful or even necessary to change the VL in some situations, but userspace is responsible for managing this for itself. The kernel doesn't have enough information to make these decisions unilaterally. Cheers ---Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-25 19:39 [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Dave Martin ` (5 preceding siblings ...) 2016-11-30 9:56 ` [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Yao Qi @ 2016-11-30 10:08 ` Florian Weimer 2016-11-30 11:06 ` Szabolcs Nagy 6 siblings, 1 reply; 30+ messages in thread From: Florian Weimer @ 2016-11-30 10:08 UTC (permalink / raw) To: Dave Martin, linux-arm-kernel Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Alan Hayward, libc-alpha, gdb, Torvald Riegel On 11/25/2016 08:38 PM, Dave Martin wrote: > The Scalable Vector Extension (SVE) [1] is an extension to AArch64 which > adds extra SIMD functionality and supports much larger vectors. > > This series implements core Linux support for SVE. > > Recipents not copied on the whole series can find the rest of the > patches in the linux-arm-kernel archives [2]. > > > The first 5 patches "arm64: signal: ..." factor out the allocation and > placement of state information in the signal frame. The first three > are prerequisites for the SVE support patches. > > Patches 04-05 implement expansion of the signal frame, and may remain > controversial due to ABI break issues: > > * Discussion is needed on how userspace should detect/negotiate signal > frame size in order for this expansion mechanism to be workable. I'm leaning towards a simple increase in the glibc headers (despite the ABI risk), plus a personality flag to disable really wide vector registers in case this causes problems with old binaries. A more elaborate mechanism will likely introduce more bugs than it makes existing applications working, due to its complexity. > The remaining patches implement initial SVE support for Linux, with the > following limitations: > > * No KVM/virtualisation support for guests. > > * No independent SVE vector length configuration per thread. This is > planned, but will follow as a separate add-on series. 
Per-thread register widths will likely make coroutine switching (setcontext) and C++ resumable functions/executors quite challenging. Can you detail your plans in this area? Thanks, Florian ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support 2016-11-30 10:08 ` Florian Weimer @ 2016-11-30 11:06 ` Szabolcs Nagy 2016-11-30 14:06 ` Dave Martin 0 siblings, 1 reply; 30+ messages in thread From: Szabolcs Nagy @ 2016-11-30 11:06 UTC (permalink / raw) To: Florian Weimer, Dave Martin, linux-arm-kernel Cc: nd, Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Alan Hayward, libc-alpha, gdb, Torvald Riegel On 30/11/16 10:08, Florian Weimer wrote: > On 11/25/2016 08:38 PM, Dave Martin wrote: >> The Scalable Vector Extension (SVE) [1] is an extension to AArch64 which >> adds extra SIMD functionality and supports much larger vectors. >> >> This series implements core Linux support for SVE. >> >> Recipents not copied on the whole series can find the rest of the >> patches in the linux-arm-kernel archives [2]. >> >> >> The first 5 patches "arm64: signal: ..." factor out the allocation and >> placement of state information in the signal frame. The first three >> are prerequisites for the SVE support patches. >> >> Patches 04-05 implement expansion of the signal frame, and may remain >> controversial due to ABI break issues: >> >> * Discussion is needed on how userspace should detect/negotiate signal >> frame size in order for this expansion mechanism to be workable. > > I'm leaning towards a simple increase in the glibc headers (despite the ABI risk), plus a personality flag to > disable really wide vector registers in case this causes problems with old binaries. > if the kernel does not increase the size and libc does not add size checks then old binaries would work with new libc just fine.. but that's non-conforming, posix requires the check. if the kernel increases the size then it has to be changed in bionic and musl as well and old binaries may break. > A more elaborate mechanism will likely introduce more bugs than it makes existing applications working, due to > its complexity. 
> >> The remaining patches implement initial SVE support for Linux, with the >> following limitations: >> >> * No KVM/virtualisation support for guests. >> >> * No independent SVE vector length configuration per thread. This is >> planned, but will follow as a separate add-on series. > > Per-thread register widths will likely make coroutine switching (setcontext) and C++ resumable > functions/executors quite challenging. > i'd assume it's undefined to context switch to a different thread or to resume a function on a different thread (because the implementation can cache thread local state on the stack: e.g. errno pointer).. of course this does not stop ppl from doing it, but the practice is questionable. > Can you detail your plans in this area? > > Thanks, > Florian ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/29] arm64: Scalable Vector Extension core support
  2016-11-30 11:06 ` Szabolcs Nagy
@ 2016-11-30 14:06 ` Dave Martin
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Martin @ 2016-11-30 14:06 UTC (permalink / raw)
To: Szabolcs Nagy
Cc: Florian Weimer, linux-arm-kernel, Torvald Riegel, libc-alpha,
	Ard Biesheuvel, Marc Zyngier, gdb, Alan Hayward, nd,
	Christoffer Dall

On Wed, Nov 30, 2016 at 11:05:41AM +0000, Szabolcs Nagy wrote:
> On 30/11/16 10:08, Florian Weimer wrote:
> > On 11/25/2016 08:38 PM, Dave Martin wrote:

[...]

> >> * Discussion is needed on how userspace should detect/negotiate signal
> >>   frame size in order for this expansion mechanism to be workable.
> >
> > I'm leaning towards a simple increase in the glibc headers (despite the
> > ABI risk), plus a personality flag to disable really wide vector
> > registers in case this causes problems with old binaries.
>
> If the kernel does not increase the size and libc does not add size
> checks, then old binaries would work with new libc just fine, but
> that's non-conforming: POSIX requires the check.
>
> If the kernel increases the size, then it has to be changed in bionic
> and musl as well, and old binaries may break.

Or we need a personality flag or similar to distinguish the two cases.

[...]

> > A more elaborate mechanism will likely introduce more bugs than it
> > makes existing applications work, due to its complexity.
>
> >> The remaining patches implement initial SVE support for Linux, with the
> >> following limitations:
> >>
> >> * No KVM/virtualisation support for guests.
> >>
> >> * No independent SVE vector length configuration per thread.  This is
> >>   planned, but will follow as a separate add-on series.
> >
> > Per-thread register widths will likely make coroutine switching
> > (setcontext) and C++ resumable functions/executors quite challenging.
>
> I'd assume it's undefined to context-switch to a different thread or to
> resume a function on a different thread (because the implementation can
> cache thread-local state on the stack, e.g. the errno pointer).  Of
> course this does not stop people from doing it, but the practice is
> questionable.

I don't have a view on this.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 30+ messages in thread
end of thread, other threads:[~2016-12-06 14:46 UTC | newest]

Thread overview: 30+ messages
2016-11-25 19:39 [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Dave Martin
2016-11-25 19:41 ` [RFC PATCH 16/29] arm64/sve: signal: Add SVE state record to sigcontext Dave Martin
2016-11-25 19:41 ` [RFC PATCH 24/29] arm64/sve: Discard SVE state on system call Dave Martin
2016-11-25 19:41 ` [RFC PATCH 18/29] arm64/sve: signal: Restore FPSIMD/SVE state in rt_sigreturn Dave Martin
2016-11-25 19:41 ` [RFC PATCH 17/29] arm64/sve: signal: Dump Scalable Vector Extension registers to user stack Dave Martin
2016-11-25 19:42 ` [RFC PATCH 27/29] arm64/sve: ptrace support Dave Martin
2016-11-30  9:56 ` [RFC PATCH 00/29] arm64: Scalable Vector Extension core support Yao Qi
2016-11-30 12:07 ` Dave Martin
2016-11-30 12:22 ` Szabolcs Nagy
2016-11-30 14:10 ` Dave Martin
2016-11-30 12:38 ` Florian Weimer
2016-11-30 13:56 ` Dave Martin
2016-12-01  9:21 ` Florian Weimer
2016-12-01 10:30 ` Dave Martin
2016-12-01 12:19 ` Dave Martin
2016-12-05 10:44 ` Florian Weimer
2016-12-05 11:07 ` Szabolcs Nagy
2016-12-05 15:05 ` Dave Martin
2016-12-02 11:49 ` Dave Martin
2016-12-02 16:34 ` Florian Weimer
2016-12-02 16:59 ` Joseph Myers
2016-12-02 18:21 ` Dave Martin
2016-12-02 21:57 ` Joseph Myers
2016-12-02 21:56 ` Yao Qi
2016-12-05 15:12 ` Dave Martin
2016-12-05 22:42 ` Torvald Riegel
2016-12-06 14:46 ` Dave Martin
2016-11-30 10:08 ` Florian Weimer
2016-11-30 11:06 ` Szabolcs Nagy
2016-11-30 14:06 ` Dave Martin