Mirror of the gdb mailing list
 help / color / mirror / Atom feed
* [PATCH 09/27] arm64/sve: Signal frame and context structure definition
       [not found] <1502280338-23002-1-git-send-email-Dave.Martin@arm.com>
@ 2017-08-09 12:06 ` Dave Martin
  2017-08-22 10:22   ` Alex Bennée
  2017-08-09 12:07 ` [PATCH 14/27] arm64/sve: Backend logic for setting the vector length Dave Martin
  2017-08-09 12:08 ` [PATCH 18/27] arm64/sve: ptrace and ELF coredump support Dave Martin
  2 siblings, 1 reply; 10+ messages in thread
From: Dave Martin @ 2017-08-09 12:06 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Szabolcs Nagy,
	Richard Sandiford, kvmarm, libc-alpha, linux-arch, gdb,
	Alan Hayward, Yao Qi

This patch defines the representation that will be used for the SVE
register state in the signal frame, and implements support for
saving and restoring the SVE registers around signals.

The same layout will also be used for the in-kernel task state.

Due to the variability of the SVE vector length, it is not possible
to define a fixed C struct to describe all the registers.  Instead,
Macros are defined in sigcontext.h to facilitate access to the
parts of the structure.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/uapi/asm/sigcontext.h | 113 ++++++++++++++++++++++++++++++-
 1 file changed, 112 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index f0a76b9..0533bdf 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -16,6 +16,8 @@
 #ifndef _UAPI__ASM_SIGCONTEXT_H
 #define _UAPI__ASM_SIGCONTEXT_H
 
+#ifndef __ASSEMBLY__
+
 #include <linux/types.h>
 
 /*
@@ -41,10 +43,11 @@ struct sigcontext {
  *
  *	0x210		fpsimd_context
  *	 0x10		esr_context
+ *	0x8a0		sve_context (vl <= 64) (optional)
  *	 0x20		extra_context (optional)
  *	 0x10		terminator (null _aarch64_ctx)
  *
- *	0xdb0		(reserved for future allocation)
+ *	0x510		(reserved for future allocation)
  *
  * New records that can exceed this space need to be opt-in for userspace, so
  * that an expanded signal frame is not generated unexpectedly.  The mechanism
@@ -116,4 +119,112 @@ struct extra_context {
 	__u32 __reserved[3];
 };
 
+#define SVE_MAGIC	0x53564501
+
+struct sve_context {
+	struct _aarch64_ctx head;
+	__u16 vl;
+	__u16 __reserved[3];
+};
+
+#endif /* !__ASSEMBLY__ */
+
+/*
+ * The SVE architecture leaves space for future expansion of the
+ * vector length beyond its initial architectural limit of 2048 bits
+ * (16 quadwords).
+ */
+#define SVE_VQ_MIN		1
+#define SVE_VQ_MAX		0x200
+
+#define SVE_VL_MIN		(SVE_VQ_MIN * 0x10)
+#define SVE_VL_MAX		(SVE_VQ_MAX * 0x10)
+
+#define SVE_NUM_ZREGS		32
+#define SVE_NUM_PREGS		16
+
+#define sve_vl_valid(vl) \
+	((vl) % 0x10 == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
+#define sve_vq_from_vl(vl)	((vl) / 0x10)
+#define sve_vl_from_vq(vq)	((vq) * 0x10)
+
+/*
+ * If the SVE registers are currently live for the thread at signal delivery,
+ * sve_context.head.size >=
+ *	SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl))
+ * and the register data may be accessed using the SVE_SIG_*() macros.
+ *
+ * If sve_context.head.size <
+ *	SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)),
+ * the SVE registers were not live for the thread and no register data
+ * is included: in this case, the SVE_SIG_*() macros should not be
+ * used except for this check.
+ *
+ * The same convention applies when returning from a signal: a caller
+ * will need to remove or resize the sve_context block if it wants to
+ * make the SVE registers live when they were previously non-live or
+ * vice-versa.  This may require the the caller to allocate fresh
+ * memory and/or move other context blocks in the signal frame.
+ *
+ * Changing the vector length during signal return is not permitted:
+ * sve_context.vl must equal the thread's current vector length when
+ * doing a sigreturn.
+ *
+ *
+ * Note: for all these macros, the "vq" argument denotes the SVE
+ * vector length in quadwords (i.e., units of 128 bits).
+ *
+ * The correct way to obtain vq is to use sve_vq_from_vl(vl).  The
+ * result is valid if and only if sve_vl_valid(vl) is true.  This is
+ * guaranteed for a struct sve_context written by the kernel.
+ *
+ *
+ * Additional macros describe the contents and layout of the payload.
+ * For each, SVE_SIG_x_OFFSET(args) is the start offset relative to
+ * the start of struct sve_context, and SVE_SIG_x_SIZE(args) is the
+ * size in bytes:
+ *
+ *	x	type				description
+ *	-	----				-----------
+ *	REGS					the entire SVE context
+ *
+ *	ZREGS	__uint128_t[SVE_NUM_ZREGS][vq]	all Z-registers
+ *	ZREG	__uint128_t[vq]			individual Z-register Zn
+ *
+ *	PREGS	uint16_t[SVE_NUM_PREGS][vq]	all P-registers
+ *	PREG	uint16_t[vq]			individual P-register Pn
+ *
+ *	FFR	uint16_t[vq]			first-fault status register
+ *
+ * Additional data might be appended in the future.
+ */
+
+#define SVE_SIG_ZREG_SIZE(vq)	((__u32)(vq) * 16)
+#define SVE_SIG_PREG_SIZE(vq)	((__u32)(vq) * 2)
+#define SVE_SIG_FFR_SIZE(vq)	SVE_SIG_PREG_SIZE(vq)
+
+#define SVE_SIG_REGS_OFFSET	((sizeof(struct sve_context) + 15) / 16 * 16)
+
+#define SVE_SIG_ZREGS_OFFSET	SVE_SIG_REGS_OFFSET
+#define SVE_SIG_ZREG_OFFSET(vq, n) \
+	(SVE_SIG_ZREGS_OFFSET + SVE_SIG_ZREG_SIZE(vq) * (n))
+#define SVE_SIG_ZREGS_SIZE(vq) \
+	(SVE_SIG_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_SIG_ZREGS_OFFSET)
+
+#define SVE_SIG_PREGS_OFFSET(vq) \
+	(SVE_SIG_ZREGS_OFFSET + SVE_SIG_ZREGS_SIZE(vq))
+#define SVE_SIG_PREG_OFFSET(vq, n) \
+	(SVE_SIG_PREGS_OFFSET(vq) + SVE_SIG_PREG_SIZE(vq) * (n))
+#define SVE_SIG_PREGS_SIZE(vq) \
+	(SVE_SIG_PREG_OFFSET(vq, SVE_NUM_PREGS) - SVE_SIG_PREGS_OFFSET(vq))
+
+#define SVE_SIG_FFR_OFFSET(vq) \
+	(SVE_SIG_PREGS_OFFSET(vq) + SVE_SIG_PREGS_SIZE(vq))
+
+#define SVE_SIG_REGS_SIZE(vq) \
+	(SVE_SIG_FFR_OFFSET(vq) + SVE_SIG_FFR_SIZE(vq) - SVE_SIG_REGS_OFFSET)
+
+#define SVE_SIG_CONTEXT_SIZE(vq) (SVE_SIG_REGS_OFFSET + SVE_SIG_REGS_SIZE(vq))
+
+
 #endif /* _UAPI__ASM_SIGCONTEXT_H */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 14/27] arm64/sve: Backend logic for setting the vector length
       [not found] <1502280338-23002-1-git-send-email-Dave.Martin@arm.com>
  2017-08-09 12:06 ` [PATCH 09/27] arm64/sve: Signal frame and context structure definition Dave Martin
@ 2017-08-09 12:07 ` Dave Martin
  2017-08-23 15:33   ` Alex Bennée
  2017-08-09 12:08 ` [PATCH 18/27] arm64/sve: ptrace and ELF coredump support Dave Martin
  2 siblings, 1 reply; 10+ messages in thread
From: Dave Martin @ 2017-08-09 12:07 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Szabolcs Nagy,
	Richard Sandiford, kvmarm, libc-alpha, linux-arch, gdb,
	Alan Hayward, Yao Qi

This patch implements the core logic for changing a task's vector
length on request from userspace.  This will be used by the ptrace
and prctl frontends that are implemented in later patches.

The SVE architecture permits, but does not require, implementations
to support vector lengths that are not a power of two.  To handle
this, logic is added to check a requested vector length against a
possibly sparse bitmap of available vector lengths at runtime, so
that the best supported value can be chosen.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h |   6 +++
 arch/arm64/kernel/fpsimd.c      | 116 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/prctl.h      |   5 ++
 3 files changed, 127 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 7efd04e..39b26d2 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -70,11 +70,15 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 
+#define SVE_VL_ARCH_MAX 0x100
+
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
+extern int sve_max_vl;
+
 #ifdef CONFIG_ARM64_SVE
 
 extern size_t sve_state_size(struct task_struct const *task);
@@ -83,6 +87,8 @@ extern void sve_alloc(struct task_struct *task);
 extern void fpsimd_release_thread(struct task_struct *task);
 extern void fpsimd_dup_sve(struct task_struct *dst,
 			   struct task_struct const *src);
+extern int sve_set_vector_length(struct task_struct *task,
+				 unsigned long vl, unsigned long flags);
 
 #else /* ! CONFIG_ARM64_SVE */
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e8674f6..bce95de 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -18,12 +18,14 @@
  */
 
 #include <linux/bottom_half.h>
+#include <linux/bitmap.h>
 #include <linux/cpu.h>
 #include <linux/cpu_pm.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/percpu.h>
 #include <linux/preempt.h>
+#include <linux/prctl.h>
 #include <linux/ptrace.h>
 #include <linux/sched/signal.h>
 #include <linux/signal.h>
@@ -111,6 +113,20 @@ static DEFINE_PER_CPU(struct fpsimd_state *, fpsimd_last_state);
 /* Default VL for tasks that don't set it explicitly: */
 static int sve_default_vl = -1;
 
+#ifdef CONFIG_ARM64_SVE
+
+/* Maximum supported vector length across all CPUs (initially poisoned) */
+int sve_max_vl = -1;
+/* Set of available vector lengths, as vq_to_bit(vq): */
+static DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
+
+#else /* ! CONFIG_ARM64_SVE */
+
+/* Dummy declaration for code that will be optimised out: */
+extern DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
+
+#endif /* ! CONFIG_ARM64_SVE */
+
 static void sve_free(struct task_struct *task)
 {
 	kfree(task->thread.sve_state);
@@ -148,6 +164,37 @@ static void change_cpacr(u64 old, u64 new)
 		write_sysreg(new, CPACR_EL1);
 }
 
+static unsigned int vq_to_bit(unsigned int vq)
+{
+	BUILD_BUG_ON(vq < 1 || vq > SVE_VQ_MAX);
+
+	return SVE_VQ_MAX - vq;
+}
+
+static unsigned int bit_to_vq(unsigned int bit)
+{
+	BUILD_BUG_ON(bit >= SVE_VQ_MAX);
+
+	return SVE_VQ_MAX - bit;
+}
+
+static unsigned int find_supported_vector_length(unsigned int vl)
+{
+	int bit;
+
+	BUG_ON(!sve_vl_valid(vl));
+
+	BUG_ON(!sve_vl_valid(sve_max_vl));
+	if (vl > sve_max_vl)
+		vl = sve_max_vl;
+
+	bit = find_next_bit(sve_vq_map, SVE_VQ_MAX,
+			    vq_to_bit(sve_vq_from_vl(vl)));
+	BUG_ON(bit < 0 || bit >= SVE_VQ_MAX);
+
+	return 16 * bit_to_vq(bit);
+}
+
 #define ZREG(sve_state, vq, n) ((char *)(sve_state) +		\
 	(SVE_SIG_ZREG_OFFSET(vq, n) - SVE_SIG_REGS_OFFSET))
 
@@ -235,6 +282,73 @@ void fpsimd_dup_sve(struct task_struct *dst, struct task_struct const *src)
 	}
 }
 
+int sve_set_vector_length(struct task_struct *task,
+			  unsigned long vl, unsigned long flags)
+{
+	BUG_ON(task == current && preemptible());
+
+	if (flags & ~(unsigned long)(PR_SVE_VL_INHERIT |
+				     PR_SVE_SET_VL_ONEXEC))
+		return -EINVAL;
+
+	if (!sve_vl_valid(vl))
+		return -EINVAL;
+
+	/*
+	 * Clamp to the maximum vector length that VL-agnostic SVE code can
+	 * work with.  A flag may be assigned in the future to allow setting
+	 * of larger vector lengths without confusing older software.
+	 */
+	if (vl > SVE_VL_ARCH_MAX)
+		vl = SVE_VL_ARCH_MAX;
+
+	vl = find_supported_vector_length(vl);
+
+	if (flags & (PR_SVE_VL_INHERIT |
+		     PR_SVE_SET_VL_ONEXEC))
+		task->thread.sve_vl_onexec = vl;
+	else
+		/* Reset VL to system default on next exec: */
+		task->thread.sve_vl_onexec = 0;
+
+	/* Only actually set the VL if not deferred: */
+	if (flags & PR_SVE_SET_VL_ONEXEC)
+		goto out;
+
+	/*
+	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
+	 * write any live register state back to task_struct, and convert to a
+	 * non-SVE thread.
+	 */
+	if (vl != task->thread.sve_vl) {
+		if (task == current) {
+			task_fpsimd_save();
+			set_thread_flag(TIF_FOREIGN_FPSTATE);
+		}
+
+		if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
+			sve_to_fpsimd(task);
+
+		/*
+		 * Force reallocation of task SVE state to the correct size
+		 * on next use:
+		 */
+		sve_free(task);
+	}
+
+	task->thread.sve_vl = vl;
+
+	fpsimd_flush_task_state(task);
+
+out:
+	if (flags & PR_SVE_VL_INHERIT)
+		set_thread_flag(TIF_SVE_VL_INHERIT);
+	else
+		clear_thread_flag(TIF_SVE_VL_INHERIT);
+
+	return 0;
+}
+
 void fpsimd_release_thread(struct task_struct *dead_task)
 {
 	sve_free(dead_task);
@@ -407,6 +521,8 @@ void fpsimd_flush_thread(void)
 		 * If not, something went badly wrong.
 		 */
 		BUG_ON(!sve_vl_valid(current->thread.sve_vl));
+		BUG_ON(find_supported_vector_length(current->thread.sve_vl) !=
+		       current->thread.sve_vl);
 
 		/*
 		 * If the task is not set to inherit, ensure that the vector
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..1b64901 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,9 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* arm64 Scalable Vector Extension controls */
+# define PR_SVE_SET_VL_ONEXEC		(1 << 18) /* defer effect until exec */
+# define PR_SVE_VL_LEN_MASK		0xffff
+# define PR_SVE_VL_INHERIT		(1 << 17) /* inherit across exec */
+
 #endif /* _LINUX_PRCTL_H */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 18/27] arm64/sve: ptrace and ELF coredump support
       [not found] <1502280338-23002-1-git-send-email-Dave.Martin@arm.com>
  2017-08-09 12:06 ` [PATCH 09/27] arm64/sve: Signal frame and context structure definition Dave Martin
  2017-08-09 12:07 ` [PATCH 14/27] arm64/sve: Backend logic for setting the vector length Dave Martin
@ 2017-08-09 12:08 ` Dave Martin
  2 siblings, 0 replies; 10+ messages in thread
From: Dave Martin @ 2017-08-09 12:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Szabolcs Nagy,
	Richard Sandiford, kvmarm, libc-alpha, linux-arch, gdb,
	Alan Hayward, Yao Qi, Oleg Nesterov, Alexander Viro

This patch defines and implements a new regset NT_ARM_SVE, which
describes a thread's SVE register state.  This allows a debugger to
manipulate the SVE state, as well as being included in ELF
coredumps for post-mortem debugging.

Because the regset size and layout are dependent on the thread's
current vector length, it is not possible to define a C struct to
describe the regset contents as is done for existing regsets.
Instead, and for the same reasons, NT_ARM_SVE is based on the
freeform variable-layout approach used for the SVE signal frame.

Additionally, to reduce debug overhead when debugging threads that
might or might not have live SVE register state, NT_ARM_SVE may be
presented in one of two different formats: the old struct
user_fpsimd_state format is embedded for describing the state of a
thread with no live SVE state, whereas a new variable-layout
structure is embedded for describing live SVE state.  This avoids a
debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and
allows existing userspace code to handle the non-SVE case without
too much modification.

For this to work, NT_ARM_SVE is defined with a fixed-format header
of type struct user_sve_header, which the recipient can use to
figure out the content, size and layout of the reset of the regset.
Accessor macros are defined to allow the vector-length-dependent
parts of the regset to be manipulated.

Signed-off-by: Alan Hayward <alan.hayward@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h      |  13 +-
 arch/arm64/include/uapi/asm/ptrace.h | 130 ++++++++++++++++
 arch/arm64/kernel/fpsimd.c           |  34 +++++
 arch/arm64/kernel/ptrace.c           | 288 ++++++++++++++++++++++++++++++++++-
 include/uapi/linux/elf.h             |   1 +
 5 files changed, 457 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index f43f573..ffb8a50 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -37,13 +37,16 @@ struct fpsimd_state {
 			__uint128_t vregs[32];
 			u32 fpsr;
 			u32 fpcr;
+			/*
+			 * For ptrace compatibility, pad to next 128-bit
+			 * boundary here if extending this struct.
+			 */
 		};
 	};
 	/* the id of the last cpu to have restored this state */
 	unsigned int cpu;
 };
 
-
 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
 /* Masks for extracting the FPSR and FPCR from the FPSCR */
 #define VFP_FPSCR_STAT_MASK	0xf800009f
@@ -87,6 +90,10 @@ extern void sve_alloc(struct task_struct *task);
 extern void fpsimd_release_thread(struct task_struct *task);
 extern void fpsimd_dup_sve(struct task_struct *dst,
 			   struct task_struct const *src);
+extern void fpsimd_sync_to_sve(struct task_struct *task);
+extern void sve_sync_to_fpsimd(struct task_struct *task);
+extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
+
 extern int sve_set_vector_length(struct task_struct *task,
 				 unsigned long vl, unsigned long flags);
 
@@ -101,6 +108,10 @@ static void __maybe_unused sve_alloc(struct task_struct *task) { }
 static void __maybe_unused fpsimd_release_thread(struct task_struct *task) { }
 static void __maybe_unused fpsimd_dup_sve(struct task_struct *dst,
 					  struct task_struct const *src) { }
+static void __maybe_unused sve_sync_to_fpsimd(struct task_struct *task) { }
+static void __maybe_unused sve_sync_from_fpsimd_zeropad(
+	struct task_struct *task) { }
+
 static void __maybe_unused sve_init_vq_map(void) { }
 static void __maybe_unused sve_update_vq_map(void) { }
 static int __maybe_unused sve_verify_vq_map(void) { return 0; }
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index d1ff83d..16d9344 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -22,6 +22,7 @@
 #include <linux/types.h>
 
 #include <asm/hwcap.h>
+#include <asm/sigcontext.h>
 
 
 /*
@@ -63,6 +64,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/prctl.h>
+
 /*
  * User structures for general purpose, floating point and debug registers.
  */
@@ -90,6 +93,133 @@ struct user_hwdebug_state {
 	}		dbg_regs[16];
 };
 
+/* SVE/FP/SIMD state (NT_ARM_SVE) */
+
+struct user_sve_header {
+	__u32 size; /* total meaningful regset content in bytes */
+	__u32 max_size; /* maxmium possible size for this thread */
+	__u16 vl; /* current vector length */
+	__u16 max_vl; /* maximum possible vector length */
+	__u16 flags;
+	__u16 __reserved;
+};
+
+/* Definitions for user_sve_header.flags: */
+#define SVE_PT_REGS_MASK		(1 << 0)
+
+/* Flags: must be kept in sync with prctl interface in <linux/ptrace.h> */
+#define SVE_PT_REGS_FPSIMD		0
+#define SVE_PT_REGS_SVE			SVE_PT_REGS_MASK
+
+#define SVE_PT_VL_INHERIT		(PR_SVE_VL_INHERIT >> 16)
+#define SVE_PT_VL_ONEXEC		(PR_SVE_SET_VL_ONEXEC >> 16)
+
+
+/*
+ * The remainder of the SVE state follows struct user_sve_header.  The
+ * total size of the SVE state (including header) depends on the
+ * metadata in the header:  SVE_PT_SIZE(vq, flags) gives the total size
+ * of the state in bytes, including the header.
+ *
+ * Refer to <asm/sigcontext.h> for details of how to pass the correct
+ * "vq" argument to these macros.
+ */
+
+/* Offset from the start of struct user_sve_header to the register data */
+#define SVE_PT_REGS_OFFSET	((sizeof(struct sve_context) + 15) / 16 * 16)
+
+/*
+ * The register data content and layout depends on the value of the
+ * flags field.
+ */
+
+/*
+ * (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD case:
+ *
+ * The payload starts at offset SVE_PT_FPSIMD_OFFSET, and is of type
+ * struct user_fpsimd_state.  Additional data might be appended in the
+ * future: use SVE_PT_FPSIMD_SIZE(vq, flags) to compute the total size.
+ * SVE_PT_FPSIMD_SIZE(vq, flags) will never be less than
+ * sizeof(struct user_fpsimd_state).
+ */
+
+#define SVE_PT_FPSIMD_OFFSET		SVE_PT_REGS_OFFSET
+
+#define SVE_PT_FPSIMD_SIZE(vq, flags)	(sizeof(struct user_fpsimd_state))
+
+/*
+ * (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE case:
+ *
+ * The payload starts at offset SVE_PT_SVE_OFFSET, and is of size
+ * SVE_PT_SVE_SIZE(vq, flags).
+ *
+ * Additional macros describe the contents and layout of the payload.
+ * For each, SVE_PT_SVE_x_OFFSET(args) is the start offset relative to
+ * the start of struct user_sve_header, and SVE_PT_SVE_x_SIZE(args) is
+ * the size in bytes:
+ *
+ *	x	type				description
+ *	-	----				-----------
+ *	ZREGS		\
+ *	ZREG		|
+ *	PREGS		| refer to <asm/sigcontext.h>
+ *	PREG		|
+ *	FFR		/
+ *
+ *	FPSR	uint32_t			FPSR
+ *	FPCR	uint32_t			FPCR
+ *
+ * Additional data might be appended in the future.
+ */
+
+#define SVE_PT_SVE_ZREG_SIZE(vq)	SVE_SIG_ZREG_SIZE(vq)
+#define SVE_PT_SVE_PREG_SIZE(vq)	SVE_SIG_PREG_SIZE(vq)
+#define SVE_PT_SVE_FFR_SIZE(vq)		SVE_SIG_FFR_SIZE(vq)
+#define SVE_PT_SVE_FPSR_SIZE		sizeof(__u32)
+#define SVE_PT_SVE_FPCR_SIZE		sizeof(__u32)
+
+#define __SVE_SIG_TO_PT(offset) \
+	((offset) - SVE_SIG_REGS_OFFSET + SVE_PT_REGS_OFFSET)
+
+#define SVE_PT_SVE_OFFSET		SVE_PT_REGS_OFFSET
+
+#define SVE_PT_SVE_ZREGS_OFFSET \
+	__SVE_SIG_TO_PT(SVE_SIG_ZREGS_OFFSET)
+#define SVE_PT_SVE_ZREG_OFFSET(vq, n) \
+	__SVE_SIG_TO_PT(SVE_SIG_ZREG_OFFSET(vq, n))
+#define SVE_PT_SVE_ZREGS_SIZE(vq) \
+	(SVE_PT_SVE_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_PT_SVE_ZREGS_OFFSET)
+
+#define SVE_PT_SVE_PREGS_OFFSET(vq) \
+	__SVE_SIG_TO_PT(SVE_SIG_PREGS_OFFSET(vq))
+#define SVE_PT_SVE_PREG_OFFSET(vq, n) \
+	__SVE_SIG_TO_PT(SVE_SIG_PREG_OFFSET(vq, n))
+#define SVE_PT_SVE_PREGS_SIZE(vq) \
+	(SVE_PT_SVE_PREG_OFFSET(vq, SVE_NUM_PREGS) - \
+		SVE_PT_SVE_PREGS_OFFSET(vq))
+
+#define SVE_PT_SVE_FFR_OFFSET(vq) \
+	__SVE_SIG_TO_PT(SVE_SIG_FFR_OFFSET(vq))
+
+#define SVE_PT_SVE_FPSR_OFFSET(vq) \
+	((SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq) + 15) / 16 * 16)
+#define SVE_PT_SVE_FPCR_OFFSET(vq) \
+	(SVE_PT_SVE_FPSR_OFFSET(vq) + SVE_PT_SVE_FPSR_SIZE)
+
+/*
+ * Any future extension appended after FPCR must be aligned to the next
+ * 128-bit boundary.
+ */
+
+#define SVE_PT_SVE_SIZE(vq, flags)				\
+	((SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE -	\
+		SVE_PT_SVE_OFFSET + 15) / 16 * 16)
+
+#define SVE_PT_SIZE(vq, flags)						\
+	 (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ?		\
+		  SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags)	\
+		: SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags))
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _UAPI__ASM_PTRACE_H */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index c727b47..5a7c870 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -268,6 +268,40 @@ void sve_alloc(struct task_struct *task)
 	BUG_ON(!task->thread.sve_state);
 }
 
+void fpsimd_sync_to_sve(struct task_struct *task)
+{
+	if (!test_tsk_thread_flag(task, TIF_SVE))
+		fpsimd_to_sve(task);
+}
+
+void sve_sync_to_fpsimd(struct task_struct *task)
+{
+	if (test_tsk_thread_flag(task, TIF_SVE))
+		sve_to_fpsimd(task);
+}
+
+void sve_sync_from_fpsimd_zeropad(struct task_struct *task)
+{
+	unsigned int vl, vq;
+	void *sst = task->thread.sve_state;
+	struct fpsimd_state const *fst = &task->thread.fpsimd_state;
+	unsigned int i;
+
+	if (!test_tsk_thread_flag(task, TIF_SVE))
+		return;
+
+	vl = task->thread.sve_vl;
+
+	BUG_ON(!sve_vl_valid(vl));
+	vq = sve_vq_from_vl(vl);
+
+	memset(sst, 0, SVE_SIG_REGS_SIZE(vq));
+
+	for (i = 0; i < 32; ++i)
+		memcpy(ZREG(sst, vq, i), &fst->vregs[i],
+		       sizeof(fst->vregs[i]));
+}
+
 /*
  * Handle SVE state across fork():
  *
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 28619b5..abc05ed 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <linux/security.h>
 #include <linux/init.h>
 #include <linux/signal.h>
+#include <linux/string.h>
 #include <linux/uaccess.h>
 #include <linux/perf_event.h>
 #include <linux/hw_breakpoint.h>
@@ -40,6 +41,7 @@
 #include <linux/elf.h>
 
 #include <asm/compat.h>
+#include <asm/cpufeature.h>
 #include <asm/debug-monitors.h>
 #include <asm/pgtable.h>
 #include <asm/syscall.h>
@@ -617,33 +619,66 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset,
 /*
  * TODO: update fp accessors for lazy context switching (sync/flush hwstate)
  */
-static int fpr_get(struct task_struct *target, const struct user_regset *regset,
-		   unsigned int pos, unsigned int count,
-		   void *kbuf, void __user *ubuf)
+static int __fpr_get(struct task_struct *target,
+		     const struct user_regset *regset,
+		     unsigned int pos, unsigned int count,
+		     void *kbuf, void __user *ubuf, unsigned int start_pos)
 {
 	struct user_fpsimd_state *uregs;
+
+	sve_sync_to_fpsimd(target);
+
 	uregs = &target->thread.fpsimd_state.user_fpsimd;
 
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs,
+				   start_pos, start_pos + sizeof(*uregs));
+}
+
+static int fpr_get(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
 	if (target == current)
 		fpsimd_preserve_current_state();
 
-	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, 0, -1);
+	return __fpr_get(target, regset, pos, count, kbuf, ubuf, 0);
 }
 
-static int fpr_set(struct task_struct *target, const struct user_regset *regset,
-		   unsigned int pos, unsigned int count,
-		   const void *kbuf, const void __user *ubuf)
+static int __fpr_set(struct task_struct *target,
+		     const struct user_regset *regset,
+		     unsigned int pos, unsigned int count,
+		     const void *kbuf, const void __user *ubuf,
+		     unsigned int start_pos)
 {
 	int ret;
 	struct user_fpsimd_state newstate =
 		target->thread.fpsimd_state.user_fpsimd;
 
-	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0, -1);
+	sve_sync_to_fpsimd(target);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate,
+				 start_pos, start_pos + sizeof(newstate));
 	if (ret)
 		return ret;
 
 	target->thread.fpsimd_state.user_fpsimd = newstate;
+
+	return ret;
+}
+
+static int fpr_set(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+
+	ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, 0);
+	if (ret)
+		return ret;
+
+	sve_sync_from_fpsimd_zeropad(target);
 	fpsimd_flush_task_state(target);
+
 	return ret;
 }
 
@@ -701,6 +736,229 @@ static int system_call_set(struct task_struct *target,
 	return ret;
 }
 
+#ifdef CONFIG_ARM64_SVE
+
+static void sve_init_header_from_task(struct user_sve_header *header,
+				      struct task_struct *target)
+{
+	unsigned int vq;
+
+	memset(header, 0, sizeof(*header));
+
+	header->flags = test_tsk_thread_flag(target, TIF_SVE) ?
+		SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD;
+	if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
+		header->flags |= SVE_PT_VL_INHERIT;
+
+	header->vl = target->thread.sve_vl;
+
+	BUG_ON(!sve_vl_valid(header->vl));
+	vq = sve_vq_from_vl(header->vl);
+
+	BUG_ON(!sve_vl_valid(sve_max_vl));
+	header->max_vl = sve_max_vl;
+
+	header->size = SVE_PT_SIZE(vq, header->flags);
+	header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),
+				      SVE_PT_REGS_SVE);
+}
+
+static unsigned int sve_size_from_header(struct user_sve_header const *header)
+{
+	return (header->size + 15) / 16 * 16;
+}
+
+static unsigned int sve_get_size(struct task_struct *target,
+				 const struct user_regset *regset)
+{
+	struct user_sve_header header;
+
+	if (!system_supports_sve())
+		return 0;
+
+	sve_init_header_from_task(&header, target);
+	return sve_size_from_header(&header);
+}
+
+static int sve_get(struct task_struct *target,
+		   const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
+	int ret;
+	struct user_sve_header header;
+	unsigned int vq;
+	unsigned long start, end;
+
+	if (!system_supports_sve())
+		return -EINVAL;
+
+	/* Header */
+	sve_init_header_from_task(&header, target);
+
+	BUG_ON(!sve_vl_valid(header.vl));
+	vq = sve_vq_from_vl(header.vl);
+
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &header,
+				  0, sizeof(header));
+	if (ret)
+		return ret;
+
+	if (target == current)
+		fpsimd_preserve_current_state();
+
+	/* Registers: FPSIMD-only case */
+
+	BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header));
+
+	if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD)
+		return __fpr_get(target, regset, pos, count, kbuf, ubuf,
+				 SVE_PT_FPSIMD_OFFSET);
+
+	/* Otherwise: full SVE case */
+
+	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
+
+	start = SVE_PT_SVE_OFFSET;
+	end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
+
+	BUG_ON(end < start);
+	BUG_ON(end - start > sve_state_size(target));
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  target->thread.sve_state,
+				  start, end);
+	if (ret)
+		return ret;
+
+	start = end;
+	end = SVE_PT_SVE_FPSR_OFFSET(vq);
+
+	BUG_ON(end < start);
+	ret = user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf,
+				       start, end);
+	if (ret)
+		return ret;
+
+	start = end;
+	end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
+
+	BUG_ON((char *)(&target->thread.fpsimd_state.fpcr + 1) <
+	       (char *)&target->thread.fpsimd_state.fpsr);
+	BUG_ON(end < start);
+	BUG_ON((char *)(&target->thread.fpsimd_state.fpcr + 1) -
+	       (char *)&target->thread.fpsimd_state.fpsr !=
+		end - start);
+
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  &target->thread.fpsimd_state.fpsr,
+				  start, end);
+	if (ret)
+		return ret;
+
+	start = end;
+	end = sve_size_from_header(&header);
+	BUG_ON(end < start);
+
+	return user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf,
+					start, end);
+}
+
+static int sve_set(struct task_struct *target,
+		   const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+	struct user_sve_header header;
+	unsigned int vq;
+	unsigned long start, end;
+
+	if (!system_supports_sve())
+		return -EINVAL;
+
+	/* Header */
+	if (count < sizeof(header))
+		return -EINVAL;
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &header,
+				 0, sizeof(header));
+	if (ret)
+		goto out;
+
+	/*
+	 * Apart from PT_SVE_REGS_MASK, all PT_SVE_* flags are consumed by
+	 * sve_set_vector_length(), which will also validate them for us:
+	 */
+	ret = sve_set_vector_length(target, header.vl,
+				    header.flags & ~SVE_PT_REGS_MASK);
+	if (ret)
+		goto out;
+
+	/* Actual VL set may be less than the user asked for: */
+	BUG_ON(!sve_vl_valid(target->thread.sve_vl));
+	vq = sve_vq_from_vl(target->thread.sve_vl);
+
+	/* Registers: FPSIMD-only case */
+
+	BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header));
+
+	if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) {
+		sve_sync_to_fpsimd(target);
+
+		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
+				SVE_PT_FPSIMD_OFFSET);
+		clear_tsk_thread_flag(target, TIF_SVE);
+		goto out;
+	}
+
+	/* Otherwise: full SVE case */
+
+	sve_alloc(target);
+	fpsimd_sync_to_sve(target);
+	set_tsk_thread_flag(target, TIF_SVE);
+
+	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
+
+	start = SVE_PT_SVE_OFFSET;
+	end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
+
+	BUG_ON(end < start);
+	BUG_ON(end - start > sve_state_size(target));
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 target->thread.sve_state,
+				 start, end);
+	if (ret)
+		goto out;
+
+	start = end;
+	end = SVE_PT_SVE_FPSR_OFFSET(vq);
+
+	BUG_ON(end < start);
+	ret = user_regset_copyin_ignore(&pos, &count, &kbuf, &ubuf,
+					start, end);
+	if (ret)
+		goto out;
+
+	start = end;
+	end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
+
+	BUG_ON((char *)(&target->thread.fpsimd_state.fpcr + 1) <
+		(char *)&target->thread.fpsimd_state.fpsr);
+	BUG_ON(end < start);
+	BUG_ON((char *)(&target->thread.fpsimd_state.fpcr + 1) -
+	       (char *)&target->thread.fpsimd_state.fpsr !=
+		end - start);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 &target->thread.fpsimd_state.fpsr,
+				 start, end);
+
+out:
+	fpsimd_flush_task_state(target);
+	return ret;
+}
+
+#endif /* CONFIG_ARM64_SVE */
+
 enum aarch64_regset {
 	REGSET_GPR,
 	REGSET_FPR,
@@ -710,6 +968,9 @@ enum aarch64_regset {
 	REGSET_HW_WATCH,
 #endif
 	REGSET_SYSTEM_CALL,
+#ifdef CONFIG_ARM64_SVE
+	REGSET_SVE,
+#endif
 };
 
 static const struct user_regset aarch64_regsets[] = {
@@ -767,6 +1028,17 @@ static const struct user_regset aarch64_regsets[] = {
 		.get = system_call_get,
 		.set = system_call_set,
 	},
+#ifdef CONFIG_ARM64_SVE
+	[REGSET_SVE] = { /* Scalable Vector Extension */
+		.core_note_type = NT_ARM_SVE,
+		.n = (SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE) + 15) / 16,
+		.size = 16,
+		.align = 16,
+		.get = sve_get,
+		.set = sve_set,
+		.get_size = sve_get_size,
+	},
+#endif
 };
 
 static const struct user_regset_view user_aarch64_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index b5280db..735b8f4 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -416,6 +416,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_HW_BREAK	0x402		/* ARM hardware breakpoint registers */
 #define NT_ARM_HW_WATCH	0x403		/* ARM hardware watchpoint registers */
 #define NT_ARM_SYSTEM_CALL	0x404	/* ARM system call number */
+#define NT_ARM_SVE	0x405		/* ARM Scalable Vector Extension registers */
 #define NT_METAG_CBUF	0x500		/* Metag catch buffer registers */
 #define NT_METAG_RPIPE	0x501		/* Metag read pipeline state */
 #define NT_METAG_TLS	0x502		/* Metag TLS pointer */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 09/27] arm64/sve: Signal frame and context structure definition
  2017-08-09 12:06 ` [PATCH 09/27] arm64/sve: Signal frame and context structure definition Dave Martin
@ 2017-08-22 10:22   ` Alex Bennée
  2017-08-22 11:17     ` Dave Martin
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Bennée @ 2017-08-22 10:22 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, linux-arch, libc-alpha, gdb, Ard Biesheuvel,
	Szabolcs Nagy, Catalin Marinas, Yao Qi, Alan Hayward,
	Will Deacon, Richard Sandiford, kvmarm


Dave Martin <Dave.Martin@arm.com> writes:

> This patch defines the representation that will be used for the SVE
> register state in the signal frame, and implements support for
> saving and restoring the SVE registers around signals.
>
> The same layout will also be used for the in-kernel task state.
>
> Due to the variability of the SVE vector length, it is not possible
> to define a fixed C struct to describe all the registers.  Instead,
> Macros are defined in sigcontext.h to facilitate access to the
> parts of the structure.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> ---
>  arch/arm64/include/uapi/asm/sigcontext.h | 113 ++++++++++++++++++++++++++++++-
>  1 file changed, 112 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
> index f0a76b9..0533bdf 100644
> --- a/arch/arm64/include/uapi/asm/sigcontext.h
> +++ b/arch/arm64/include/uapi/asm/sigcontext.h
> @@ -16,6 +16,8 @@
>  #ifndef _UAPI__ASM_SIGCONTEXT_H
>  #define _UAPI__ASM_SIGCONTEXT_H
>
> +#ifndef __ASSEMBLY__
> +
>  #include <linux/types.h>
>
>  /*
> @@ -41,10 +43,11 @@ struct sigcontext {
>   *
>   *	0x210		fpsimd_context
>   *	 0x10		esr_context
> + *	0x8a0		sve_context (vl <= 64) (optional)
>   *	 0x20		extra_context (optional)
>   *	 0x10		terminator (null _aarch64_ctx)
>   *
> - *	0xdb0		(reserved for future allocation)
> + *	0x510		(reserved for future allocation)
>   *
>   * New records that can exceed this space need to be opt-in for userspace, so
>   * that an expanded signal frame is not generated unexpectedly.  The mechanism
> @@ -116,4 +119,112 @@ struct extra_context {
>  	__u32 __reserved[3];
>  };
>
> +#define SVE_MAGIC	0x53564501
> +
> +struct sve_context {
> +	struct _aarch64_ctx head;
> +	__u16 vl;
> +	__u16 __reserved[3];
> +};
> +
> +#endif /* !__ASSEMBLY__ */
> +
> +/*
> + * The SVE architecture leaves space for future expansion of the
> + * vector length beyond its initial architectural limit of 2048 bits
> + * (16 quadwords).
> + */
> +#define SVE_VQ_MIN		1
> +#define SVE_VQ_MAX		0x200
> +
> +#define SVE_VL_MIN		(SVE_VQ_MIN * 0x10)
> +#define SVE_VL_MAX		(SVE_VQ_MAX * 0x10)
> +
> +#define SVE_NUM_ZREGS		32
> +#define SVE_NUM_PREGS		16
> +
> +#define sve_vl_valid(vl) \
> +	((vl) % 0x10 == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
> +#define sve_vq_from_vl(vl)	((vl) / 0x10)
> +#define sve_vl_from_vq(vq)	((vq) * 0x10)

I got a little confused first time through over what VQ and VL where.
Maybe it would make sense to expand a little more from first principles?

  /*
   * The SVE architecture defines vector registers as a multiple of 128
   * bit quadwords. The current architectural limit is 2048 bits (16
   * quadwords) but there is room for future expansion beyond that.
    */

  #define SVE_VQ_BITS             128      /* 128 bits in one quadword */
  #define SVE_VQ_BYTES            (SVE_VQ_BITS / 8)

  #define SVE_VQ_MIN		1
  #define SVE_VQ_MAX		0x200    /* see ZCR_ELx[8:0] */

  #define SVE_VL_MIN_BYTES	(SVE_VQ_MIN * SVE_VQ_BYTES)
  #define SVE_VL_MAX_BYTES	(SVE_VQ_MAX * SVE_VQ_BYTES)

  #define SVE_NUM_ZREGS		32
  #define SVE_NUM_PREGS		16

  #define sve_vl_valid(vl)						\
          ((vl) % SVE_VQ_BYTES == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
  #define sve_vq_from_vl(vl)	((vl) / SVE_VQ_BYTES)
  #define sve_vl_from_vq(vq)	((vq) * SVE_VQ_BYTES)


> +
> +/*
> + * If the SVE registers are currently live for the thread at signal delivery,
> + * sve_context.head.size >=
> + *	SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl))
> + * and the register data may be accessed using the SVE_SIG_*() macros.
> + *
> + * If sve_context.head.size <
> + *	SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)),
> + * the SVE registers were not live for the thread and no register data
> + * is included: in this case, the SVE_SIG_*() macros should not be
> + * used except for this check.
> + *
> + * The same convention applies when returning from a signal: a caller
> + * will need to remove or resize the sve_context block if it wants to
> + * make the SVE registers live when they were previously non-live or
> + * vice-versa.  This may require the the caller to allocate fresh
> + * memory and/or move other context blocks in the signal frame.
> + *
> + * Changing the vector length during signal return is not permitted:
> + * sve_context.vl must equal the thread's current vector length when
> + * doing a sigreturn.
> + *
> + *
> + * Note: for all these macros, the "vq" argument denotes the SVE
> + * vector length in quadwords (i.e., units of 128 bits).
> + *
> + * The correct way to obtain vq is to use sve_vq_from_vl(vl).  The
> + * result is valid if and only if sve_vl_valid(vl) is true.  This is
> + * guaranteed for a struct sve_context written by the kernel.
> + *
> + *
> + * Additional macros describe the contents and layout of the payload.
> + * For each, SVE_SIG_x_OFFSET(args) is the start offset relative to
> + * the start of struct sve_context, and SVE_SIG_x_SIZE(args) is the
> + * size in bytes:
> + *
> + *	x	type				description
> + *	-	----				-----------
> + *	REGS					the entire SVE context
> + *
> + *	ZREGS	__uint128_t[SVE_NUM_ZREGS][vq]	all Z-registers
> + *	ZREG	__uint128_t[vq]			individual Z-register Zn
> + *
> + *	PREGS	uint16_t[SVE_NUM_PREGS][vq]	all P-registers
> + *	PREG	uint16_t[vq]			individual P-register Pn
> + *
> + *	FFR	uint16_t[vq]			first-fault status register
> + *
> + * Additional data might be appended in the future.
> + */
> +
> +#define SVE_SIG_ZREG_SIZE(vq)	((__u32)(vq) * 16)
> +#define SVE_SIG_PREG_SIZE(vq)	((__u32)(vq) * 2)
> +#define SVE_SIG_FFR_SIZE(vq)	SVE_SIG_PREG_SIZE(vq)
> +
> +#define SVE_SIG_REGS_OFFSET	((sizeof(struct sve_context) + 15) / 16 * 16)
> +
> +#define SVE_SIG_ZREGS_OFFSET	SVE_SIG_REGS_OFFSET
> +#define SVE_SIG_ZREG_OFFSET(vq, n) \
> +	(SVE_SIG_ZREGS_OFFSET + SVE_SIG_ZREG_SIZE(vq) * (n))
> +#define SVE_SIG_ZREGS_SIZE(vq) \
> +	(SVE_SIG_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_SIG_ZREGS_OFFSET)
> +
> +#define SVE_SIG_PREGS_OFFSET(vq) \
> +	(SVE_SIG_ZREGS_OFFSET + SVE_SIG_ZREGS_SIZE(vq))
> +#define SVE_SIG_PREG_OFFSET(vq, n) \
> +	(SVE_SIG_PREGS_OFFSET(vq) + SVE_SIG_PREG_SIZE(vq) * (n))
> +#define SVE_SIG_PREGS_SIZE(vq) \
> +	(SVE_SIG_PREG_OFFSET(vq, SVE_NUM_PREGS) - SVE_SIG_PREGS_OFFSET(vq))
> +
> +#define SVE_SIG_FFR_OFFSET(vq) \
> +	(SVE_SIG_PREGS_OFFSET(vq) + SVE_SIG_PREGS_SIZE(vq))
> +
> +#define SVE_SIG_REGS_SIZE(vq) \
> +	(SVE_SIG_FFR_OFFSET(vq) + SVE_SIG_FFR_SIZE(vq) - SVE_SIG_REGS_OFFSET)
> +
> +#define SVE_SIG_CONTEXT_SIZE(vq) (SVE_SIG_REGS_OFFSET + SVE_SIG_REGS_SIZE(vq))
> +
> +
>  #endif /* _UAPI__ASM_SIGCONTEXT_H */


--
Alex Bennée


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 09/27] arm64/sve: Signal frame and context structure definition
  2017-08-22 10:22   ` Alex Bennée
@ 2017-08-22 11:17     ` Dave Martin
  2017-08-22 13:53       ` Alex Bennée
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Martin @ 2017-08-22 11:17 UTC (permalink / raw)
  To: Alex Bennée
  Cc: linux-arch, libc-alpha, Ard Biesheuvel, Szabolcs Nagy, gdb,
	Yao Qi, Alan Hayward, Will Deacon, Richard Sandiford,
	Catalin Marinas, kvmarm, linux-arm-kernel

On Tue, Aug 22, 2017 at 11:22:44AM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > This patch defines the representation that will be used for the SVE
> > register state in the signal frame, and implements support for
> > saving and restoring the SVE registers around signals.
> >
> > The same layout will also be used for the in-kernel task state.
> >
> > Due to the variability of the SVE vector length, it is not possible
> > to define a fixed C struct to describe all the registers.  Instead,
> > Macros are defined in sigcontext.h to facilitate access to the
> > parts of the structure.
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > ---
> >  arch/arm64/include/uapi/asm/sigcontext.h | 113 ++++++++++++++++++++++++++++++-
> >  1 file changed, 112 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
> > index f0a76b9..0533bdf 100644
> > --- a/arch/arm64/include/uapi/asm/sigcontext.h
> > +++ b/arch/arm64/include/uapi/asm/sigcontext.h
> > @@ -16,6 +16,8 @@
> >  #ifndef _UAPI__ASM_SIGCONTEXT_H
> >  #define _UAPI__ASM_SIGCONTEXT_H
> >
> > +#ifndef __ASSEMBLY__
> > +
> >  #include <linux/types.h>
> >
> >  /*
> > @@ -41,10 +43,11 @@ struct sigcontext {
> >   *
> >   *	0x210		fpsimd_context
> >   *	 0x10		esr_context
> > + *	0x8a0		sve_context (vl <= 64) (optional)
> >   *	 0x20		extra_context (optional)
> >   *	 0x10		terminator (null _aarch64_ctx)
> >   *
> > - *	0xdb0		(reserved for future allocation)
> > + *	0x510		(reserved for future allocation)
> >   *
> >   * New records that can exceed this space need to be opt-in for userspace, so
> >   * that an expanded signal frame is not generated unexpectedly.  The mechanism
> > @@ -116,4 +119,112 @@ struct extra_context {
> >  	__u32 __reserved[3];
> >  };
> >
> > +#define SVE_MAGIC	0x53564501
> > +
> > +struct sve_context {
> > +	struct _aarch64_ctx head;
> > +	__u16 vl;
> > +	__u16 __reserved[3];
> > +};
> > +
> > +#endif /* !__ASSEMBLY__ */
> > +
> > +/*
> > + * The SVE architecture leaves space for future expansion of the
> > + * vector length beyond its initial architectural limit of 2048 bits
> > + * (16 quadwords).
> > + */
> > +#define SVE_VQ_MIN		1
> > +#define SVE_VQ_MAX		0x200
> > +
> > +#define SVE_VL_MIN		(SVE_VQ_MIN * 0x10)
> > +#define SVE_VL_MAX		(SVE_VQ_MAX * 0x10)
> > +
> > +#define SVE_NUM_ZREGS		32
> > +#define SVE_NUM_PREGS		16
> > +
> > +#define sve_vl_valid(vl) \
> > +	((vl) % 0x10 == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
> > +#define sve_vq_from_vl(vl)	((vl) / 0x10)
> > +#define sve_vl_from_vq(vq)	((vq) * 0x10)
> 
> I got a little confused first time through over what VQ and VL where.
> Maybe it would make sense to expand a little more from first principles?
> 
>   /*
>    * The SVE architecture defines vector registers as a multiple of 128
>    * bit quadwords. The current architectural limit is 2048 bits (16
>    * quadwords) but there is room for future expansion beyond that.
>     */

This comes up in several places and so I didn't want to comment it
repeatedly everywhere.

Instead, I wrote up something in section 2 (Vector length terminology)
of Documentation/arm64/sve.txt -- see patch 25.  Can you take a look and
see whether that's adequate?

[...]

Cheers
---Dave


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 09/27] arm64/sve: Signal frame and context structure definition
  2017-08-22 11:17     ` Dave Martin
@ 2017-08-22 13:53       ` Alex Bennée
  2017-08-22 14:21         ` Dave Martin
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Bennée @ 2017-08-22 13:53 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arch, libc-alpha, Ard Biesheuvel, Szabolcs Nagy, gdb,
	Yao Qi, Alan Hayward, Will Deacon, Richard Sandiford,
	Catalin Marinas, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> On Tue, Aug 22, 2017 at 11:22:44AM +0100, Alex Bennée wrote:
>>
>> Dave Martin <Dave.Martin@arm.com> writes:
>>
>> > This patch defines the representation that will be used for the SVE
>> > register state in the signal frame, and implements support for
>> > saving and restoring the SVE registers around signals.
>> >
>> > The same layout will also be used for the in-kernel task state.
>> >
>> > Due to the variability of the SVE vector length, it is not possible
>> > to define a fixed C struct to describe all the registers.  Instead,
>> > Macros are defined in sigcontext.h to facilitate access to the
>> > parts of the structure.
>> >
>> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>> > ---
>> >  arch/arm64/include/uapi/asm/sigcontext.h | 113 ++++++++++++++++++++++++++++++-
>> >  1 file changed, 112 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
>> > index f0a76b9..0533bdf 100644
>> > --- a/arch/arm64/include/uapi/asm/sigcontext.h
>> > +++ b/arch/arm64/include/uapi/asm/sigcontext.h
>> > @@ -16,6 +16,8 @@
>> >  #ifndef _UAPI__ASM_SIGCONTEXT_H
>> >  #define _UAPI__ASM_SIGCONTEXT_H
>> >
>> > +#ifndef __ASSEMBLY__
>> > +
>> >  #include <linux/types.h>
>> >
>> >  /*
>> > @@ -41,10 +43,11 @@ struct sigcontext {
>> >   *
>> >   *	0x210		fpsimd_context
>> >   *	 0x10		esr_context
>> > + *	0x8a0		sve_context (vl <= 64) (optional)
>> >   *	 0x20		extra_context (optional)
>> >   *	 0x10		terminator (null _aarch64_ctx)
>> >   *
>> > - *	0xdb0		(reserved for future allocation)
>> > + *	0x510		(reserved for future allocation)
>> >   *
>> >   * New records that can exceed this space need to be opt-in for userspace, so
>> >   * that an expanded signal frame is not generated unexpectedly.  The mechanism
>> > @@ -116,4 +119,112 @@ struct extra_context {
>> >  	__u32 __reserved[3];
>> >  };
>> >
>> > +#define SVE_MAGIC	0x53564501
>> > +
>> > +struct sve_context {
>> > +	struct _aarch64_ctx head;
>> > +	__u16 vl;
>> > +	__u16 __reserved[3];
>> > +};
>> > +
>> > +#endif /* !__ASSEMBLY__ */
>> > +
>> > +/*
>> > + * The SVE architecture leaves space for future expansion of the
>> > + * vector length beyond its initial architectural limit of 2048 bits
>> > + * (16 quadwords).
>> > + */
>> > +#define SVE_VQ_MIN		1
>> > +#define SVE_VQ_MAX		0x200
>> > +
>> > +#define SVE_VL_MIN		(SVE_VQ_MIN * 0x10)
>> > +#define SVE_VL_MAX		(SVE_VQ_MAX * 0x10)
>> > +
>> > +#define SVE_NUM_ZREGS		32
>> > +#define SVE_NUM_PREGS		16
>> > +
>> > +#define sve_vl_valid(vl) \
>> > +	((vl) % 0x10 == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
>> > +#define sve_vq_from_vl(vl)	((vl) / 0x10)
>> > +#define sve_vl_from_vq(vq)	((vq) * 0x10)
>>
>> I got a little confused first time through over what VQ and VL where.
>> Maybe it would make sense to expand a little more from first principles?
>>
>>   /*
>>    * The SVE architecture defines vector registers as a multiple of 128
>>    * bit quadwords. The current architectural limit is 2048 bits (16
>>    * quadwords) but there is room for future expansion beyond that.
>>     */
>
> This comes up in several places and so I didn't want to comment it
> repeatedly everywhere.
>
> Instead, I wrote up something in section 2 (Vector length terminology)
> of Documentation/arm64/sve.txt -- see patch 25.  Can you take a look and
> see whether that's adequate?

Ahh, I hadn't got to that yet. I'm unsure to the order the kernel likes
to put things but I like to put design documents at the front of the
patch queue as they are useful primers and saves you having to patch a:

modified   arch/arm64/include/uapi/asm/sigcontext.h
@@ -132,19 +132,24 @@ struct sve_context {
 /*
  * The SVE architecture leaves space for future expansion of the
  * vector length beyond its initial architectural limit of 2048 bits
- * (16 quadwords).
+ * (16 quadwords). See Documentation/arm64/sve.txt for a summary of
+ * the terminology of Vector Quads (VQ) and Vector Lengths (VL).
  */
+
+#define SVE_VQ_BITS             128      /* 128 bits in one quadword */
+#define SVE_VQ_BYTES            (SVE_VQ_BITS / 8)
+
 #define SVE_VQ_MIN		1
 #define SVE_VQ_MAX		0x200

-#define SVE_VL_MIN		(SVE_VQ_MIN * 0x10)
-#define SVE_VL_MAX		(SVE_VQ_MAX * 0x10)
+#define SVE_VL_MIN		(SVE_VQ_MIN * SVE_VQ_BYTES)
+#define SVE_VL_MAX		(SVE_VQ_MAX * SVE_VQ_BYTES)

 #define SVE_NUM_ZREGS		32
 #define SVE_NUM_PREGS		16

 #define sve_vl_valid(vl) \
-	((vl) % 0x10 == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
+	((vl) % SVE_VQ_BYTES == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
 #define sve_vq_from_vl(vl)	((vl) / 0x10)
 #define sve_vl_from_vq(vq)	((vq) * 0x10)


>
> [...]
>
> Cheers
> ---Dave


--
Alex Bennée


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 09/27] arm64/sve: Signal frame and context structure definition
  2017-08-22 13:53       ` Alex Bennée
@ 2017-08-22 14:21         ` Dave Martin
       [not found]           ` <87shgj4pc7.fsf@linaro.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Martin @ 2017-08-22 14:21 UTC (permalink / raw)
  To: Alex Bennée
  Cc: linux-arch, libc-alpha, Ard Biesheuvel, Szabolcs Nagy, gdb,
	Yao Qi, Will Deacon, Richard Sandiford, Alan Hayward,
	Catalin Marinas, kvmarm, linux-arm-kernel

On Tue, Aug 22, 2017 at 02:53:49PM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > On Tue, Aug 22, 2017 at 11:22:44AM +0100, Alex Bennée wrote:
> >>
> >> Dave Martin <Dave.Martin@arm.com> writes:

[...]

> >> > +/*
> >> > + * The SVE architecture leaves space for future expansion of the
> >> > + * vector length beyond its initial architectural limit of 2048 bits
> >> > + * (16 quadwords).
> >> > + */
> >> > +#define SVE_VQ_MIN		1
> >> > +#define SVE_VQ_MAX		0x200
> >> > +
> >> > +#define SVE_VL_MIN		(SVE_VQ_MIN * 0x10)
> >> > +#define SVE_VL_MAX		(SVE_VQ_MAX * 0x10)
> >> > +
> >> > +#define SVE_NUM_ZREGS		32
> >> > +#define SVE_NUM_PREGS		16
> >> > +
> >> > +#define sve_vl_valid(vl) \
> >> > +	((vl) % 0x10 == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
> >> > +#define sve_vq_from_vl(vl)	((vl) / 0x10)
> >> > +#define sve_vl_from_vq(vq)	((vq) * 0x10)
> >>
> >> I got a little confused first time through over what VQ and VL where.
> >> Maybe it would make sense to expand a little more from first principles?
> >>
> >>   /*
> >>    * The SVE architecture defines vector registers as a multiple of 128
> >>    * bit quadwords. The current architectural limit is 2048 bits (16
> >>    * quadwords) but there is room for future expansion beyond that.
> >>     */
> >
> > This comes up in several places and so I didn't want to comment it
> > repeatedly everywhere.
> >
> > Instead, I wrote up something in section 2 (Vector length terminology)
> > of Documentation/arm64/sve.txt -- see patch 25.  Can you take a look and
> > see whether that's adequate?
> 
> Ahh, I hadn't got to that yet. I'm unsure to the order the kernel likes
> to put things but I like to put design documents at the front of the

I don't have a strong opinion on that -- I had preferred not to add a
document describing stuff that doesn't exist at the time of commit.
I could commit a stub document at the start of the series, and then
commit the real document later.

Either way, it seemed overkill.

Perhaps I should have drawn more attention to the documentation in the
cover letter, and encouraged reviewers to look at it first.  My
experience is that people don't often read cover letters...

Now the series is posted, I'm minded to keep the order as-is, unless you
think it's a big issue.

Adding a reference to the document seems a reasonable thing to do,
so I could add that.

> patch queue as they are useful primers and saves you having to patch a:
> 
> modified   arch/arm64/include/uapi/asm/sigcontext.h
> @@ -132,19 +132,24 @@ struct sve_context {
>  /*
>   * The SVE architecture leaves space for future expansion of the
>   * vector length beyond its initial architectural limit of 2048 bits
> - * (16 quadwords).
> + * (16 quadwords). See Documentation/arm64/sve.txt for a summary of
> + * the terminology of Vector Quads (VQ) and Vector Lengths (VL).
>   */
> +
> +#define SVE_VQ_BITS             128      /* 128 bits in one quadword */
> +#define SVE_VQ_BYTES            (SVE_VQ_BITS / 8)
> +

I was trying to keep extraneous #defines to a minimum, since this is a
uapi header, and people may depend on anything defined here.

I think SVE_VQ_BYTES is reasonable to have, and this allows us to
rewrite a few hard-coded 0x10s and 16s symbolically which is probably a
good idea -- I'll add this.

SVE_VQ_BITS looks redundant to me though.  It wouldn't be used for any
purpose other than defining SVE_VQ_BYTES.

>  #define SVE_VQ_MIN		1
>  #define SVE_VQ_MAX		0x200
> 
> -#define SVE_VL_MIN		(SVE_VQ_MIN * 0x10)
> -#define SVE_VL_MAX		(SVE_VQ_MAX * 0x10)
> +#define SVE_VL_MIN		(SVE_VQ_MIN * SVE_VQ_BYTES)
> +#define SVE_VL_MAX		(SVE_VQ_MAX * SVE_VQ_BYTES)
> 
>  #define SVE_NUM_ZREGS		32
>  #define SVE_NUM_PREGS		16
> 
>  #define sve_vl_valid(vl) \
> -	((vl) % 0x10 == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
> +	((vl) % SVE_VQ_BYTES == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
>  #define sve_vq_from_vl(vl)	((vl) / 0x10)
>  #define sve_vl_from_vq(vq)	((vq) * 0x10)

[...]

Cheers
---Dave


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 09/27] arm64/sve: Signal frame and context structure definition
       [not found]           ` <87shgj4pc7.fsf@linaro.org>
@ 2017-08-22 15:41             ` Dave Martin
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Martin @ 2017-08-22 15:41 UTC (permalink / raw)
  To: Alex Bennée
  Cc: linux-arch, libc-alpha, Ard Biesheuvel, Szabolcs Nagy, gdb,
	Yao Qi, Alan Hayward, Will Deacon, Richard Sandiford,
	Catalin Marinas, kvmarm, linux-arm-kernel

On Tue, Aug 22, 2017 at 04:03:20PM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > On Tue, Aug 22, 2017 at 02:53:49PM +0100, Alex Bennée wrote:

[...]

> >> +
> >> +#define SVE_VQ_BITS             128      /* 128 bits in one quadword */
> >> +#define SVE_VQ_BYTES            (SVE_VQ_BITS / 8)
> >> +
> >
> > I was trying to keep extraneous #defines to a minimum, since this is a
> > uapi header, and people may depend on anything defined here.
> >
> > I think SVE_VQ_BYTES is reasonable to have, and this allows us to
> > rewrite a few hard-coded 0x10s and 16s symbolically which is probably a
> > good idea -- I'll add this.
> >
> > SVE_VQ_BITS looks redundant to me though.  It wouldn't be used for any
> > purpose other than defining SVE_VQ_BYTES.
> 
> Yeah I was more concerned with getting rid of the magic 0x10's than
> showing exactly how many bits something is.

OK, I'll take SVE_VQ_BYTES and use it in the appropriate places.
There are a few 0x10s/16s in the series that can use this instead
of being open-coded.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 14/27] arm64/sve: Backend logic for setting the vector length
  2017-08-09 12:07 ` [PATCH 14/27] arm64/sve: Backend logic for setting the vector length Dave Martin
@ 2017-08-23 15:33   ` Alex Bennée
  2017-08-23 17:30     ` Dave Martin
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Bennée @ 2017-08-23 15:33 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, linux-arch, libc-alpha, gdb, Ard Biesheuvel,
	Szabolcs Nagy, Catalin Marinas, Yao Qi, Alan Hayward,
	Will Deacon, Richard Sandiford, kvmarm


Dave Martin <Dave.Martin@arm.com> writes:

> This patch implements the core logic for changing a task's vector
> length on request from userspace.  This will be used by the ptrace
> and prctl frontends that are implemented in later patches.
>
> The SVE architecture permits, but does not require, implementations
> to support vector lengths that are not a power of two.  To handle
> this, logic is added to check a requested vector length against a
> possibly sparse bitmap of available vector lengths at runtime, so
> that the best supported value can be chosen.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> ---
>  arch/arm64/include/asm/fpsimd.h |   6 +++
>  arch/arm64/kernel/fpsimd.c      | 116 ++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/prctl.h      |   5 ++
>  3 files changed, 127 insertions(+)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 7efd04e..39b26d2 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -70,11 +70,15 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
>
>  extern void fpsimd_flush_task_state(struct task_struct *target);
>
> +#define SVE_VL_ARCH_MAX 0x100
> +

Hmm this isn't the same as SVE_VL_MAX. Why aren't we using that?

>  extern void sve_save_state(void *state, u32 *pfpsr);
>  extern void sve_load_state(void const *state, u32 const *pfpsr,
>  			   unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>
> +extern int sve_max_vl;
> +
>  #ifdef CONFIG_ARM64_SVE
>
>  extern size_t sve_state_size(struct task_struct const *task);
> @@ -83,6 +87,8 @@ extern void sve_alloc(struct task_struct *task);
>  extern void fpsimd_release_thread(struct task_struct *task);
>  extern void fpsimd_dup_sve(struct task_struct *dst,
>  			   struct task_struct const *src);
> +extern int sve_set_vector_length(struct task_struct *task,
> +				 unsigned long vl, unsigned long flags);
>
>  #else /* ! CONFIG_ARM64_SVE */
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index e8674f6..bce95de 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -18,12 +18,14 @@
>   */
>
>  #include <linux/bottom_half.h>
> +#include <linux/bitmap.h>
>  #include <linux/cpu.h>
>  #include <linux/cpu_pm.h>
>  #include <linux/kernel.h>
>  #include <linux/init.h>
>  #include <linux/percpu.h>
>  #include <linux/preempt.h>
> +#include <linux/prctl.h>
>  #include <linux/ptrace.h>
>  #include <linux/sched/signal.h>
>  #include <linux/signal.h>
> @@ -111,6 +113,20 @@ static DEFINE_PER_CPU(struct fpsimd_state *, fpsimd_last_state);
>  /* Default VL for tasks that don't set it explicitly: */
>  static int sve_default_vl = -1;
>
> +#ifdef CONFIG_ARM64_SVE
> +
> +/* Maximum supported vector length across all CPUs (initially poisoned) */
> +int sve_max_vl = -1;
> +/* Set of available vector lengths, as vq_to_bit(vq): */
> +static DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
> +
> +#else /* ! CONFIG_ARM64_SVE */
> +
> +/* Dummy declaration for code that will be optimised out: */
> +extern DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
> +
> +#endif /* ! CONFIG_ARM64_SVE */
> +
>  static void sve_free(struct task_struct *task)
>  {
>  	kfree(task->thread.sve_state);
> @@ -148,6 +164,37 @@ static void change_cpacr(u64 old, u64 new)
>  		write_sysreg(new, CPACR_EL1);
>  }
>
> +static unsigned int vq_to_bit(unsigned int vq)
> +{
> +	BUILD_BUG_ON(vq < 1 || vq > SVE_VQ_MAX);
> +
> +	return SVE_VQ_MAX - vq;
> +}
> +
> +static unsigned int bit_to_vq(unsigned int bit)
> +{
> +	BUILD_BUG_ON(bit >= SVE_VQ_MAX);
> +
> +	return SVE_VQ_MAX - bit;
> +}
> +
> +static unsigned int find_supported_vector_length(unsigned int vl)
> +{
> +	int bit;
> +
> +	BUG_ON(!sve_vl_valid(vl));
> +
> +	BUG_ON(!sve_vl_valid(sve_max_vl));
> +	if (vl > sve_max_vl)
> +		vl = sve_max_vl;
> +
> +	bit = find_next_bit(sve_vq_map, SVE_VQ_MAX,
> +			    vq_to_bit(sve_vq_from_vl(vl)));
> +	BUG_ON(bit < 0 || bit >= SVE_VQ_MAX);
> +
> +	return 16 * bit_to_vq(bit);
> +}
> +
>  #define ZREG(sve_state, vq, n) ((char *)(sve_state) +		\
>  	(SVE_SIG_ZREG_OFFSET(vq, n) - SVE_SIG_REGS_OFFSET))
>
> @@ -235,6 +282,73 @@ void fpsimd_dup_sve(struct task_struct *dst, struct task_struct const *src)
>  	}
>  }
>
> +int sve_set_vector_length(struct task_struct *task,
> +			  unsigned long vl, unsigned long flags)
> +{
> +	BUG_ON(task == current && preemptible());
> +
> +	if (flags & ~(unsigned long)(PR_SVE_VL_INHERIT |
> +				     PR_SVE_SET_VL_ONEXEC))
> +		return -EINVAL;
> +
> +	if (!sve_vl_valid(vl))
> +		return -EINVAL;
> +
> +	/*
> +	 * Clamp to the maximum vector length that VL-agnostic SVE code can
> +	 * work with.  A flag may be assigned in the future to allow setting
> +	 * of larger vector lengths without confusing older software.
> +	 */
> +	if (vl > SVE_VL_ARCH_MAX)
> +		vl = SVE_VL_ARCH_MAX;
> +
> +	vl = find_supported_vector_length(vl);
> +
> +	if (flags & (PR_SVE_VL_INHERIT |
> +		     PR_SVE_SET_VL_ONEXEC))
> +		task->thread.sve_vl_onexec = vl;
> +	else
> +		/* Reset VL to system default on next exec: */
> +		task->thread.sve_vl_onexec = 0;
> +
> +	/* Only actually set the VL if not deferred: */
> +	if (flags & PR_SVE_SET_VL_ONEXEC)
> +		goto out;
> +
> +	/*
> +	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
> +	 * write any live register state back to task_struct, and convert to a
> +	 * non-SVE thread.
> +	 */
> +	if (vl != task->thread.sve_vl) {
> +		if (task == current) {
> +			task_fpsimd_save();
> +			set_thread_flag(TIF_FOREIGN_FPSTATE);
> +		}
> +
> +		if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
> +			sve_to_fpsimd(task);
> +
> +		/*
> +		 * Force reallocation of task SVE state to the correct size
> +		 * on next use:
> +		 */
> +		sve_free(task);
> +	}
> +
> +	task->thread.sve_vl = vl;
> +
> +	fpsimd_flush_task_state(task);
> +
> +out:
> +	if (flags & PR_SVE_VL_INHERIT)
> +		set_thread_flag(TIF_SVE_VL_INHERIT);
> +	else
> +		clear_thread_flag(TIF_SVE_VL_INHERIT);
> +
> +	return 0;
> +}
> +
>  void fpsimd_release_thread(struct task_struct *dead_task)
>  {
>  	sve_free(dead_task);
> @@ -407,6 +521,8 @@ void fpsimd_flush_thread(void)
>  		 * If not, something went badly wrong.
>  		 */
>  		BUG_ON(!sve_vl_valid(current->thread.sve_vl));
> +		BUG_ON(find_supported_vector_length(current->thread.sve_vl) !=
> +		       current->thread.sve_vl);
>
>  		/*
>  		 * If the task is not set to inherit, ensure that the vector
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index a8d0759..1b64901 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -197,4 +197,9 @@ struct prctl_mm_map {
>  # define PR_CAP_AMBIENT_LOWER		3
>  # define PR_CAP_AMBIENT_CLEAR_ALL	4
>
> +/* arm64 Scalable Vector Extension controls */
> +# define PR_SVE_SET_VL_ONEXEC		(1 << 18) /* defer effect until exec */
> +# define PR_SVE_VL_LEN_MASK		0xffff
> +# define PR_SVE_VL_INHERIT		(1 << 17) /* inherit across exec */
> +
>  #endif /* _LINUX_PRCTL_H */


--
Alex Bennée


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 14/27] arm64/sve: Backend logic for setting the vector length
  2017-08-23 15:33   ` Alex Bennée
@ 2017-08-23 17:30     ` Dave Martin
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Martin @ 2017-08-23 17:30 UTC (permalink / raw)
  To: Alex Bennée
  Cc: linux-arch, libc-alpha, Ard Biesheuvel, Szabolcs Nagy, gdb,
	Yao Qi, Alan Hayward, Will Deacon, Richard Sandiford,
	Catalin Marinas, kvmarm, linux-arm-kernel

On Wed, Aug 23, 2017 at 04:33:18PM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > This patch implements the core logic for changing a task's vector
> > length on request from userspace.  This will be used by the ptrace
> > and prctl frontends that are implemented in later patches.
> >
> > The SVE architecture permits, but does not require, implementations
> > to support vector lengths that are not a power of two.  To handle
> > this, logic is added to check a requested vector length against a
> > possibly sparse bitmap of available vector lengths at runtime, so
> > that the best supported value can be chosen.
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > ---
> >  arch/arm64/include/asm/fpsimd.h |   6 +++
> >  arch/arm64/kernel/fpsimd.c      | 116 ++++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/prctl.h      |   5 ++
> >  3 files changed, 127 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> > index 7efd04e..39b26d2 100644
> > --- a/arch/arm64/include/asm/fpsimd.h
> > +++ b/arch/arm64/include/asm/fpsimd.h
> > @@ -70,11 +70,15 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
> >
> >  extern void fpsimd_flush_task_state(struct task_struct *target);
> >
> > +#define SVE_VL_ARCH_MAX 0x100
> > +
> 
> Hmm this isn't the same as SVE_VL_MAX. Why aren't we using that?

There should probably be a comment on this.

SVE vector-length agnostic software needs to be able to assume that
VL <= 256 (the SVE maximum), because some instructions won't work as
expected if this is not the case, particularly TBL, where in a larger
vector there's no guarantee that a vector index (which might be as small
as a byte) can index a vector element (of which there could be >256).

I really don't want to expose SVE_VL_ARCH_MAX to userspace, so that
people are not tempted to design data structures and protocols that
can't handle up to SVE_VL_MAX.  Exposing both together seemed to carry
a high risk of confusion and misuse.

Instead, I silently clamp to an SVE-compatible length (i.e.,
SVE_VL_ARCH_MAX) in sve_set_vector_length().  If future architecture
revisions support larger vectors later on, my plan is to require an
extra "force flag" to be passed to override the clamp.

[...]

Cheers
---Dave


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-08-23 17:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1502280338-23002-1-git-send-email-Dave.Martin@arm.com>
2017-08-09 12:06 ` [PATCH 09/27] arm64/sve: Signal frame and context structure definition Dave Martin
2017-08-22 10:22   ` Alex Bennée
2017-08-22 11:17     ` Dave Martin
2017-08-22 13:53       ` Alex Bennée
2017-08-22 14:21         ` Dave Martin
     [not found]           ` <87shgj4pc7.fsf@linaro.org>
2017-08-22 15:41             ` Dave Martin
2017-08-09 12:07 ` [PATCH 14/27] arm64/sve: Backend logic for setting the vector length Dave Martin
2017-08-23 15:33   ` Alex Bennée
2017-08-23 17:30     ` Dave Martin
2017-08-09 12:08 ` [PATCH 18/27] arm64/sve: ptrace and ELF coredump support Dave Martin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox