From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-154496-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 33209 invoked by alias); 8 Mar 2019 02:55:38 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Received: (qmail 33124 invoked by uid 89); 8 Mar 2019 02:55:37 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.1 spammy=HX-Languages-Length:6603, familiar
X-HELO: simark.ca
Received: from simark.ca (HELO simark.ca) (158.69.221.121) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 08 Mar 2019 02:55:34 +0000
Received: by simark.ca (Postfix, from userid 112)	id 04F9F1E658; Thu,  7 Mar 2019 21:55:33 -0500 (EST)
Received: from simark.ca (localhost [127.0.0.1])	by simark.ca (Postfix) with ESMTP id 97FE41E152;	Thu,  7 Mar 2019 21:55:30 -0500 (EST)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Date: Fri, 08 Mar 2019 02:55:00 -0000
From: Simon Marchi <simark@simark.ca>
To: John Baldwin <jhb@freebsd.org>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH v2 04/11] Add a new gdbarch method to resolve the address of TLS variables.
In-Reply-To: <2c282f52-0269-d6a8-8533-4c00b1a4ee8d@FreeBSD.org>
References: <cover.1549672588.git.jhb@FreeBSD.org> <4db33aead3f31532b7d4e165d9786df792a4d925.1549672588.git.jhb@FreeBSD.org> <02c8a44b-b1d2-0f0f-9b6f-72a0fb673f83@simark.ca> <2c282f52-0269-d6a8-8533-4c00b1a4ee8d@FreeBSD.org>
Message-ID: <4a8bfa95b84386b3a76a37113495b7bc@simark.ca>
X-Sender: simark@simark.ca
User-Agent: Roundcube Webmail/1.3.6
X-SW-Source: 2019-03/txt/msg00184.txt.bz2

On 2019-03-07 18:50, John Baldwin wrote:
> On 3/7/19 8:08 AM, Simon Marchi wrote:
>> On 2019-02-08 7:40 p.m., John Baldwin wrote:
>>> Permit TLS variable addresses to be resolved purely by an ABI rather
>>> than requiring a target method.  This doesn't try the target method if
>>> the ABI function is present (even if the ABI function fails) to
>>> simplify error handling.
>> 
>> I don't see anything wrong with the patch (and the comment you removed
>> in target_translate_tls_address hints it is right), but again I am not
>> very familiar with how TLS works, so I wouldn't spot if anything was
>> conceptually wrong with this approach.  I would appreciate it if
>> another maintainer could take a look and give their opinion.
> 
> Ok.  FWIW, the reason for target vs gdbarch has to do with the different
> ways one can resolve a TLS variable.  Some background:
> 
> In ELF relocations, a TLS variable is identified by an offset in a
> special TLS section of an ELF file, similar to global symbols being an
> offset relative to .data or .bss.  However, TLS variables are duplicated
> for each thread.  To support this, the runtime linker allocates an array
> of pointers for each thread called the DTV array.  The runtime linker
> also assigns an array index to each ELF object, so the executable is
> assigned array index 1, and other shared libraries that use TLS are
> assigned indices as they are mapped by the runtime linker.  The pointers
> in each thread's array point to the per-thread blocks of TLS variables
> for a given ELF object.  Thus, if index 1 is for my program and index 2
> was assigned to libc, then DTV[1] contains a pointer to all of the TLS
> variables in my main program and DTV[2] contains a pointer to all of the
> TLS variables in libc.
> 
> Thus, if libc has two TLS integers 'foo' and 'bar', they might be
> assigned offsets of 0 and 4.  To read the value of 'foo' one uses the
> expression '*(int *)(DTV[2])'.  To read 'bar' you would use
> '*(int *)(DTV[2] + 4)'.
> 
> There are some extra optimizations in the compiler-generated code.
> There's something called static TLS that can be at a fixed offset from
> the per-thread TCB pointer IIRC, but there are also valid DTV[] pointers
> that can get to the same variables, just via more indirection.  Compiled
> code is also allowed to fetch the 'base' of a TLS block for a given
> shared object and then save that 'base' and use offsets from it to
> access different variables.  Put another way, the compiler can assume
> that &bar - &foo is always '4' and just add the relative offset to
> '&foo' to compute '&bar' without going through the DTV array every time.
> 
> In target_translate_tls_address() we are given the 'struct objfile' of
> the ELF object and the offset of the TLS variable we are trying to find.
> The gdbarch_fetch_tls_load_module_address function fetches the pointer
> to the runtime linker's data structure describing that ELF object.
> 
> The target version (target::get_thread_local_address) expects to use
> some target specific method to turn a (thread, linker_module_addr,
> offset) tuple into the address of a TLS variable.  On Linux and other
> systems using libthread_db for this, it calls a libthread_db function.
> Internally that libthread_db function looks at the runtime linker's
> structure to extract the TLS index of the ELF object.  It then looks in
> the thread library's per-thread data structure to find a pointer to the
> DTV array.  It then uses 'DTV[index] + offset' to compute the final
> address.  Note that this is all done in libthread_db rather than in gdb
> itself.
> 
> The gdbarch method I'm using for FreeBSD doesn't use libthread_db.
> Instead, it more closely mimics what the compiler-generated code does.
> Many architectures use some sort of register to point to a per-thread
> Thread Control Block (TCB), and they store a pointer to the DTV array
> either in the TCB or at a fixed offset relative to the TCB.  For example,
> 64-bit x86 uses the %fs segment prefix to access the TCB, and the
> %fs_base register is thus a pointer to the TCB.  32-bit x86 uses %gs and
> %gs_base instead.  RISC-V has a 'tp' register for this purpose, etc.
> The approach I'm using for FreeBSD is to provide an
> architecture-specific function that uses the relevant register to locate
> the pointer to the DTV array.  It then calls a shared function (patch 7)
> that extracts the TLS index from the runtime linker's data structure and
> computes the final address via 'DTV[index] + offset'.
> 
> Mostly I did this because I don't like libthread_db, but using a gdbarch
> method should also be a bit more cross-debugger friendly (you don't have
> to have a libthread_db on a FreeBSD host that understands the Linux
> runtime linker or thread library or vice versa, and similar concerns
> with 32-bit vs 64-bit and x86 vs ARM, etc.).

Ok, thanks to your explanation I think I better understand the need for
an arch-specific way of doing it.

>>> diff --git a/gdb/gdbarch.sh b/gdb/gdbarch.sh
>>> index afc4da7cdd..09097bcbaf 100755
>>> --- a/gdb/gdbarch.sh
>>> +++ b/gdb/gdbarch.sh
>>> @@ -602,6 +602,7 @@ m;int;remote_register_number;int regno;regno;;default_remote_register_number;;0
>>> 
>>>   # Fetch the target specific address used to represent a load module.
>>>   F;CORE_ADDR;fetch_tls_load_module_address;struct objfile *objfile;objfile
>>> +M;CORE_ADDR;get_thread_local_address;ptid_t ptid, CORE_ADDR lm_addr, CORE_ADDR offset;ptid, lm_addr, offset
>> 
>> Could you document the method, especially the meaning of the 
>> parameters?
> 
> Sure.  I used a variant of the comment from the target method:
> 
> diff --git a/gdb/gdbarch.sh b/gdb/gdbarch.sh
> index 48fcebd19a..d15b6aa794 100755
> --- a/gdb/gdbarch.sh
> +++ b/gdb/gdbarch.sh
> @@ -602,6 +602,14 @@ m;int;remote_register_number;int regno;regno;;default_remote_register_number;;0
> 
>  # Fetch the target specific address used to represent a load module.
>  F;CORE_ADDR;fetch_tls_load_module_address;struct objfile *objfile;objfile
> +
> +# Return the thread-local address at OFFSET in the thread-local
> +# storage for the thread PTID and the shared library or executable
> +# file given by LM_ADDR.  If that block of thread-local storage hasn't
> +# been allocated yet, this function may return an error.  LM_ADDR may
> +# be zero for statically linked multithreaded inferiors.

What does "may return an error" mean?  Does it return a special
CORE_ADDR value, or does it throw an error?

Simon