From: Florian Weimer
To: Carlos O'Donell
Cc: Simon Marchi, gdb@sourceware.org
Subject: Re: Getting offset of initial-exec TLS variables on GNU/Linux
Date: Sat, 25 May 2019 00:45:00 -0000
In-Reply-To: (Carlos O'Donell's message of "Fri, 24 May 2019 15:32:31 -0400")

* Carlos O'Donell:

>> Going back a bit, I'm not sure what the API contract is for
>> DW_OP_GNU_push_tls_address. It's not really clear to me whether, under
>> the ELF TLS ABI, there is an expectation that the dynamic linker always
>> allocates the TLS space for a DSO as a single block. I've perused the
>> two documents for the GNU ELF TLS ABI, and this is never spelled out
>> explicitly.
>
> You are correct, it is not spelled out that the space for the loaded
> module needs to be in a single contiguous region of memory, but it is
> implied by the design of PT_TLS.
>
> In practice we have no gaps because the initialization image for the
> block, particularly for thread-local initialized global data, is copied
> as one contiguous block. If we wanted to support gaps, we would need a
> more complex definition than just the PT_TLS marker we use to identify
> that region (and which sections go into that region during static link).
>
> Does this answer your question about why it needs to be a single
> contiguous block?

To some extent. Global TLS symbols with a dynamic symbol entry do have
size information attached, so we could allocate just that much and copy
the initialization data from the PT_TLS segment. That is what I would
have expected to happen.

> Which two documents did you review?
>
>> Outside debugging information, a TLS relocation for non-initial-exec TLS
>> always consists of a pair of a module ID and an offset. Therefore, it
>> should be possible to lazily allocate individual TLS variables within a
>> DSO, by assigning them separate module IDs.
>
> You should be able to do that, but for each variable you must have
> access to the size via the symtab st_size to be able to copy the
> potentially initialized value from PT_TLS; otherwise the block must be
> the full size and you must initialize it as if it were all the data
> for the DSO.

Right. For local symbols, we do not necessarily have this size
information.
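To make the two allocation strategies discussed above concrete, here is a minimal C sketch. The struct tls_image type and both functions are hypothetical: the struct describes one module's PT_TLS segment (p_filesz bytes of .tdata image, zero-filled up to p_memsz). alloc_tls_block models the whole-block initialization described in the thread; alloc_tls_var is a hypothetical per-variable allocator that only works when st_value and st_size are known, which, as noted, is not guaranteed for local symbols. Alignment (p_align) and glibc's real data structures are deliberately ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of one module's PT_TLS segment. */
struct tls_image {
    const unsigned char *init; /* initialization image (.tdata) */
    size_t filesz;             /* p_filesz: bytes of initialized data */
    size_t memsz;              /* p_memsz: full block size, incl. .tbss */
};

/* Whole-block strategy: one allocation per module (per thread), filled
   from the image in a single copy and zero-padded up to p_memsz.
   Alignment (p_align) is ignored in this sketch. */
static unsigned char *alloc_tls_block(const struct tls_image *img)
{
    assert(img->filesz <= img->memsz);
    unsigned char *block = malloc(img->memsz ? img->memsz : 1);
    if (block == NULL)
        return NULL;
    memcpy(block, img->init, img->filesz);
    memset(block + img->filesz, 0, img->memsz - img->filesz);
    return block;
}

/* Hypothetical per-variable strategy: needs the variable's offset in the
   TLS template (st_value) and its size (st_size).  Bytes that fall past
   p_filesz belong to .tbss and stay zero (calloc). */
static unsigned char *alloc_tls_var(const struct tls_image *img,
                                    size_t st_value, size_t st_size)
{
    assert(img->filesz <= img->memsz);
    if (st_value > img->memsz || st_size > img->memsz - st_value)
        return NULL; /* variable does not fit in the segment */
    unsigned char *var = calloc(1, st_size ? st_size : 1);
    if (var == NULL)
        return NULL;
    if (st_value < img->filesz) {
        /* Copy only the part of the variable covered by the image. */
        size_t n = img->filesz - st_value;
        if (n > st_size)
            n = st_size;
        memcpy(var, img->init + st_value, n);
    }
    return var;
}
```

For an image {1, 2, 3, 4} with p_memsz = 8, the block becomes 1 2 3 4 0 0 0 0, and a 4-byte variable at offset 2 receives 3 4 0 0. Without st_size, only the first strategy is available.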
>> But what seems to happen in practice is that there is just one TLS block
>> per DSO, which is allocated all at once when the first TLS variable is
>> accessed. Furthermore, the entire TLS block is allocated non-lazily if
>> there is a single initial-exec TLS variable in a DSO.
>
> Just to be clear, there is one TLS block per DSO per thread, which is
> allocated once.
>
> Yes, the entire TLS block is allocated non-lazily, and its
> initialization image is copied from PT_TLS for that module.
>
> Once you touch any TLS variable for the DSO, the whole DSO's TLS block
> is initialized for use.
>
> Yes, any IE TLS vars force the allocation of the whole block because
> there could be IE TLS var uses immediately after startup relocation
> processing of the TPOFF GOT relocations.

The question is whether this is an absolute requirement. Alex's
document alludes to a different possibility:

“The use of TLS descriptors to access thread-local variables would
enable the compression of the DTV such that it contained only entries
for non-static modules. Static ones could be given negative ids, such
that legacy relocations and direct calls to __tls_get_addr() could still
work correctly, but entries could be omitted from the DTV, and the DTV
entries would no longer need the boolean currently used to denote
entries that are in Static TLS.”

Maybe that could enable separate allocation of initial-exec and other
TLS models, too. Then DF_STATIC_TLS as an indicator for the
libthread_db bypass would no longer work.

>> Based on that, I conclude that the module IDs are used only to share
>> TLS variables for the same symbol across multiple modules.
>
> I do not think this conclusion is accurate.
>
> What you are observing is a consequence of the fact that the static
> linker, during construction of the binary, optimizes all DSOs seen at
> static link time and places them into the initial TLS block.
> In addition, many of the references are optimized into TLS IE accesses,
> which have no module ID at all (they don't need one).
>
> The module ID is intended to reference the module or DSO, to allow the
> memory for that module to be loaded lazily.

Maybe I phrased my conclusion poorly. From an algorithmic point of view,
outside the debugging information, I do not think there is an intrinsic
reason why a single DSO must have just one TLS module ID.

>> Due to this restriction, the module ID for a TLS variable can be
>> inferred from the object that contains the DW_OP_GNU_push_tls_address
>> opcode (and hopefully locating the object based on symbol name in the
>> debugger matches the way the dynamic linker searches for symbols).
>
> I don't understand what you mean by "restriction" here?
>
> The module ID for a TLS variable is assigned at runtime, and any
> inspection process with a live process could find the module ID for
> the module by looking for DTPMOD relocations, and reading their values
> out of the GOT for the DSO.

I don't think you can look at the relocations. The thread-local variable
can be in scope in a compilation unit, but there might not be any
reference to it, so there is no relocation that would reveal its
location.

> In practice it's one module ID per DSO (but, as you note above, it need
> not be; it is restricted by the PT_TLS design). So yes, if you see a
> DSO that has a DW_OP_GNU_push_tls_address, you can determine its module
> ID by inspecting the DSO's GOT given the dynamic relocation information,
> and once you have the module ID you can call __tls_get_addr with the
> symbol offset to get the final address (or find the DTV and traverse
> it for the thread).
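A rough model of the (module ID, offset) lookup described above, as a C sketch. The tls_index pair is modeled on the two GOT words filled in by the DTPMOD and DTPOFF relocations on targets such as x86-64; struct model_dtv and model_tls_get_addr are hypothetical simplifications, not glibc's dtv_t or __tls_get_addr. Where the real function would allocate the block lazily, this sketch just returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modeled on the argument of __tls_get_addr on targets such as x86-64:
   two GOT words filled in by the DTPMOD and DTPOFF relocations. */
typedef struct {
    uint64_t ti_module; /* module ID, assigned at run time */
    uint64_t ti_offset; /* offset of the variable within the module's block */
} tls_index;

/* Hypothetical, simplified per-thread DTV: blocks[id] is the address of
   module id's TLS block (0 = not yet allocated).  Module IDs start at 1,
   so blocks has nmodules + 1 entries and blocks[0] is unused. */
struct model_dtv {
    const uintptr_t *blocks;
    size_t nmodules;
};

/* Hypothetical stand-in for __tls_get_addr: block address + offset. */
static uintptr_t model_tls_get_addr(const struct model_dtv *dtv,
                                    const tls_index *ti)
{
    assert(dtv != NULL && ti != NULL);
    if (ti->ti_module == 0 || ti->ti_module > dtv->nmodules)
        return 0; /* unknown module ID */
    uintptr_t block = dtv->blocks[ti->ti_module];
    if (block == 0)
        return 0; /* the real function would allocate and initialize here */
    return block + (uintptr_t)ti->ti_offset;
}
```

With blocks = {0, 0x1000, 0}, module 1 at offset 0x10 resolves to 0x1010, while module 2 returns 0 because its block has not been allocated. A debugger following this route still needs the module ID, which is exactly what is hard to recover when no relocation references the variable.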
>
> I don't actually understand how DW_OP_GNU_push_tls_address works, but
> the comments for it seem to indicate it's a hack that glibc is
> supposed to fix, but I've never been asked about it :-)
>
> It *looks* like DW_OP_GNU_push_tls_address is just the offset from the
> start of the block, but that's still not enough to compute the final
> address of the variable in memory.

My concern is that the interface, in theory, would allow very different
address translations through libthread_db, similar to how we use the
DSO-internal offset as a hash table key in _dl_make_tlsdesc_dynamic.
I'm not sure the staged computation (first the TLS base address, then
the combination with the offset) completely prevents that. Teaching GDB
how this works today would thus constrain future evolution of the
internal library design. But if you say that we can perform lazy
allocation only en bloc, once per DSO, then that doesn't matter.

>> Assuming this is indeed true, we could add the TLS offset of a DSO to
>> the public part of the link map in the glibc dynamic loader to help
>> debuggers.
>
> I wouldn't call it the TLS offset of a DSO; instead I would call it
> what it is, the "static TLS offset", and if DF_STATIC_TLS is set for
> that map, then you know you can find all the TLS variables without
> libthread_db.
>
>> For TLS variables defined in DF_STATIC_TLS DSOs, it should then be
>> possible to access the TLS variable without the help of libthread_db,
>> assuming that we teach GDB how to compute the TLS variable address from
>> the thread pointer, DSO TLS offset, and variable offset (something
>> binutils seems to know for several targets already, to implement
>> relaxations).
>
> Right, you would need:
>
>   TP + DSO offset + VAR offset = final symbol address.
>
> The VAR offset is actually the value pushed by the DW_OP_const
> operations prior to the DW_OP_form_tls_address call.
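The formula above can be sketched as a tiny DWARF evaluator in C that handles only the common location expression DW_OP_const8u <offset> followed by DW_OP_GNU_push_tls_address (or DW_OP_form_tls_address), for a module in static TLS. The function name, the little-endian target, and passing the module's offset as a signed value relative to the thread pointer are all assumptions of this sketch; on TLS variant II targets such as x86-64 the static block lies below the thread pointer, so that offset would be negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DW_OP_const8u              0x0e
#define DW_OP_form_tls_address     0x9b
#define DW_OP_GNU_push_tls_address 0xe0

/* Read an 8-byte constant stored in little-endian target byte order. */
static uint64_t read_u64_le(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical evaluator for "DW_OP_const8u <var_offset>;
   DW_OP_GNU_push_tls_address", restricted to a module in static TLS.
   tp is the thread pointer, dso_offset the module's signed offset from it.
   Returns 0 for any expression it does not understand. */
static uintptr_t eval_static_tls_location(const unsigned char *expr,
                                          size_t len, uintptr_t tp,
                                          intptr_t dso_offset)
{
    uint64_t stack[4];
    size_t sp = 0, i = 0;

    assert(expr != NULL || len == 0);
    while (i < len) {
        unsigned char op = expr[i++];
        if (op == DW_OP_const8u) {
            if (len - i < 8 || sp == 4)
                return 0;
            stack[sp++] = read_u64_le(expr + i); /* VAR offset */
            i += 8;
        } else if (op == DW_OP_GNU_push_tls_address
                   || op == DW_OP_form_tls_address) {
            if (sp == 0)
                return 0;
            uint64_t var_offset = stack[--sp];
            /* TP + DSO offset + VAR offset (unsigned wraparound handles a
               negative DSO offset). */
            stack[sp++] = (uint64_t)(tp + (uintptr_t)dso_offset
                                     + (uintptr_t)var_offset);
        } else {
            return 0; /* unsupported opcode */
        }
    }
    return sp == 1 ? (uintptr_t)stack[0] : 0;
}
```

For example, with tp = 0x7000, a DSO offset of -0x100 and a variable offset of 0x10, the result is 0x6f10. A real debugger would also need the thread pointer of the selected thread and the per-DSO static TLS offset proposed above.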
>
> TP + DSO offset is 'td_thr_tlsbase' from nptl_db, so you would be
> exposing only the DSO offset value in the link map, *only* for the
> DF_STATIC_TLS case, because it's a fixed value that won't change.
>
> Note that 'td_thr_tls_get_addr' from nptl_db does the above calculation.
>
> We'd be teaching gdb how to manually run 'td_thr_tlsbase' for the
> limited case of DF_STATIC_TLS. It would solve some of the problems we
> have, and remove any hacks for errno, but would not fully solve dynamic
> TLS variable access in a single-threaded program (though it's a big
> step forward).

If we fix the startup problem with dlopen and initial-exec TLS in glibc
(changes that do not affect the ABI at all), more people can use
initial-exec TLS and benefit from the GDB enhancement.

And yes, while the general problem of TCB placement for future
initial-exec TLS allocations is unsolvable, we can be much smarter about
it than we are today if we separate TCB allocation from stack
allocation.

Thanks,
Florian