Date: Thu, 12 Nov 2020 16:00:18 +0000
From: Andrew Burgess
To: Joel Brobecker
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] gdb: user variables with components of dynamic type
Message-ID: <20201112160018.GR2729@embecosm.com>
References: <20201022153238.1947197-1-andrew.burgess@embecosm.com> <20201106230422.GK2729@embecosm.com> <20201108105059.GC451505@adacore.com>
In-Reply-To: <20201108105059.GC451505@adacore.com>

Joel,

Thanks for your feedback.

> * Joel Brobecker [2020-11-08 14:50:59 +0400]:
>
> Hi Andrew,
>
> > * Andrew Burgess [2020-10-22 16:32:38 +0100]:
> >
> > > Consider this Fortran type:
> > >
> > >   type :: some_type
> > >      integer, allocatable :: array_one (:,:)
> > >      integer :: a_field
> > >      integer, allocatable :: array_two (:,:)
> > >   end type some_type
> > >
> > > And a variable declared:
> > >
> > >   type(some_type) :: some_var
> > >
> > > Now within GDB we try this:
> > >
> > >   (gdb) set $a = some_var
> > >   (gdb) p $a
> > >   $1 = ( array_one =
> > >   ../../src/gdb/value.c:3968: internal-error: Unexpected lazy value type.
> > >
> > > Normally, when an internalvar ($a in this case) is created, it is
> > > non-lazy, the value is immediately copied out of the inferior into
> > > GDB's memory.
> > >
> > > When printing the internalvar ($a) GDB will extract each field in
> > > turn, so in this case `array_one`. As the original internalvar is
> > > non-lazy then the extracted field will also be non-lazy, with its
> > > contents immediately copied from the parent internalvar.
> > >
> > > However, when the field has a dynamic type this is not the case,
> > > value_primitive_field we see that any field with dynamic type is
> > > always created lazy. Further, the content of this field will usually
> > > not have been captured in the contents buffer of the original value, a
> > > field with dynamic location is effectively a pointer value contained
> > > within the parent value, with rules in the DWARF for how to
> > > dereference the pointer.
>
> Is it a pointer, or a reference? From what you are seeing and
> what you are reported here, I assume these components are declared
> as references? Or perhaps, after written 3 different versions of
> a reply to this email, they are actually *neither*, but rather
> are described as arrays with location expressions?

If we just look at 'some_var%array_one', here's its type information:

 <1><3c>: Abbrev Number: 5 (DW_TAG_structure_type)
    <3d>   DW_AT_name        : (indirect string, offset: 0x0): some_type
    <41>   DW_AT_byte_size   : 184
    <42>   DW_AT_decl_file   : 1
    <43>   DW_AT_decl_line   : 16
    <44>   DW_AT_sibling     : <0x6d>
 <2><48>: Abbrev Number: 6 (DW_TAG_member)
    <49>   DW_AT_name        : (indirect string, offset: 0x5f): array_one
    <4d>   DW_AT_decl_file   : 1
    <4e>   DW_AT_decl_line   : 18
    <4f>   DW_AT_type        : <0x6d>
    <53>   DW_AT_data_member_location: 0
 <2><54>: Abbrev Number: 6 (DW_TAG_member)
    <55>   DW_AT_name        : (indirect string, offset: 0x79): a_field
    <59>   DW_AT_decl_file   : 1
    <5a>   DW_AT_decl_line   : 19
    <5b>   DW_AT_type        : <0xaa>
    <5f>   DW_AT_data_member_location: 88
 <2><60>: Abbrev Number: 6 (DW_TAG_member)
    <61>   DW_AT_name        : (indirect string, offset: 0x4c): array_two
    <65>   DW_AT_decl_file   : 1
    <66>   DW_AT_decl_line   : 20
    <67>   DW_AT_type        : <0xb6>
    <6b>   DW_AT_data_member_location: 96
 <2><6c>: Abbrev Number: 0
 <1><6d>: Abbrev Number: 7 (DW_TAG_array_type)
    <6e>   DW_AT_ordering    : 1 (column major)
    <6f>   DW_AT_data_location: 2 byte block: 97 6 (DW_OP_push_object_address; DW_OP_deref)
    <72>   DW_AT_allocated   : 4 byte block: 97 6 30 2e (DW_OP_push_object_address; DW_OP_deref; DW_OP_lit0; DW_OP_ne)
    <77>   DW_AT_type        : <0xaa>    [ APB: This is signed 4-byte integer. ]
    <7b>   DW_AT_sibling     : <0xaa>    [ APB: This is signed 4-byte integer. ]
 <2><7f>: Abbrev Number: 8 (DW_TAG_subrange_type)
    <80>   DW_AT_lower_bound : 4 byte block: 97 23 30 6 (DW_OP_push_object_address; DW_OP_plus_uconst: 48; DW_OP_deref)
    <85>   DW_AT_upper_bound : 4 byte block: 97 23 38 6 (DW_OP_push_object_address; DW_OP_plus_uconst: 56; DW_OP_deref)
    <8a>   DW_AT_byte_stride : 9 byte block: 97 23 28 6 97 23 20 6 1e (DW_OP_push_object_address; DW_OP_plus_uconst: 40; DW_OP_deref; DW_OP_push_object_address; DW_OP_plus_uconst: 32; DW_OP_deref; DW_OP_mul)
 <2><94>: Abbrev Number: 8 (DW_TAG_subrange_type)
    <95>   DW_AT_lower_bound : 4 byte block: 97 23 48 6 (DW_OP_push_object_address; DW_OP_plus_uconst: 72; DW_OP_deref)
    <9a>   DW_AT_upper_bound : 4 byte block: 97 23 50 6 (DW_OP_push_object_address; DW_OP_plus_uconst: 80; DW_OP_deref)
    <9f>   DW_AT_byte_stride : 9 byte block: 97 23 40 6 97 23 20 6 1e (DW_OP_push_object_address; DW_OP_plus_uconst: 64; DW_OP_deref; DW_OP_push_object_address; DW_OP_plus_uconst: 32; DW_OP_deref; DW_OP_mul)
 <2><a9>: Abbrev Number: 0

So your third choice was the winner, the array has dynamic type and
includes a computed data location.

>
> > > So, we end up with a lazy lval_internalvar_component representing a
> > > field within an lval_internalvar. This eventually ends up in
> > > value_fetch_lazy, which currently does not support
> > > lval_internalvar_component, and we see the error above.
> > >
> > > My original plan for how to handle this involved extending
> > > value_fetch_lazy to handle lval_internalvar_component. However, when
> > > I did this I ran into another error:
> > >
> > >   (gdb) set $a = some_var
> > >   (gdb) p $a
> > >   $1 = ( array_one = ((1, 1) (1, 1) (1, 1)), a_field = 5, array_two = ((0, 0, 0) (0, 0, 0)) )
> > >   (gdb) p $a%array_one
> > >   $2 = ((1, 1) (1, 1) (1, 1))
> > >   (gdb) p $a%array_one(1,1)
> > >   ../../src/gdb/value.c:1547: internal-error: void set_value_address(value*, CORE_ADDR): Assertion `value->lval == lval_memory' failed.
>
> I am not surprised.
> Intuitively, like you said, we expect GDB
> to "capture" the value of our variable, so we should have anything
> lazy about it, or else this would indicate an incomplete capture.

Agreed.

> > > In an ideal world (I think) GDB would be
> > > able to do this even for values with dynamic type. So in our above
> > > example doing `set $a = some_var` would capture the content of
> > > 'some_var', but also the content of 'array_one', and also
> > > 'array_two', even these content regions are not contained within the
> > > region of 'some_var'.
>
> This would be my understanding as well, provided the arrays are
> *references*. For pointers, I think it's fine to continue with
> the idea that we capture the target address, but not the target
> memory region it points to.

Again, I think we agree.

The problem in terms of implementation is that really everything is
either a real inline value, or a pointer. All the other words are
just language sugar on top of these two choices.

In C things are dead simple: something is either a pointer or is the
actual contents of the value, and the language exposes all this to the
programmer, so there's little room for surprise.

When we look at C++ references (basically pointers + automatic
dereferencing), or Fortran allocatable variables (same again), things
are less clear: we capture the underlying pointer, but can (especially
for Fortran) display the value with automatic dereferencing.

You specifically asked about references; I'm taking this to mean C++
references. Consider this test program:

  #include <stdio.h>

  struct xxx
  {
    int &val;
  };

  void
  func (xxx x)
  {
    printf ("Got: %d\n", x.val);
    x.val = 0;
  }

  int
  main ()
  {
    int i = 3;
    xxx x = { i };
    func (x);	/* Break 1. */
    printf ("Returning: %d\n", i);	/* Break 2. */
    return i;
  }

Now our GDB session:

  Breakpoint 1, main () at ref.cc:20
  20	  func (x);
  (gdb) p x
  $1 = { val = @0x7fffffffb55c }
  (gdb) p x.val
  $2 = (int &) @0x7fffffffb55c: 3
  (gdb) set $foo = x
  (gdb) p $foo
  $3 = { val = @0x7fffffffb55c }
  (gdb) p $foo.val
  $4 = (int &) @0x7fffffffb55c: 3
  (gdb) next
  Got: 3
  21	  printf ("Returning: %d\n", i);
  (gdb) p $foo.val
  $5 = (int &) @0x7fffffffb55c: 0
  (gdb) p x.val
  $6 = (int &) @0x7fffffffb55c: 0
  (gdb)

So we get the behaviour we might expect: the pointer value underlying
the reference is preserved, but the value pointed to is not.

Interestingly, the choice was made to not automatically dereference
C++ references, so they are displayed in a semi-pointer fashion, the
type prefix and the '@' symbol being what tells them apart from
regular pointers.

>
> > > Supporting this would require GDB values to be able to carry around
> > > multiple non-contiguous regions of memory at content in some way,
> > > which sounds like a pretty huge change to a core part of GDB.
> > >
> > > So, I wondered if there was some other solution that wouldn't require
> > > such a huge change.
> > >
> > > What if values with a dynamic location were though of like points with
> > > automatic dereferencing? Given this C structure:
> > >
> > >   struct foo_t {
> > >     int *val;
> > >   }
> > >
> > >   struct foo_t my_foo;
> > >
> > > Then in GDB:
> > >
> > >   (gdb) $a = my_foo
> > >
> > > We would expect GDB to capture the pointer value in '$a', but not the
> > > value pointed at by the pointer. So maybe it's not that unreasonable
> > > to think that given a dynamically typed field GDB will capture the
> > > address of the content, but not the actual content itself.
> > >
> > > That's what this patch does.
>
> I admit I don't really understand quite how this is all happening,
> and how you're trying to deal with the issue.

I'm not sure which bit you don't understand, as in the next paragraph
you give an accurate description of what I'm proposing...
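
To make the pointer analogy concrete outside of GDB, here is a minimal
C++ sketch of the capture semantics being proposed. Nothing here is
GDB code: some_type, capture and first_element are names invented for
the illustration, and a plain pointer stands in for the
DW_AT_data_location expression.

```cpp
#include <vector>

// Hypothetical stand-in for a Fortran derived type with an allocatable
// component: the struct holds only a pointer to the array data.
struct some_type
{
  std::vector<int> *array_one;	// data lives outside the struct
  int a_field;			// data lives inside the struct
};

// Model of `set $a = some_var` under the proposed semantics: copy the
// bytes of the struct itself and nothing that it points at.
some_type
capture (const some_type &var)
{
  return var;
}

// Model of printing the first element of `$a%array_one`: follow the
// captured pointer into the live data at the time of the print.
int
first_element (const some_type &captured)
{
  return captured.array_one->at (0);
}
```

If the array is modified after capture () returns, first_element () on
the captured copy reports the new element, while a_field keeps the
value it had at capture time. That is the behaviour the patch
description proposes for $a.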

> It's possible that the compromise you suggest (treat dynamic components
> the same as pointers) might be the most reasonable way out, but I think
> it'll invite confusion on the users' side, and probably bug reports.
> At the very least, I think we should warn users when we do this, so
> as to be sure to set expectations right, on the spot.

Adding a warning would be reasonably simple, we can start with (in
value.c:set_internalvar):

  if (is_dynamic_type (value_type (new_data.value)))
    warning ("some warning text here...");

There are two problems. The first is easy enough to solve: if the top
level value being captured is dynamic, then we do capture the _actual_
value, it's only when a sub-component is dynamic that we have
problems. The above check will trigger if only the top-level value is
dynamic, so it warns in too many places.

As a concrete example, given this Fortran type:

  type :: some_type
     integer, allocatable :: array_one (:,:)
     integer :: a_field
     integer, allocatable :: array_two (:,:)
  end type some_type

  type(some_type) :: some_var

Then in GDB:

  (gdb) set $foo = some_var

We capture the contents of the some_type struct, including the
pointers to the dynamic objects array_one and array_two. But if
instead we do:

  (gdb) set $bar = some_var%array_one

Now we capture the full contents of array_one, there's no further
dynamic type resolution required. Changing 'some_var%array_one' will
not change the value of $bar, but the change would be seen in $foo.

The harder problem is, what warning do we print? I initially went
with:

  components of dynamically typed values are not currently captured within internal variables

Besides being a bit long, it's not immediately clear whether a user
will know what 'dynamically typed values' means. Maybe we end up
needing a language specific warning, so for Fortran:

  the values of allocatable fields are not currently captured within internal variables

Thoughts or suggestions are welcome...
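
For the first problem, the check could be narrowed along these lines.
This is only a sketch over a toy type model: toy_type,
has_dynamic_location and has_dynamic_component are made up for the
illustration and are not GDB's struct type or its is_dynamic_type.

```cpp
#include <vector>

// Toy stand-in for a type description; not GDB's real struct type.
struct toy_type
{
  // True if the type's data is found via a computed location, e.g. a
  // Fortran allocatable array.  Hypothetical flag for this sketch.
  bool has_dynamic_location = false;

  // Component types; empty for scalars and for the arrays modelled here.
  std::vector<toy_type> fields;
};

// Return true if any component nested below TYPE has a dynamic
// location.  TYPE itself is deliberately not inspected: a value that is
// dynamic only at the top level is captured in full, so no warning is
// wanted in that case.
bool
has_dynamic_component (const toy_type &type)
{
  for (const toy_type &field : type.fields)
    if (field.has_dynamic_location || has_dynamic_component (field))
      return true;
  return false;
}
```

With this shape, `set $foo = some_var` would warn (a component is
dynamic), while `set $bar = some_var%array_one` would not (only the top
level is dynamic).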

> Have you looked at how we handle components which are references?
> I wonder how well we handle those...

As above we treat them as pointers, but guard against possible
confusion by displaying them as pointers. I would not like to change
Fortran from displaying dynamic types as their actual value (and
instead just display a pointer) as that seems like a really bad change
just to work around a limitation with internal variables.

What I think is super interesting is how this all interacts with
pretty-printers. So, if I start with this test program:

  #include <vector>

  struct xxx
  {
    std::vector<int> lst;
  };

  static void
  update (xxx &x)
  {
    x.lst.clear ();
    x.lst.push_back (4);
    x.lst.push_back (5);
    x.lst.push_back (6);
  }

  int
  main ()
  {
    xxx x = { { 1, 2, 3 } };
    update (x);
    return 0;
  }

Then this is my GDB session (making use of C++ pretty-printers):

  Temporary breakpoint 1, main () at lst.cc:20
  20	  xxx x = { { 1, 2, 3 } };
  (gdb) n
  21	  update (x);
  (gdb) set $foo=x
  (gdb) p $foo
  $1 = { lst = std::vector of length 3, capacity 3 = {1, 2, 3} }
  (gdb) n
  22	  return 0;
  (gdb) p $foo
  $2 = { lst = std::vector of length 3, capacity 3 = {4, 5, 6} }

Notice that the contents of the std::vector changed in the variable
$foo. This I think is the closest to the Fortran case. For Fortran,
GDB itself is providing the pretty-printing (it prints the dynamic
value rather than just displaying a pointer), and like with the
std::vector case above, the actual value is not captured, but just
printed.

I wonder if this problem should just be solved (at least in the
short/medium term) by improving the documentation for internal
variables?

Thanks,
Andrew