Date: Thu, 20 Dec 2018 23:03:00 -0000
From: Simon Marchi
To: Eli Zaretskii
Cc: gdb-patches@sourceware.org
Subject: Re: GDB internal error in pc_in_thread_step_range
In-Reply-To: <837eg4cick.fsf@gnu.org>
References: <83h8kjr8r6.fsf@gnu.org> <100001f1b27aa7d90902a75d5db37710@polymtl.ca> <83a7m6tk92.fsf@gnu.org> <8336qxfpjo.fsf@gnu.org> <83tvjde68l.fsf@gnu.org> <83ftutcy7p.fsf@gnu.org> <659d33b5e4af35aea6c3aaef08559f31@polymtl.ca> <837eg4cick.fsf@gnu.org>
Message-ID: <988ca92d2c5c976fbea57c2381eb6279@polymtl.ca>

On 2018-12-20 10:45, Eli Zaretskii wrote:
>> Date: Wed, 19 Dec 2018 19:16:15 -0500
>> From: Simon Marchi
>> Cc: gdb-patches@sourceware.org
>>
>> > (top-gdb) p msymbol
>> > $3 = {minsym = 0x10450d38, objfile = 0x10443b48}
>> > (top-gdb) p msymbol.minsym.mginfo.name
>> > $4 = 0x104485cd "__register_frame_info"
>> > (top-gdb) p msymbol.minsym.mginfo
>> > $5 = {name = 0x104485cd "__register_frame_info", value = {ivalue = 0,
>> >     block = 0x0, bytes = 0x0, address = 0x0, common_block = 0x0,
>> >     chain = 0x0}, language_specific = {obstack = 0x0, demangled_name = 0x0},
>> >   language = language_auto, ada_mangled = 0, section = 0}
>>
>> Ok.  Well this is already strange.  Why is there an mst_text (code)
>> symbol with a value of 0?
>
> Its address is zero because it's an unresolved symbol:
>
>   d:\usr\eli>nm -A hello0.exe | fgrep " U "
>   hello0.exe:         U ___deregister_frame_info
>   hello0.exe:         U ___register_frame_info
>   hello0.exe:         U __Jv_RegisterClasses

Huh, interesting.  I looked at elfread, and similar undefined symbols
are skipped:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/elfread.c;h=71e6fcca6ec62ec57f93f06d8a9913612be6f9e2;hb=HEAD#l270

> This symbol comes from a weak symbol defined in MinGW crtbegin.o:
>
>   d:\usr\eli>nm -A lib/gcc/mingw32/6.3.0/crtbegin.o | fgrep _frame_info
>   lib/gcc/mingw32/6.3.0/crtbegin.o:00000000 A .weak.___deregister_frame_info.___EH_FRAME_BEGIN__
>   lib/gcc/mingw32/6.3.0/crtbegin.o:00000000 A .weak.___register_frame_info.___EH_FRAME_BEGIN__
>   lib/gcc/mingw32/6.3.0/crtbegin.o:         w ___deregister_frame_info
>   lib/gcc/mingw32/6.3.0/crtbegin.o:         w ___register_frame_info

Is crtbegin.o an object file you link with statically when compiling
with mingw?  If so, why would you end up with an undefined reference in
the final executable?  Or is it linked dynamically at runtime?

>> If your binary is anything like those I can
>> produce with x86_64-w64-mingw32-gcc (and it looks similar, given the
>> addresses you show), your "image base" is likely 0x400000, and "base of
>> code" 0x1000 (0x401000 in absolute).  I found this information using
>> "objdump -x", in the header somewhere.  I therefore expect all text
>> symbols to be >= 0x401000.  I would start digging why this text symbol
>> with a value of 0 exists.
>
> See above.  But please note that I use mingw.org's MinGW, and my
> executables are 32-bit, whereas you use MinGW64 and 64-bit
> executables.  So some details might be different; in particular, I
> don't think MinGW64 has this problematic symbol, because it's specific
> to the DWARF2 exception unwinding implemented in libgcc, which 64-bit
> Windows executables don't use.

Indeed, they are similar but not identical.  The file you provided as
attachment is very useful to see what's in your binary.

>> It would be interesting to look at some other symbols in the msymbols
>> vector.  Are the other mst_text symbols >= 0x401000?
>
> There are 2 more unresolved mst_text symbols, see above; they all have
> a zero address.  All the others are above 0x401000, indeed.
>
> The lowest-address resolved minimal symbol whose type is mst_text is
> this:
>
>   (top-gdb) p msymbol[22]
>   $112 = {mginfo = {name = 0x10447d95 "_mingw32_init_mainargs", value = {
>         ivalue = 4199072, block = 0x4012a0 <_mingw32_init_mainargs>,
>         bytes = 0x4012a0 <_mingw32_init_mainargs> "Æ\222?<\215D$,\307D$\004",
>         address = 0x4012a0, common_block = 0x4012a0 <_mingw32_init_mainargs>,
>         chain = 0x4012a0 <_mingw32_init_mainargs>}, language_specific = {
>         obstack = 0x0, demangled_name = 0x0}, language = language_auto,
>       ada_mangled = 0, section = 0}, size = 0, filename = 0x0, type = mst_text,
>     created_by_gdb = 0, target_flag_1 = 0, target_flag_2 = 0, has_size = 0,
>     hash_next = 0x0, demangled_hash_next = 0x0}
>
> Interestingly, objdump shows this symbol in section 1:
>
>   [  0](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x000002a0 __mingw32_init_mainargs
>
> whereas the above minsym information shows section = 0.  Is this
> expected?  If "real" symbols were to have section > 0, we could
> perhaps reject the unresolved ones.

Indeed.  I think the objdump output is misleading.  The indices in the
"Sections:" section are 0-based.  But the indices in the "SYMBOL TABLE:"
section look 1-based, as described here:

https://docs.microsoft.com/en-us/windows/desktop/debug/pe-format#coff-symbol-table
https://docs.microsoft.com/en-us/windows/desktop/debug/pe-format#section-number-values

So it looks like we should indeed skip symbols with section == 0.  We
may also want to skip symbols with section == -1 (IMAGE_SYM_ABSOLUTE),
such as __major_subsystem_version__.  I don't know if anything relies on
some of those symbols though.

>> Assuming this minimal symbol is wrong and assuming it wasn't there, then
>> I guess the search would fail and we would fall in the "Cannot find
>> bounds of current function" case of prepare_one_step?  That would be
>> appropriate in this case.
>
> It's not wrong, but perhaps lookup_minimal_symbol_by_pc_section should
> reject unresolved symbols for this purpose.
> However, the question is
> how?  One possibility is by their zero address.  (I don't see the weak
> attribute, or any other indication of its being unresolved, in the
> minimal symbol attributes.)
>
> In any case, if we do call the "Cannot find bounds of current
> function" error, that will throw to the command loop, which I think is
> undesirable in this case.  We want GDB to step out of this code, not
> to error out.

When we have no line information for the current PC and the user asks
us to step, we fall back to "single step until out of the current
function".  But if the minimal symbol information doesn't let us know
the bounds of the current function, then we can't "single step until
out of the current function", because we don't know where it starts or
ends.

In your binary, the lowest .text function symbol is
__mingw32_init_mainargs at 0x000002a0 (0x4012a0 once relocated).  Your
pc is 0x40126d (according to an earlier message, but reading lower I
realize this may not be valid anymore), which is lower.  So there's
just no minimal symbol for GDB to find.  In that case, it sounds right
to error out and say "I can't step, I don't have enough information".
The user can still use stepi.

Side-question, are there some debug symbols in the binary that could
describe this location?  Which debug format (DWARF or equivalent) is
generated when you use -g with mingw?

>> Ok, from what I understand, all these "mst_abs" symbols do not represent
>> addresses.  They just represent numerical "values", like version
>> numbers, alignment sizes, etc.  So it seems right to skip them when
>> looking for the minimal symbol preceding pc.
>>
>> It looks like minimal_symbol_upper_bound is buggy, in that it should not
>> consider these mst_abs.  If we are looking for the end of a memory
>> range, we should not consider those symbols that do not even represent
>> memory addresses...
>
> Indeed, the following change is enough to avoid the internal error:
>
> --- gdb/minsyms.c~0     2018-07-04 18:41:59.000000000 +0300
> +++ gdb/minsyms.c       2018-12-20 08:06:11.516834500 +0200
> @@ -1514,7 +1514,8 @@ minimal_symbol_upper_bound (struct bound
>         {
>           if ((MSYMBOL_VALUE_RAW_ADDRESS (msymbol + i)
>                != MSYMBOL_VALUE_RAW_ADDRESS (msymbol))
> -             && MSYMBOL_SECTION (msymbol + i) == section)
> +             && MSYMBOL_SECTION (msymbol + i) == section
> +             && MSYMBOL_TYPE (msymbol + i) != mst_abs)
>             break;
>         }

Note that if we implement the solution of rejecting the symbols with
section == -1, those mst_abs symbols won't be there anymore.

> However, it still shows the incorrect function name from the
> zero-address symbol:
>
>   7       }
>   (gdb) n
>   0x00401288 in __register_frame_info ()
>   (gdb) n
>   Single stepping until exit from function __register_frame_info,
>   which has no line number information.
>   [Inferior 1 (process 10424) exited normally]
>
> I think if we want to avoid showing __register_frame_info, we need
> further changes in lookup_minimal_symbol_by_pc_section.  But I don't
> see how this will help us, unless we also allow displaying the above
> message for functions whose names we don't know, perhaps saying
> something like
>
>   Single stepping until exit from function

The problem is not only that we are missing the name, but most
importantly that we are missing the bounds of the current function.
With what you've implemented here, GDB thinks there is a function that
occupies the range [0,401000[ (something like that), so it single steps
until it gets out of that range, but the process exits before.  So it
kind of works for your use case, but it's not right, IMO.  If the
process did not exit as it does here, the behavior would be erratic.
If GDB doesn't have the data to do something right, it should bail
instead of doing something random.

>> 2. investigate if there should be some text symbol that should really
>> contain 0x0040126d, that for some reason does not end up in GDB's
>> minimal symbol table.
>
> The function in which the PC value of 0x401288 lives is
> __mingw_CRTStartup, which ends like this:
>
>     /* Call the main() function. If the user does not supply one
>      * the one in the 'libmingw32.a' library will be linked in, and
>      * that one calls WinMain(). See main.c in the 'lib' directory
>      * for more details.
>      */
>     nRet = main (_argc, _argv, environ);
>
>     /* Perform exit processing for the C library. This means flushing
>      * output and calling atexit() registered functions.
>      */
>     _cexit ();
>
>     ExitProcess (nRet);
>   }
>
> This function is declared in the MinGW runtime sources as follows:
>
>   static __MINGW_ATTRIB_NORETURN void __mingw_CRTStartup (void);
>
> But its symbol is not in the symbol table.  Not sure why, perhaps
> because it's a static function?  But the code is there, starting at
> the address 0x4011b0.  The last part, after exiting 'main', which
> corresponds to the above source snippet is this:
>
>   (gdb) disassemble 0x401283,0x401294
>   Dump of assembler code from 0x401283 to 0x401294:
>      0x00401283 <__register_frame_info+4199043>:  call   0x401460
>      0x00401288 <__register_frame_info+4199048>:  mov    %eax,%ebx
>      0x0040128a <__register_frame_info+4199050>:  call   0x403a90 <_cexit>
>      0x0040128f <__register_frame_info+4199055>:  mov    %ebx,(%esp)
>      0x00401292 <__register_frame_info+4199058>:  call   0x403b28
>
> So when this problem happens, we are at the "mov %eax,%ebx"
> instruction after exiting 'main', as I'd expect.

Ok, well I think it shows the problem quite clearly: some symbol is
missing for GDB to work properly in that context.  I think that we
should improve GDB to handle it better and error out clearly (instead
of hitting a failed assert), but I don't think it can do much more.  I
guess that having debug info for the file containing __mingw_CRTStartup
would help, if you really needed to step past main?

Simon
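
For illustration only, here is a rough, self-contained sketch of the
filtering discussed above: ignore symbols whose COFF section number is
0 (undefined) or -1 (absolute) when looking for the symbol that
precedes a pc, and report "nothing found" rather than guessing.  This
is not GDB code.  The struct, the macro names and the function names
are hypothetical stand-ins, not GDB's minimal_symbol or
lookup_minimal_symbol_by_pc_section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a minimal symbol; NOT GDB's struct.  */
struct sketch_sym
{
  const char *name;
  uint32_t address;   /* relocated address (or raw value for absolute symbols) */
  int section;        /* COFF section number as stored in the symbol table */
};

/* Section number values from the PE/COFF format; macro names are ours.  */
#define SKETCH_SECTION_UNDEFINED 0     /* unresolved symbol */
#define SKETCH_SECTION_ABSOLUTE  (-1)  /* value is not an address */

/* A symbol can bound a function only if it lives in a real section.  */
static int
sketch_is_addressable (const struct sketch_sym *s)
{
  return s->section != SKETCH_SECTION_UNDEFINED
         && s->section != SKETCH_SECTION_ABSOLUTE;
}

/* Return the addressable symbol with the highest address <= PC, or NULL
   if there is none.  NULL is the "cannot find bounds of current
   function" case: the caller should report it instead of stepping.  */
static const struct sketch_sym *
sketch_lookup_by_pc (const struct sketch_sym *syms, size_t n, uint32_t pc)
{
  const struct sketch_sym *best = NULL;

  assert (syms != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      const struct sketch_sym *s = &syms[i];

      if (!sketch_is_addressable (s) || s->address > pc)
        continue;
      if (best == NULL || s->address > best->address)
        best = s;
    }

  return best;
}
```

With a table holding __register_frame_info (address 0, section 0), one
absolute symbol (section -1) and _mingw32_init_mainargs (0x4012a0,
section 1), a lookup for pc 0x401288 returns NULL, while a lookup for
0x4012a0 or anything shortly after it returns _mingw32_init_mainargs.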