On Wednesday 16 January 2008 04:28:32 Jim Blandy wrote: > > (Since it took me a bit to re-discover what this patch was about, I > figure I should recap...) > > breakpoint_re_set_one gets called when the symbolic information > available to GDB has changed: we've re-run or attached and the > executable file has changed since we last read it, or the program has > loaded or unloaded a shared library. I think we've established that right now, breakpoint_re_set_one is not called when a library is unloaded. > The breakpoint_re_set_one > function is responsible for updating all references to symbol table > contents in breakpoints, watchpoints, etc. > > A watchpoint can refer to local variables that were in scope when it > was set, so it may be associated with a particular frame. However, > breakpoint_re_set_one re-parses watchpoint expressions using > parse_expression, which parses in the scope of whatever frame happens > to selected when it is called. This is very wrong: GDB can call > breakpoint_re_set_one in many different circumstances, and the frame > selected at that time has no necessary relationship to the frame in > which the user set the watchpoint. > > In particular, GDB calls breakpoint_re_set_one when it stops for a > shared library load or unload. At this point, the selected frame is > the youngest frame, in the dynamic linker's debug breakpoint function. > Since watchpoints' local variables are almost certainly not in scope > in that function, GDB fails to re-parse watchpoint expressions that > refer to local variables whenever we do a dlopen or dlclose. (And > then fails to handle the error and dies.) For example: > > $ cat w1.c > #include > > int > main (int argc, char **argv) > { > volatile int g = 1; > > g = 2; > > void *lib = dlopen ("libm.so", RTLD_NOW); > > g = 3; > > return g; > } > $ gcc -g w1.c -ldl -o w1 > $ ~/gdb/pub/nat/gdb/gdb w1 > GNU gdb 6.7.50.20080111-cvs > Copyright (C) 2008 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "i686-pc-linux-gnu"... > (gdb) b 10 > Breakpoint 1 at 0x8048403: file w1.c, line 10. > (gdb) run > Starting program: /home/jimb/play/w1 > > Breakpoint 1, main () at w1.c:10 > 10 void *lib = dlopen ("libm.so", RTLD_NOW); > (gdb) watch g > Hardware watchpoint 2: g > (gdb) next > Error in re-setting breakpoint 2: No symbol "g" in current context. > Segmentation fault (core dumped) > $ > > I believe Vlad's most recent patch is this one: > > http://sourceware.org/ml/gdb-patches/2007-11/msg00522.html > > The patch has two effects: > > First, in breakpoint_re_set_one (w), if W->exp_valid_block is NULL > (meaning that the watchpoint refers to no local variables), or if it > is set and GDB can find W->watchpoint_frame on the stack, then it > tries to re-parse the expression in exp_valid_block. > > Second, it re-parses watchpoint condition expressions (the Y in 'watch > X if Y') when it inserts breakpoints. > > I have two questions about this patch. > > - If we dlclose a shared library while there are still frames in that > library's code on the stack, couldn't exp_valid_block be pointing to > garbage (i.e. a 'struct block' in the shared library whose debug > info has now been freed), even though frame_find_by_id can find the > frame? It's possible. And it's pre-existing problem ;-) Basically, now both dlopen and dlclose mess up watchpoints, and the patch is a step that brings us closer to dlopen working. > Granted, the user's program is unhappy, because it's going to return > to code that isn't there any more, but GDB shouldn't crash --- after > all, debuggers are meant for use on unhappy programs. > > If we can find the frame, and we can find the frame's block, could > we use that instead? Did you mean "if we can find the same, and *can't* find the block"? Basically, if we have a watchpoint associated with frame id, and we can't find that frame id, there's nothing we can do -- because we can't access local variables relatively to frame base we don't know. However, it's possible that frame is found, but in fact, the block is gone -- in which case we should not try to use the gone block. As I mention, this is pre-existing problem. We need a mechanism to get notified when a block is gone. Note that it's strictly speaking different from shared library unloaded, since syntab is not necessary removed at this point. So, ideally, we'd need a new observer that's called when symtab is removed. We'll plug into that observer and disable all watchpoints which conditions use the gone blocks. In fact, I'm not sure that this problem is only relevant to watchpoints. Ordinary breakpoint might well have a condition that uses variables from shared libraries, and such condition will become invalid when shared library is removed. So, I think it's a global existing problem, and we should postpone it. > In the perverse case I outlined, I assume we'd > be able to find the frame, but not the block. > > Certainly, for watchpoints that cite no local variables, we should > just always try to re-parse. The prior conversation identified some > ugly issues there, too, but that's an existing bug; let's take it > one step at a time. > > - Why shouldn't we re-parse the condition at the same time we re-parse > the expression, in the same block? Why should we re-parse the > condition when we insert breakpoints? > > Although I agreed to reparsing the condition when we insert > breakpoints earlier in the thread, I can't see now why it's a good > idea. It seems wasteful, since we remove and insert breakpoints > frequently, and since we do remove and re-insert breakpoints on > dlopen/dlclose, it doesn't seem to be a significantly more > auspicious time to do the parse. Well, we reparse the entire watchpoint expression on each insert already, so reparsing a condition is not a big deal. However, you're right it's not needed, strictly speaking. Honestly, the point of moving it for this patch was to place all code for updating watchpoint in one place. The next patch, that switches watchpoint to using multiple locations, introduced update_watchpoint function which is a single place for all watchpoint updates. Looking at the patch, however, I see that I always reparse condition anyway. I've changed that patch to reparse condition only when update_watchpoint is called with reparse=1, which happens only from breakpoint_re_set_one, and I don't get any regression as result. So, I guess the solutions are either: 1. Omit the move of condition parsing from this patch, or 2. Keep condition parsing in this patch, and then commit the use-multiple-locations-for-watchpoint patch which will immediately make condition only reparsed when breakpoint_re_set_one is called. Given that use-multiple-locations-for-watchpoint patch is approved except for the bit that makes reparsing of condition optional, it seems that (2) is less work. Jim, does this look good for you? For convenience, I attach the current version of use-multiple-locations-for-watchpoint patch. The difference to the previous revision is as follows: @@ -966,7 +966,7 @@ update_watchpoint (struct breakpoint *b, int reparse) value_free (v); } - if (b->cond_string != NULL) + if (reparse && b->cond_string != NULL) { char *s = b->cond_string; if (b->loc->cond) Dan, is the patch still OK to commit? - Volodya