From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nick Duffek
To: fnasser@redhat.com
Cc: gdb@sources.redhat.com, cagney@redhat.com
Subject: Re: alloca is bad?
Date: Sat, 11 Nov 2000 13:16:00 -0000
Message-id: <200011112122.eABLMmb15214@rtl.cygnus.com>
References: <3A0D9FC7.48720E54@cygnus.com>
X-SW-Source: 2000-11/msg00099.html

Do we have a standard way to decide issues like this? Perhaps there
should be a group of GDB developers who vote on coding standards and
other contentious technical issues.

On 11-Nov-2000, Fernando Nasser wrote:

>Memory leak -> small problem, easy to identify the culprit.
>Stack corruption -> big problem, difficult to know even where to start.

It's not memory leaks vs. stack corruption, it's memory leaks + heap
corruption + coding errors vs. stack corruption.

Pseudo-mathematically, the pro-alloca claim is:

  memory leaks + heap corruption + coding errors > stack corruption

where ">" means "is worse than".

If heap corruption == stack corruption, then they cancel out and the
above becomes:

  memory leaks + coding errors > 0

which is obviously true.

So, is it true that heap corruption == stack corruption?

Memory corruption badness can be quantified as:

  (likelihood of occurrence) * (difficulty in tracking down)

If we establish appropriate guidelines on maximum alloca sizes, then
stack corruption is no more likely than heap corruption, so roughly:

  heap corruption likelihood == stack corruption likelihood

You claim that stack corruption is harder to track down than heap
corruption. I claim the opposite, based on personal experience and
arguments presented elsewhere in this thread. But until someone posts
hard numbers quantifying this question, I think such claims aren't
meaningful, so for now:

  heap corruption difficulty == stack corruption difficulty

which completes the "proof" of the original claim. :-)

Incidentally, I disagree with:

>Memory leak -> small problem, easy to identify the culprit.

When a leak is small, it's insidious.
GDB gets a bit more sluggish, but not enough that anyone thinks to
check for a memory leak. Therefore, leaks can go undetected for years.
Total GDB user time wasted:

  (tiny slowdown) * (entire GDB user population) * (years)

By contrast, memory corruption usually gets detected and fixed pretty
quickly. Total GDB user time wasted:

  (time to revert to previous version)
    * (the few GDB users who found the corruption before it was fixed)

It wouldn't surprise me if the former were bigger than the latter.

Nick

>From cgf@redhat.com Sat Nov 11 16:13:00 2000
From: Christopher Faylor
To: Fernando Nasser
Cc: Michael Meissner, gdb@sources.redhat.com, Andrew Cagney
Subject: Re: alloca is bad?
Date: Sat, 11 Nov 2000 16:13:00 -0000
Message-id: <20001111191249.A24251@redhat.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
X-SW-Source: 2000-11/msg00100.html
Content-length: 580

On Sat, Nov 11, 2000 at 07:51:36PM +0000, Fernando Nasser wrote:
>Someone said that heap corruption was harder to track than stack
>corruption.
>
>I couldn't disagree more. Many (most?) of the times the function tries
>to return and gets a buggy return address and frame pointer.
>It then crashes and you have no idea where it happened.

Someone, i.e., me, has pointed out that stack overruns happen when you
use auto arrays and pointers to auto variables. I don't see how you can
use this as an argument unless you are advocating that we should only
use static variables.

cgf

>From eliz@delorie.com Sat Nov 11 21:39:00 2000
From: Eli Zaretskii
To: fnasser@cygnus.com
Cc: cgf@redhat.com, meissner@cygnus.com, gdb@sources.redhat.com, cagney@cygnus.com
Subject: Re: alloca is bad?
Date: Sat, 11 Nov 2000 21:39:00 -0000
Message-id: <200011120538.AAA01237@indy.delorie.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
X-SW-Source: 2000-11/msg00101.html
Content-length: 1693

> Date: Sat, 11 Nov 2000 19:51:36 +0000
> From: Fernando Nasser
>
> Someone said that heap corruption was harder to track than stack
> corruption.
>
> I couldn't disagree more. Many (most?) of the times the function tries
> to return and gets a buggy return address and frame pointer.
> It then crashes and you have no idea where it happened.

The same happens with overrunning malloc'ed buffers or free'ing the
same buffer twice: you get a corrupted chain of memory buffers in the
malloc-maintained pool of memory, and the program crashes at some
random point down the road, inside malloc or inside free. The core
file contains no evidence about who corrupted the heap.

If you don't have sources to malloc, you usually need a malloc
debugger to find the villain. However, many malloc debuggers increase
the memory footprint to such an extent that it becomes impractical to
use them in large programs which allocate and free memory all the
time.

In contrast, with stack corruption, the crash is much more close to
the corruption point, usually a function call or two away, because the
stack is a single entity that gets exercised all the time, not
subdivided into buckets like in a typical malloc implementation. It
is relatively easy to find the function which corrupted the stack,
e.g. by putting a watchpoint on the stack pointer register that
catches the moment when it is below the stack limit (assuming
expand-down stack). Since registers can usually only be watched by
software watchpoints, knowing the approximate area where the offending
code should be is very important, otherwise running a program in
single-step can render this technique impractical.

>From fnasser@cygnus.com Sun Nov 12 00:07:00 2000
From: Fernando Nasser
To: Eli Zaretskii
Cc: cgf@redhat.com, meissner@cygnus.com, gdb@sources.redhat.com, cagney@cygnus.com
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 00:07:00 -0000
Message-id: <3A0E4F83.5F9D97FF@cygnus.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <200011120538.AAA01237@indy.delorie.com>
X-SW-Source: 2000-11/msg00102.html
Content-length: 1325

Eli Zaretskii wrote:
>
> (...)
> In contrast, with stack corruption, the crash is much more close to
> the corruption point, usually a function call or two away, because the
> stack is a single entity that gets exercised all the time, not
> subdivided into buckets like in a typical malloc implementation. It
> is relatively easy to find the function which corrupted the stack,

The problem is that with a corrupted SP and FP you have no idea of
where it happened. Doesn't matter if the crash was immediately after
the fact, all evidence of when it happened is wiped away.

> e.g. by putting a watchpoint on the stack pointer register that
> catches the moment when it is below the stack limit (assuming
> expand-down stack). Since registers can usually only be watched by
> software watchpoints, knowing the approximate area where the offending
> code should be is very important, otherwise running a program in
> single-step can render this technique impractical.

On the other hand, if you can get your execution path to repeat in a
deterministic way so heap pieces are allocated in the same locations,
you can use hardware watchpoints (as it is memory, not registers).

--
Fernando Nasser
Red Hat Canada Ltd.
E-Mail: fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9

>From fnasser@cygnus.com Sun Nov 12 00:20:00 2000
From: Fernando Nasser
To: Christopher Faylor
Cc: Michael Meissner, gdb@sources.redhat.com, Andrew Cagney
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 00:20:00 -0000
Message-id: <3A0E52BB.750E103A@cygnus.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <20001111191249.A24251@redhat.com>
X-SW-Source: 2000-11/msg00103.html
Content-length: 1723

Christopher Faylor wrote:
>
> On Sat, Nov 11, 2000 at 07:51:36PM +0000, Fernando Nasser wrote:
> >Someone said that heap corruption was harder to track than stack
> >corruption.
> >
> >I couldn't disagree more. Many (most?) of the times the function tries
> >to return and gets a buggy return address and frame pointer.
> >It then crashes and you have no idea where it happened.
>
> Someone, i.e., me, has pointed out that stack overruns happen when you
> use auto arrays

That is true. Auto arrays are also a cause of many stack corruption
problems. But why add more sources? The fact that there is evil
doesn't justify adding MORE evil.

> and pointers to auto variables.

That is bad programming.

> I don't see how you can
> use this as an argument unless you are advocating that we should only
> use static variables.
>

I am saying nothing. The gdb maintainers established this rule quite
some time ago. I was one that had to back up some code (on one of my
early submissions) to remove calls to alloca (which I use in small
programs where I am the only one writing code). But I understood that
in a program this size, with about 100+ people adding code to it, you
must be EXTRA careful.

Thanks to these rules, GDB, despite being now a much more sophisticated
program, dumps core much less than it used to do in the past, when the
rules were less strict.

I am just defending the established rules here.
They are working and IMO they should not be changed unless there are
very good reasons and something close to consensus that they should be
changed.

--
Fernando Nasser
Red Hat Canada Ltd.  E-Mail: fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9

>From ac131313@cygnus.com Sun Nov 12 01:55:00 2000
From: Andrew Cagney
To: Eli Zaretskii
Cc: fnasser@cygnus.com, cgf@redhat.com, meissner@cygnus.com, gdb@sources.redhat.com, cagney@cygnus.com
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 01:55:00 -0000
Message-id: <3A0E677B.EBF4AFCA@cygnus.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <200011120538.AAA01237@indy.delorie.com>
X-SW-Source: 2000-11/msg00104.html
Content-length: 1221

Eli Zaretskii wrote:
>
> > Date: Sat, 11 Nov 2000 19:51:36 +0000
> > From: Fernando Nasser
> >
> > Someone said that heap corruption was harder to track than stack
> > corruption.
> >
> > I couldn't disagree more. Many (most?) of the times the function tries
> > to return and gets a buggy return address and frame pointer.
> > It then crashes and you have no idea where it happened.

This is more in line with my personal experience. Something gets
trashed in the heap and it is a fairly well defined mechanical process
to first script a reproducible sequence that triggers the event and
secondly script a debug sequence that watches for the corruption.
Typically a few well placed watchpoints and a very long lunch (while it
runs :-) does the trick.

On the other hand applying such a technique to a stack corruption is far
more complicated. The very nature of a stack is that it is constantly,
and _anonymously_ being reused. Typically far more complex tracing
sequences are needed to capture the corruption in progress.

To look at it another way, all heap accesses are explicit while the
stack may either be accessed explicitly or implicitly through function
call/return.

	enjoy,
		Andrew

>From eliz@delorie.com Sun Nov 12 04:17:00 2000
From: Eli Zaretskii
To: fnasser@cygnus.com
Cc: cgf@redhat.com, meissner@cygnus.com, gdb@sources.redhat.com, cagney@cygnus.com
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 04:17:00 -0000
Message-id: <200011121216.HAA01483@indy.delorie.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <200011120538.AAA01237@indy.delorie.com> <3A0E4F83.5F9D97FF@cygnus.com>
X-SW-Source: 2000-11/msg00105.html
Content-length: 1126

> Date: Sun, 12 Nov 2000 08:06:27 +0000
> From: Fernando Nasser
>
> The problem is that with a corrupted SP and FP you have no idea of where
> it happened. Doesn't matter if the crash was immediately after the fact,
> all evidence of when it happened is wiped away.

??? The core file will usually tell you the function in which it
crashed, and sometimes the one which called it (if you are lucky).
GDB doesn't need the stack information to tell you what function
crashed, the value of $pc should be enough. At least usually.

Or am I missing something?

> On the other hand, if you can get your execution path to repeat in a
> deterministic way so heap pieces are allocated in the same locations,
> you can use hardware watchpoints (as it is memory, not registers).

This is usually almost impossible without having sources to malloc.
Without the sources, you won't know where to put the watchpoints: all
you have is the garbled pointer in the register which caused the
crash, and the crash is usually not related to what your program did
at the point of the crash, so your own variables don't help.

>From cgf@redhat.com Sun Nov 12 10:39:00 2000
From: Chris Faylor
To: gdb@sources.redhat.com
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 10:39:00 -0000
Message-id: <20001112133936.A29086@redhat.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <200011120538.AAA01237@indy.delorie.com> <3A0E4F83.5F9D97FF@cygnus.com>
 <200011121216.HAA01483@indy.delorie.com>
X-SW-Source: 2000-11/msg00106.html
Content-length: 5147

On Sun, Nov 12, 2000 at 07:16:58AM -0500, Eli Zaretskii wrote:
>> Date: Sun, 12 Nov 2000 08:06:27 +0000
>> From: Fernando Nasser
>>
>> The problem is that with a corrupted SP and FP you have no idea of where
>> it happened. Doesn't matter if the crash was immediately after the fact,
>> all evidence of when it happened is wiped away.
>
>??? The core file will usually tell you the function in which it
>crashed, and sometimes the one which called it (if you are lucky).
>GDB doesn't need the stack information to tell you what function
>crashed, the value of $pc should be enough. At least usually.
>
>Or am I missing something?

I've seen cases where the $pc was 0 and the frame pointer was screwed
up, making a back trace impossible. That requires rerunning the
program and judicious use of display and step.

Whenever I have a stack problem, I "display" the memory location that
is being corrupted, rerun the program, and "next" around until I see it
fail. When I see the failure, I step into the offending function.

I've debugged many cases of stack corruption over my disgustingly long
career as a programmer. I can't remember a specific case that troubled
me for very long. I can remember at least two cases where I spent days
trying to track down heap corruption problems, however.

Wasn't it mentioned on this thread that heap corruption was "easier"
to track down because there were tools for doing so? Doesn't the fact
that there have to be special tools for debugging heap problems provide
enough evidence that heap corruption problems are not trivially
debugged?

>> On the other hand, if you can get your execution path to repeat in a
>> deterministic way so heap pieces are allocated in the same locations,
>> you can use hardware watchpoints (as it is memory, not registers).
>
>This is usually almost impossible without having sources to malloc.
>Without the sources, you won't know where to put the watchpoints: all
>you have is the garbled pointer in the register which caused the
>crash, and the crash is usually not related to what your program did
>at the point of the crash, so your own variables don't help.

Right. Also, the "if you can get your execution path to repeat in a
deterministic way" argument applies equally well to the stack. If a
variable is getting clobbered, you find out where it has been stored on
the stack and monitor that location until you see it change. You may
step by a function that causes the corruption but then you rerun the
program and try again.

In the case of stack problems you don't have to wonder if a function
call made thousands of instructions earlier could have been the culprit
that is screwing you up now. If you see stack corruption it will be in
one of the functions on the stack.

I know that it is likely that no one here has much experience with
threaded programming, but I would much rather debug stack corruption in
a threaded environment than I would heap corruption. If you have a
situation where more than one thread can be doing a malloc, as is the
case with cygwin, then you've thrown the whole "deterministic" thing out
of the window as far as the heap is concerned. In threaded
applications, you *try* to use the stack for local storage as much as
possible so that each thread can have its own relatively isolated data
storage.

Again, I have to point out that this whole "stack problems are wicked
hard to debug" argument is specious, anyway. All of the problems that
you have with an alloca'ed array apply to local arrays and pointers to
local variables.
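
To make that last point concrete, here is a minimal sketch (not GDB
code; bounded_copy, same_hazard_demo and SCRATCH_SIZE are names made up
for illustration, and <alloca.h> is assumed to be available as on
glibc). An auto array and an alloca'ed buffer both live in the calling
function's frame, so the same bounds check protects either one, and an
unchecked copy would damage the same frame in either case:

```c
#include <alloca.h>   /* glibc; some systems declare alloca in <stdlib.h> */
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SCRATCH_SIZE = 32 };   /* hypothetical buffer size */

/* Copy SRC into DST without ever writing more than CAP bytes,
   including the terminating NUL.  Returns the number of characters
   actually copied.  */
size_t
bounded_copy (char *dst, size_t cap, const char *src)
{
  size_t n = strlen (src);

  assert (cap > 0);
  if (n >= cap)
    n = cap - 1;                /* truncate rather than overrun */
  memcpy (dst, src, n);
  dst[n] = '\0';
  return n;
}

/* Store SRC in an auto array and in an alloca'ed buffer of the same
   size.  Both buffers sit in this function's stack frame and both
   disappear when it returns.  Returns 1 if the two copies match.  */
int
same_hazard_demo (const char *src)
{
  char fixed[SCRATCH_SIZE];
  char *dynamic = alloca (SCRATCH_SIZE);

  bounded_copy (fixed, sizeof fixed, src);
  bounded_copy (dynamic, SCRATCH_SIZE, src);
  return strcmp (fixed, dynamic) == 0;
}
```

Replacing either bounded_copy call with an unchecked strcpy would
reintroduce the overrun, regardless of which kind of buffer it targets.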

Fernando has implied that using addresses to auto variables is poor
programming. IMO, that statement is provably incorrect since many UNIX
system functions require pointers in their parameter lists. I've never
heard anyone claim that the UNIX API was broken.

Also, as far as debuggability and maintainability are concerned, I have
a hard time believing that anyone could assert that this:

    struct cleanup *chain;
    char *foo;
    foo = malloc (size);
    chain = make_cleanup (free, (void *) foo);
    . . . many lines of code . .
    do_cleanups (chain);

was more apt to be error-free and understandable than this:

    char *foo;
    foo = alloca (size);

As an exercise, imagine what would happen if I had gone with my first,
incorrect, inclination and done this:

    chain = make_cleanup (free, (void *) &foo);

Nick Duffek has requested that the procedures for determining
programming guidelines in GDB be clarified. The feeling that I get from
Andrew and Fernando is that this alloca discussion has raged on before
and that the issue had been decided some time ago. I've checked the
mailing list archives and I don't see any definitive statements about
this subject, however. If there is a document that says "Don't Use
alloca and Here's Why" no one has pointed it out yet.

Since at least four GDB maintainers have expressed their desire to
use alloca, is that an adequate quorum? What's the procedure here? Are
we eventually going to come to a decision and document it? Or, are we
going to keep wrangling until someone gets tired?

cgf

>From kevinb@cygnus.com Sun Nov 12 11:30:00 2000
From: Kevin Buettner
To: gdb@sources.redhat.com
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 11:30:00 -0000
Message-id: <1001112193029.ZM7025@ocotillo.lan>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <200011120538.AAA01237@indy.delorie.com> <3A0E4F83.5F9D97FF@cygnus.com>
 <200011121216.HAA01483@indy.delorie.com> <20001112133936.A29086@redhat.com>
X-SW-Source: 2000-11/msg00107.html
Content-length: 774

On Nov 12, 1:39pm, Chris Faylor wrote:

> Since at least four GDB maintainers have expressed their desire to
> use alloca, is that an adequate quorum? What's the procedure here? Are
> we eventually going to come to a decision and document it? Or, are we
> going to keep wrangling until someone gets tired?

I'm tired and I haven't even been wrangling. :-)

Others of you have argued long and eloquently on the matter, so I
won't bother...

I will "vote" however:

1) I agree with those who think that debugging heap related corruption
   is frequently more difficult than debugging stack related
   corruption.

2) I think the use of alloca() should be allowed so long as the space
   to be allocated does not exceed some relatively small, but reasonable
   bound.

Kevin

>From bstell@ix.netcom.com Sun Nov 12 11:54:00 2000
From: bstell@ix.netcom.com
To: Eli Zaretskii
Cc: bstell@netscape.com, gdb@sources.redhat.com
Subject: Re: Displaying Unicode
Date: Sun, 12 Nov 2000 11:54:00 -0000
Message-id: <3A0EF565.4E00AFFB@ix.netcom.com>
References: <3A0B4885.B2384AE7@netscape.com> <200011101039.FAA29785@indy.delorie.com>
 <3A0C2791.A33BE9AC@ix.netcom.com> <200011102009.PAA00126@indy.delorie.com>
 <3A0C8E19.A8A6D355@netscape.com> <200011110732.CAA00523@indy.delorie.com>
 <3A0D94F6.9FC95EE8@ix.netcom.com> <200011112054.PAA00938@indy.delorie.com>
X-SW-Source: 2000-11/msg00108.html
Content-length: 893

Eli Zaretskii wrote:

> If all you need is to display something like "U+1234" for a UTF-8
> encoding, then it should be easy to make this happen, either in GDB or
> ...

Great.
What GDB command do I use to display a UTF-8 or UTF-16 *string* with
the ASCII characters displayed as ASCII (not "U+0041") and non-ASCII
characters displayed as "U+1234"?

I say *string* as I sure don't want to print each character out
separately.

> I thought you wanted something much more sophisticated (since text
> which looks like "U+1234 U+4321 U+5789" etc. is all not easy to read).

I did say:

   A more sophisticated display routine would check if the non-ASCII
   was displayable in the current locale and if so would convert it to
   the current locale encoding for display. This way developers that
   can read Japanese, etc. can see (hopefully something close to) the
   intended text.

>From fnasser@cygnus.com Sun Nov 12 15:16:00 2000
From: Fernando Nasser
To: Eli Zaretskii
Cc: cgf@redhat.com, meissner@cygnus.com, gdb@sources.redhat.com, cagney@cygnus.com
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 15:16:00 -0000
Message-id: <3A0F24AA.50E22A01@cygnus.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <200011120538.AAA01237@indy.delorie.com> <3A0E4F83.5F9D97FF@cygnus.com>
 <200011121216.HAA01483@indy.delorie.com>
X-SW-Source: 2000-11/msg00110.html
Content-length: 1411

Eli Zaretskii wrote:
>
> > Date: Sun, 12 Nov 2000 08:06:27 +0000
> > From: Fernando Nasser
> >
> > The problem is that with a corrupted SP and FP you have no idea of where
> > it happened. Doesn't matter if the crash was immediately after the fact,
> > all evidence of when it happened is wiped away.
>
> ??? The core file will usually tell you the function in which it
> crashed, and sometimes the one which called it (if you are lucky).
> GDB doesn't need the stack information to tell you what function
> crashed, the value of $pc should be enough. At least usually.
>
> Or am I missing something?
>

Yes you are.
As Andrew explained in his message, if the stack is corrupted the PC
and FP can (and probably will) be clobbered with garbage when the
function returns. No backtrace, core dump or anything in this world
will tell you where you were when this happens as all information has
been obliterated.

The GDB where command works by following the chain of stack frames by
using the saved values of frame pointers (or equivalent mechanisms) to
walk up the stack and give you that nice printout. Without a point to
start, no chain can be recreated.

Bottom line: for most stack corruption problems, no "where"
("backtrace")

--
Fernando Nasser
Red Hat Canada Ltd.  E-Mail: fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9

>From ac131313@cygnus.com Sun Nov 12 15:16:00 2000
From: Andrew Cagney
To: Chris Faylor
Cc: gdb@sources.redhat.com
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 15:16:00 -0000
Message-id: <3A0F2306.474C70E1@cygnus.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <200011120538.AAA01237@indy.delorie.com> <3A0E4F83.5F9D97FF@cygnus.com>
 <200011121216.HAA01483@indy.delorie.com> <20001112133936.A29086@redhat.com>
X-SW-Source: 2000-11/msg00109.html
Content-length: 2093

Chris Faylor wrote:

> I've debugged many cases of stack corruption over my disgustingly long
> career as a programmer. I can't remember a specific case that troubled
> me for very long. I can remember at least two cases where I spent days
> trying to track down heap corruption problems, however.

As they say, our energy efficiency units have varied.

> Nick Duffek has requested that the procedures for determining
> programming guidelines in GDB be clarified. The feeling that I get from
> Andrew and Fernando is that this alloca discussion has raged on before
> and that the issue had been decided some time ago.
> I've checked the mailing
> list archives and I don't see any definitive statements about this
> subject, however. If there is a document that says "Don't Use alloca
> and Here's Why" no one has pointed it out yet.

Largely maintainer whim and folklore.

To reiterate earlier points. The official problem with alloca() is
that it is very non-portable. You can be pretty sure problems with
this date back to when GDB was first written and, consequently, the
exact details are now well and truly lost in time.

The BSD man page has:

    BUGS
         The alloca() function is machine dependent; its use is
         discouraged.

the Solaris man page has:

    WARNINGS
         ....
         alloca() is machine-, compiler-, and most of all, system-
         dependent. Its use is strongly discouraged.

With regard to multi-arch, the decision to allow alloca() was strictly
pragmatic - using it was far less error-prone than trying to convert
all dynamic arrays to cleanups. Part of the pragmatism behind it was
that it was thought that all the systems that had broken alloca()
implementations had long ago been switched off.

> Since at least four GDB maintainers have expressed their desire to
> use alloca, is that an adequate quorum? What's the procedure here? Are
> we eventually going to come to a decision and document it? Or, are we
> going to keep wrangling until someone gets tired?

See the threads originating from:
http://sources.redhat.com/ml/gdb/2000-11/msg00085.html

	Andrew

>From ac131313@cygnus.com Sun Nov 12 15:32:00 2000
From: Andrew Cagney
To: Kevin Buettner
Cc: gdb@sources.redhat.com
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 15:32:00 -0000
Message-id: <3A0F26DF.B11A921A@cygnus.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <200011120538.AAA01237@indy.delorie.com> <3A0E4F83.5F9D97FF@cygnus.com>
 <200011121216.HAA01483@indy.delorie.com> <20001112133936.A29086@redhat.com>
 <1001112193029.ZM7025@ocotillo.lan>
X-SW-Source: 2000-11/msg00111.html
Content-length: 773

Kevin Buettner wrote:
>
> On Nov 12, 1:39pm, Chris Faylor wrote:
>
> > Since at least four GDB maintainers have expressed their desire to
> > use alloca, is that an adequate quorum? What's the procedure here? Are
> > we eventually going to come to a decision and document it? Or, are we
> > going to keep wrangling until someone gets tired?
>
> I'm tired and I haven't even been wrangling. :-)

See: http://sources.redhat.com/ml/gdb/2000-11/msg00085.html
as they say the process has already started.

(The alloca() cat is already out of the bag so a yes/no vote is pretty
useless :-).

> 2) I think the use of alloca() should be allowed so long as the space
> to be allocated does not exceed some relatively small, but reasonable
> bound.

Covered.

	Andrew

>From meissner@cygnus.com Sun Nov 12 16:42:00 2000
From: Michael Meissner
To: Andrew Cagney
Cc: Jim Blandy, gdb@sources.redhat.com, Nick Duffek
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 16:42:00 -0000
Message-id: <20001112194252.13032@cse.cygnus.com>
References: <20001109222231.A26675@redhat.com> <200011100503.eAA53pT24667@rtl.cygnus.com>
 <3A0B91AB.1815BC89@cygnus.com> <20001110012016.A18577@redhat.com>
 <3A0CE1DB.6F6B1E93@cygnus.com>
X-SW-Source: 2000-11/msg00112.html
Content-length: 1799

On Sat, Nov 11, 2000 at 05:06:19PM +1100, Andrew Cagney wrote:
> o the suggestion that some platforms
>   don't support even libiberty's alloca.
>
>   I'd be very interested in knowing the
>   details of this.

Some (non-mainstream) systems allocate their stack in chunks from the
heap. When the stack is exhausted for the current chunk, they allocate
some more memory for the next chunk. This doesn't work because the
libiberty version of alloca depends on the allocations being
monotonically increasing or decreasing. When I was in the original
X3J11 meetings, I remember this coming up, and the original Crays used
this allocation technique and possibly some of the mainframe systems.

I've also heard of systems which use languages like Simula 67 that
have coroutines in them; each stack frame is allocated, and points back
to the parent frame. Due to threading, there may be multiple pointers
back to the parent frame, and stack frames are garbage collected like
everything else. This is the so-called cactus stack that I mentioned.

Also in the original meetings, when I proposed alloca (and got shot
down basically 1 to everybody), Harris among others mentioned there was
no way they could ever have a variable sized stack. Given C99 now has
variable length arrays, I dunno whether they planned to allocate VLAs
or what.

Now, when RMS decided that alloca was a useful feature for the GNU
system, he knew it might be restricting which systems it could be
ported to. Most systems these days can support alloca in some form or
another, but being pedantic, most != all.

--
Michael Meissner, Red Hat, Inc.
PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886, USA
Work:     meissner@redhat.com          phone: +1 978-486-9304
Non-work: meissner@spectacle-pond.org  fax:   +1 978-692-4482

>From meissner@cygnus.com Sun Nov 12 16:49:00 2000
From: Michael Meissner
To: Eli Zaretskii
Cc: fnasser@cygnus.com, cgf@redhat.com, meissner@cygnus.com, gdb@sources.redhat.com, cagney@cygnus.com
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 16:49:00 -0000
Message-id: <20001112194857.22587@cse.cygnus.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <200011120538.AAA01237@indy.delorie.com> <3A0E4F83.5F9D97FF@cygnus.com>
 <200011121216.HAA01483@indy.delorie.com>
X-SW-Source: 2000-11/msg00113.html
Content-length: 2211

On Sun, Nov 12, 2000 at 07:16:58AM -0500, Eli Zaretskii wrote:
> > Date: Sun, 12 Nov 2000 08:06:27 +0000
> > From: Fernando Nasser
> >
> > The problem is that with a corrupted SP and FP you have no idea of where
> > it happened. Doesn't matter if the crash was immediately after the fact,
> > all evidence of when it happened is wiped away.
>
> ??? The core file will usually tell you the function in which it
> crashed, and sometimes the one which called it (if you are lucky).
> GDB doesn't need the stack information to tell you what function
> crashed, the value of $pc should be enough. At least usually.
>
> Or am I missing something?

Yes, what usually happens is the return address that is saved on the
stack is corrupted (not necessarily the return address of the current
function, but I've seen corruption once or twice a couple of levels
back). When the function whose return address is corrupted returns, it
jumps to the address. If you are somewhat lucky, the address is not a
legal address and you get an immediate core dump, though the pc is 0 or
some such, which is totally useless, since on most hardware, there
isn't a "come from" address that says what was the pc before the last
jump. If you aren't as lucky, it will jump to random code, and it may
be some time before you notice the corruption.

> > On the other hand, if you can get your execution path to repeat in a
> > deterministic way so heap pieces are allocated in the same locations,
> > you can use hardware watchpoints (as it is memory, not registers).
>
> This is usually almost impossible without having sources to malloc.
> Without the sources, you won't know where to put the watchpoints: all
> you have is the garbled pointer in the register which caused the
> crash, and the crash is usually not related to what your program did
> at the point of the crash, so your own variables don't help.

Or you can take one of the GPL mallocs and compile it on your system
and link it in.

--
Michael Meissner, Red Hat, Inc.
PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886, USA
Work:     meissner@redhat.com          phone: +1 978-486-9304
Non-work: meissner@spectacle-pond.org  fax:   +1 978-692-4482

>From meissner@cygnus.com Sun Nov 12 17:00:00 2000
From: Michael Meissner
To: Andrew Cagney
Cc: Eli Zaretskii, fnasser@cygnus.com, cgf@redhat.com, meissner@cygnus.com, gdb@sources.redhat.com, cagney@cygnus.com
Subject: Re: alloca is bad?
Date: Sun, 12 Nov 2000 17:00:00 -0000
Message-id: <20001112200034.53469@cse.cygnus.com>
References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com>
 <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com>
 <200011120538.AAA01237@indy.delorie.com> <3A0E677B.EBF4AFCA@cygnus.com>
X-SW-Source: 2000-11/msg00114.html
Content-length: 1787

On Sun, Nov 12, 2000 at 08:48:43PM +1100, Andrew Cagney wrote:
> Eli Zaretskii wrote:
> >
> > > Date: Sat, 11 Nov 2000 19:51:36 +0000
> > > From: Fernando Nasser
> > >
> > > Someone said that heap corruption was harder to track than stack
> > > corruption.
> > >
> > > I couldn't disagree more. Many (most?) of the times the function tries
> > > to return and gets a buggy return address and frame pointer.
> > > It then crashes and you have no idea where it happened.
>
> This is more in line with my personal experience. Something gets
> trashed in the heap and it is a fairly well defined mechanical process
> to first script a reproducible sequence that triggers the event and
> secondly script a debug sequence that watches for the corruption.
> Typically a few well placed watchpoints and a very long lunch (while it > runs :-) does the trick. > > On the other hand applying such a technique to a stack corruption is far > more complicated. The very nature of a stack is that it is constantly, > and _anonymously_ being reused. Typically far more complex tracing > sequences are needed to capture the corruption in progress. I tend to agree with Andrew and against Chris in this regard. For malloc overruns, I typically just link in with -lefence or other tools, and it is fairly obvious where the bug is. In the past, for particularly nasty stack overruns, I have changed the compiler to emit special debug code for prologues and epilogues (of course nowadays, you can use -finstrument-functions to accomplish the same behavior). -- Michael Meissner, Red Hat, Inc. PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886, USA Work: meissner@redhat.com phone: +1 978-486-9304 Non-work: meissner@spectacle-pond.org fax: +1 978-692-4482 >From ac131313@cygnus.com Sun Nov 12 22:07:00 2000 From: Andrew Cagney To: GDB Discussion Subject: fork-child.c:97: warning: argument `exec_file' might be clobbered by `longjmp' or `vfork' Date: Sun, 12 Nov 2000 22:07:00 -0000 Message-id: <3A0F8357.8C01BE70@cygnus.com> X-SW-Source: 2000-11/msg00115.html Content-length: 936 Compiling GDB with: -Wimplicit -Wreturn-type -Wcomment -Wtrigraphs -Wformat -Wparentheses -Wpointer-arith -Wuninitialized -Werror gets the warnings: ../../src/gdb/fork-child.c: In function `fork_inferior': ../../src/gdb/fork-child.c:112: warning: `argv' might be used uninitialized in this function ../../src/gdb/fork-child.c:112: warning: variable `argv' might be clobbered by `longjmp' or `vfork' ../../src/gdb/fork-child.c:97: warning: argument `exec_file' might be clobbered by `longjmp' or `vfork' ../../src/gdb/fork-child.c:99: warning: argument `shell_file' might be clobbered by `longjmp' or `vfork' I can understand the argv warning but not the warnings about
``exec_file'' and ``shell_file''. Anyone care to explain what they are about? I've looked at the fork_inferior() source, and beyond the vfork() call, I could only find rval (read-only) references to the ``exec_file'' and ``shell_file'' arguments. Andrew >From nsd@redhat.com Sun Nov 12 22:49:00 2000 From: Nick Duffek To: gdb@sources.redhat.com Subject: [RFA] alloca coding standard Date: Sun, 12 Nov 2000 22:49:00 -0000 Message-id: <200011130654.eAD6sjV16467@rtl.cygnus.com> References: <3A0CE1DB.6F6B1E93@cygnus.com> X-SW-Source: 2000-11/msg00116.html Content-length: 3672 On 11-Nov-2000, Andrew Cagney wrote: >To decide it is appropriate I think there need to be some clear and >objective guidelines. How about this? The appended patch: - Documents the de facto coding standard of allowing limited alloca() allocations. I've set the limit at 1 page per function; does anyone think it should be less, more, or specified in bytes instead of pages? - Briefly describes some of the issues that have been raised. - Moves an alloca (0) call from registers_changed() to wait_for_inferior(), so that the documentation can say: "For systems that use libiberty's alloca emulation, it is important that alloca be called when the stack is shallow to garbage-collect freed space. As of this writing, GDB calls alloc (0) once per user command and once per inferior wait." It seemed more puzzling to say "GDB calls alloca (0) once per register cache invalidation", since that happens at less-predictable stack depths. ChangeLog entries: * gdbint.texinfo (Alloca): New subsection. * infrun.c (wait_for_inferior): Call alloca (0). * regcache.c (registers_changed): Move alloca (0) call to wait_for_inferior() loop.
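To make the garbage-collection point concrete, here is a minimal sketch of how a heap-backed alloca emulation can reclaim space, and why the depth at which alloca (0) is called matters. It is not libiberty's code: libiberty's C alloca infers the stack depth from the address of a local variable, whereas this sketch takes an explicit depth argument so that its behaviour is deterministic. The names emu_alloca and emu_live_count are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* One heap block standing in for an alloca'd area. */
struct emu_block {
  struct emu_block *next;
  int depth;                  /* call depth that requested the block */
};

/* Live blocks, most recent first.  Because deeper blocks are popped
   before a shallower one is pushed, the list stays sorted by
   decreasing depth from the head.  */
static struct emu_block *emu_live = NULL;

/* Number of blocks that have not been reclaimed yet. */
int emu_live_count(void)
{
  int n = 0;
  const struct emu_block *b;
  for (b = emu_live; b != NULL; b = b->next)
    n++;
  return n;
}

/* Hypothetical emulation: DEPTH is the caller's call depth (0 = shallowest).
   Every call first frees blocks requested at a strictly greater depth,
   since those frames must have returned.  A SIZE of 0 collects only.  */
void *emu_alloca(size_t size, int depth)
{
  struct emu_block *b;

  assert(depth >= 0);

  while (emu_live != NULL && emu_live->depth > depth) {
    struct emu_block *dead = emu_live;
    emu_live = dead->next;
    free(dead);
  }

  if (size == 0)
    return NULL;

  b = malloc(sizeof *b + size);
  if (b == NULL)
    return NULL;
  b->next = emu_live;
  b->depth = depth;
  emu_live = b;
  return b + 1;               /* caller's bytes follow the header */
}
```

In this sketch a collecting call made at depth 3 cannot free a block that was requested at depth 3 or shallower, which is the reason for wanting the alloca (0) call where the stack is shallow.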
Nick [patch follows] Index: gdb/doc/gdbint.texinfo =================================================================== diff -up gdb/doc/gdbint.texinfo gdb/doc/gdbint.texinfo --- gdb/doc/gdbint.texinfo Mon Nov 13 01:25:51 2000 +++ gdb/doc/gdbint.texinfo Mon Nov 13 01:25:15 2000 @@ -2854,6 +2854,24 @@ visible to random source files. All static functions must be declared in a block near the top of the source file. +@subsection Alloca + +@code{alloca} may be used for allocating up to a page of stack memory +per function call. + +The size restriction is a compromise between the benefits of +@code{alloca} and the risk of stack overflow, which can happen on +systems that have fixed-length stack regions or bounds on frame sizes. + +For systems that use libiberty's @code{alloca} emulation, it is +important that @code{alloca} be called when the stack is shallow to +garbage-collect freed space. As of this writing, @value{GDBN} calls +@code{alloc (0)} once per user command and once per inferior wait. + +Because @value{GDBN} and other GNU programs use @code{alloca}, they are +not portable to systems that neither provide a native @code{alloca} nor +support libiberty's emulation of @code{alloca}. + @subsection Clean Design In addition to getting the syntax right, there's the little question of Index: gdb/infrun.c =================================================================== diff -up gdb/infrun.c gdb/infrun.c --- gdb/infrun.c Mon Nov 13 01:25:58 2000 +++ gdb/infrun.c Mon Nov 13 01:23:55 2000 @@ -1282,6 +1282,9 @@ wait_for_inferior (void) while (1) { + /* Garbage-collect libiberty's alloca() emulation. 
*/ + alloca (0); + if (target_wait_hook) ecs->pid = target_wait_hook (ecs->waiton_pid, ecs->wp); else Index: gdb/regcache.c =================================================================== diff -up gdb/regcache.c gdb/regcache.c --- gdb/regcache.c Mon Nov 13 01:26:00 2000 +++ gdb/regcache.c Mon Nov 13 01:24:18 2000 @@ -295,13 +295,6 @@ registers_changed (void) registers_pid = -1; - /* Force cleanup of any alloca areas if using C alloca instead of - a builtin alloca. This particular call is used to clean up - areas allocated by low level target code which may build up - during lengthy interactions between gdb and the target before - gdb gives control to the user (ie watchpoints). */ - alloca (0); - for (i = 0; i < ARCH_NUM_REGS; i++) register_valid[i] = 0; >From eliz@delorie.com Mon Nov 13 03:13:00 2000 From: Eli Zaretskii To: fnasser@cygnus.com Cc: cgf@redhat.com, meissner@cygnus.com, gdb@sources.redhat.com, cagney@cygnus.com Subject: Re: alloca is bad? Date: Mon, 13 Nov 2000 03:13:00 -0000 Message-id: <200011131112.GAA02182@indy.delorie.com> References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com> <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com> <200011120538.AAA01237@indy.delorie.com> <3A0E4F83.5F9D97FF@cygnus.com> <200011121216.HAA01483@indy.delorie.com> <3A0F24AA.50E22A01@cygnus.com> X-SW-Source: 2000-11/msg00117.html Content-length: 655 > Date: Sun, 12 Nov 2000 23:15:54 +0000 > From: Fernando Nasser > > As Andrew explained in his message, if the stack is > corrupted the PC and FP can (and probably will) be clobbered with > the garbage when the function returns. They could, yes; but in practice (at least in my experience), the clobbered return address is caught by the OS protection in most cases, so the program will GPF before the PC is garbled. > Bottom line: for most stack corruption problems, no "where" ("backtrace") In my experience, in most cases, there is in fact at least the frame where it crashed. 
You should be able to start debugging from there. >From eliz@delorie.com Mon Nov 13 03:14:00 2000 From: Eli Zaretskii To: meissner@cygnus.com Cc: ac131313@cygnus.com, fnasser@cygnus.com, cgf@redhat.com, meissner@cygnus.com, gdb@sources.redhat.com, cagney@cygnus.com Subject: Re: alloca is bad? Date: Mon, 13 Nov 2000 03:14:00 -0000 Message-id: <200011131113.GAA02185@indy.delorie.com> References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com> <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com> <200011120538.AAA01237@indy.delorie.com> <3A0E677B.EBF4AFCA@cygnus.com> <20001112200034.53469@cse.cygnus.com> X-SW-Source: 2000-11/msg00118.html Content-length: 1078 > From: Michael Meissner > Date: Sun, 12 Nov 2000 20:00:34 -0500 > > For malloc overruns, I typically just link in with -lefence or other > tools, and it is fairly obvious where the bug is. Except that there are programs where using Efence or similar tools is impractical, because these libraries typically don't free memory when the application calls `free' (to catch code which writes into free'd memory). This enlarges the memory footprint of such programs to the extent that you cannot link with -lefence, because the program won't run for more than a few moments, or not at all. I'd expect GDB to be one of the programs where this approach is not very practical, what with all the malloc'ing that goes on there. In fact, we have such an elusive memory problem right now in Emacs development sources. It's been a couple of months since it was first reported, and all of the Emacs developers have been hunting it ever since, including lots of special-purpose code that was written specifically for catching that bug. The bug is still at large... 
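Where Electric Fence's footprint rules it out, a lighter and much weaker check is to append a canary to each heap block and verify it later. The following is a minimal sketch of that idea only; it is not how Electric Fence works (Electric Fence relies on page protection), it catches only writes past the end of a block, and only at the moment the canary is checked. The names guarded_malloc, guarded_intact and guarded_free are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CANARY_LEN 8

static const unsigned char canary[CANARY_LEN] =
  { 0xDE, 0xAD, 0xBE, 0xEF, 0xDE, 0xAD, 0xBE, 0xEF };

/* Bookkeeping stored in front of the caller's bytes.  The union keeps
   the pointer handed to the caller suitably aligned (C11 max_align_t).  */
union guard_header {
  size_t size;
  max_align_t align;
};

/* Allocate SIZE bytes followed by a canary pattern. */
void *guarded_malloc(size_t size)
{
  union guard_header *h;

  if (size > SIZE_MAX - sizeof *h - CANARY_LEN)
    return NULL;              /* request would overflow size_t */
  h = malloc(sizeof *h + size + CANARY_LEN);
  if (h == NULL)
    return NULL;
  h->size = size;
  memcpy((unsigned char *) (h + 1) + size, canary, CANARY_LEN);
  return h + 1;
}

/* Return 1 if the canary after block P is untouched, 0 otherwise. */
int guarded_intact(const void *p)
{
  const union guard_header *h = (const union guard_header *) p - 1;
  return memcmp((const unsigned char *) p + h->size,
                canary, CANARY_LEN) == 0;
}

/* Free block P, aborting if something wrote past its end. */
void guarded_free(void *p)
{
  if (p == NULL)
    return;
  assert(guarded_intact(p));
  free((union guard_header *) p - 1);
}
```

Unlike a page-protection scheme this does not stop the program at the faulting store, so it narrows down which block was overrun rather than which instruction did it.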
>From eliz@delorie.com Mon Nov 13 03:16:00 2000 From: Eli Zaretskii To: nsd@redhat.com Cc: gdb@sources.redhat.com Subject: Re: [RFA] alloca coding standard Date: Mon, 13 Nov 2000 03:16:00 -0000 Message-id: <200011131116.GAA02191@indy.delorie.com> References: <3A0CE1DB.6F6B1E93@cygnus.com> <200011130654.eAD6sjV16467@rtl.cygnus.com> X-SW-Source: 2000-11/msg00119.html Content-length: 1254 > Date: Mon, 13 Nov 2000 01:54:45 -0500 > From: Nick Duffek > > diff -up gdb/doc/gdbint.texinfo gdb/doc/gdbint.texinfo > --- gdb/doc/gdbint.texinfo Mon Nov 13 01:25:51 2000 > +++ gdb/doc/gdbint.texinfo Mon Nov 13 01:25:15 2000 > @@ -2854,6 +2854,24 @@ visible to random source files. I have a few minor comments (assuming that the idea is accepted and this text will be added to gdbint.texinfo): > +@subsection Alloca I suggest an index entry here, e.g. "@findex alloca usage". (Hmm, it looks like gdbint.texinfo doesn't print an index right now. I will fix that.) > +@code{alloca} may be used for allocating up to a page of stack memory > +per function call. I'd suggest to say how much is a "page", at least for a couple of popular architectures. Not everyone is privy to intimate details of the system's memory allocation. > +garbage-collect freed space. As of this writing, @value{GDBN} calls > +@code{alloc (0)} once per user command and once per inferior wait. ^^^^^ This should be `alloca', not `alloc'. Finally, perhaps something should be said about functions that could potentially be called recursively, where care should be taken not to overflow the runtime stack, even if each invocation asks for only a single page.
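A size limit of the kind being discussed could be applied at a call site roughly as follows: take the scratch buffer from the stack when the request is under a fixed byte count, and fall back to malloc above it. This is a sketch under stated assumptions, not proposed GDB code. The limit SCRATCH_ALLOCA_MAX (512 bytes here) and the function is_palindrome_ci are hypothetical, and <alloca.h> is assumed to be available, as it is with glibc.

```c
#include <alloca.h>   /* assumption: glibc-style header; other systems
                         declare alloca elsewhere */
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-call limit on stack scratch space, in bytes. */
#define SCRATCH_ALLOCA_MAX 512

/* Return 1 if S reads the same in both directions ignoring case,
   0 if it does not, and -1 if scratch memory could not be obtained.  */
int is_palindrome_ci(const char *s)
{
  size_t n, i;
  int on_heap;
  int result = 1;
  char *buf;

  assert(s != NULL);
  n = strlen(s);
  on_heap = (n > SCRATCH_ALLOCA_MAX);

  /* alloca has to be called in the function that uses the memory,
     because the space is released when that function returns.  */
  if (on_heap) {
    buf = malloc(n);
    if (buf == NULL)
      return -1;
  } else {
    buf = alloca(n + 1);      /* +1 so a zero-byte request never occurs */
  }

  for (i = 0; i < n; i++)
    buf[i] = (char) tolower((unsigned char) s[i]);

  for (i = 0; i < n / 2; i++) {
    if (buf[i] != buf[n - 1 - i]) {
      result = 0;
      break;
    }
  }

  if (on_heap)
    free(buf);                /* only the heap path needs explicit cleanup */
  return result;
}
```

The cost of the fallback is that the cleanup path returns: every exit from the function has to remember the free, which is part of what the alloca side of the argument is trying to avoid. In a recursive function the stack use is the limit multiplied by the recursion depth, so a smaller limit would be needed there.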
>From ac131313@cygnus.com Mon Nov 13 04:51:00 2000 From: Andrew Cagney To: Nick Duffek Cc: gdb@sources.redhat.com Subject: Re: [RFA] alloca coding standard Date: Mon, 13 Nov 2000 04:51:00 -0000 Message-id: <3A0FE1FC.BBA6C43F@cygnus.com> References: <3A0CE1DB.6F6B1E93@cygnus.com> <200011130654.eAD6sjV16467@rtl.cygnus.com> X-SW-Source: 2000-11/msg00120.html Content-length: 835 Nick Duffek wrote: > > On 11-Nov-2000, Andrew Cagney wrote: > > >To decide it is appropriate I think there need to be some clear and > >objective guidelines. > > How about this? The appended patch: > - Documents the de facto coding standard of allowing limited alloca() > allocations. I've set the limit at 1 page per function; does anyone > think it should be less, more, or specified in bytes instead of pages? I'd suggest first allowing for the discussion which is clarifying all the issues to settle down and only after that start looking at something recording guidelines for alloca. (I really don't see the urgency on this). Andrew > - Moves an alloca (0) call from registers_changed() to > wait_for_inferior(), so that the documentation can say: This is a separate problem and not part of the patch. >From fnasser@cygnus.com Mon Nov 13 07:13:00 2000 From: Fernando Nasser To: Nick Duffek Cc: gdb@sources.redhat.com Subject: Re: [RFA] alloca coding standard Date: Mon, 13 Nov 2000 07:13:00 -0000 Message-id: <3A1004F0.B00504A3@cygnus.com> References: <3A0CE1DB.6F6B1E93@cygnus.com> <200011130654.eAD6sjV16467@rtl.cygnus.com> X-SW-Source: 2000-11/msg00121.html Content-length: 771 Nick Duffek wrote: > > - Moves an alloca (0) call from registers_changed() to > wait_for_inferior(), so that the documentation can say: > > "For systems that use libiberty's alloca emulation, it is important > that alloca be called when the stack is shallow to garbage-collect > freed space. As of this writing, GDB calls alloc (0) once per user > command and once per inferior wait." > What user command? 
The GUI and other script languages call random functions in GDB. And their command loops may even be in another process! You must also explain how alloca(0) will work in libgdb. -- Fernando Nasser Red Hat Canada Ltd. E-Mail: fnasser@redhat.com 2323 Yonge Street, Suite #300 Toronto, Ontario M4P 2C9 >From jimb@zwingli.cygnus.com Mon Nov 13 07:52:00 2000 From: Jim Blandy To: "Christopher G. Faylor" Cc: gdb@sources.redhat.com Subject: Re: alloca is bad? Date: Mon, 13 Nov 2000 07:52:00 -0000 Message-id: References: <20001109212032.A26464@redhat.com> <200011101039.FAA29788@indy.delorie.com> <20001110111651.A19503@redhat.com> <3A0CD8C1.426EA933@cygnus.com> <20001111130703.A24382@redhat.com> X-SW-Source: 2000-11/msg00122.html Content-length: 231 > I don't think that the fact that you can cut yourself with a knife means > that you eliminate all sharp objects from your premises. I wish you folks would be more straightforward. Now I have to go re-purchase all that cutlery. >From meissner@cygnus.com Mon Nov 13 08:05:00 2000 From: Michael Meissner To: Eli Zaretskii Cc: fnasser@cygnus.com, cgf@redhat.com, meissner@cygnus.com, gdb@sources.redhat.com, cagney@cygnus.com Subject: Re: alloca is bad? Date: Mon, 13 Nov 2000 08:05:00 -0000 Message-id: <20001113110546.22415@cse.cygnus.com> References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com> <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com> <200011120538.AAA01237@indy.delorie.com> <3A0E4F83.5F9D97FF@cygnus.com> <200011121216.HAA01483@indy.delorie.com> <3A0F24AA.50E22A01@cygnus.com> <200011131112.GAA02182@indy.delorie.com> X-SW-Source: 2000-11/msg00123.html Content-length: 1233 On Mon, Nov 13, 2000 at 06:12:58AM -0500, Eli Zaretskii wrote: > > Date: Sun, 12 Nov 2000 23:15:54 +0000 > > From: Fernando Nasser > > > > As Andrew explained in his message, if the stack is > > corrupted the PC and FP can (and probably will) be clobbered with > > the garbage when the function returns.
> > They could, yes; but in practice (at least in my experience), the > clobbered return address is caught by the OS protection in most cases, > so the program will GPF before the PC is garbled. That's not my experience, but I suspect different machines, different OSes.... > > Bottom line: for most stack corruption problems, no "where" ("backtrace") > > In my experience, in most cases, there is in fact at least the frame > where it crashed. You should be able to start debugging from there. This is assuming you have a valid frame. On systems with a frame pointer, the FP often times gets clobbered, just like the return address does, because both get restored at the same time. -- Michael Meissner, Red Hat, Inc. PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886, USA Work: meissner@redhat.com phone: +1 978-486-9304 Non-work: meissner@spectacle-pond.org fax: +1 978-692-4482 >From meissner@cygnus.com Mon Nov 13 08:10:00 2000 From: Michael Meissner To: Eli Zaretskii Cc: meissner@cygnus.com, ac131313@cygnus.com, fnasser@cygnus.com, cgf@redhat.com, gdb@sources.redhat.com, cagney@cygnus.com Subject: Re: alloca is bad? Date: Mon, 13 Nov 2000 08:10:00 -0000 Message-id: <20001113111007.55424@cse.cygnus.com> References: <20001109212032.A26464@redhat.com> <20001109213750.28987@cse.cygnus.com> <20001109222231.A26675@redhat.com> <3A0DA348.6BDDAFD4@cygnus.com> <200011120538.AAA01237@indy.delorie.com> <3A0E677B.EBF4AFCA@cygnus.com> <20001112200034.53469@cse.cygnus.com> <200011131113.GAA02185@indy.delorie.com> X-SW-Source: 2000-11/msg00124.html Content-length: 1874 On Mon, Nov 13, 2000 at 06:13:49AM -0500, Eli Zaretskii wrote: > > From: Michael Meissner > > Date: Sun, 12 Nov 2000 20:00:34 -0500 > > > > For malloc overruns, I typically just link in with -lefence or other > > tools, and it is fairly obvious where the bug is. 
> > Except that there are programs where using Efence or similar tools is > impractical, because these libraries typically don't free memory when > the application calls `free' (to catch code which writes into free'd > memory). This enlarges the memory footprint of such programs to the > extent that you cannot link with -lefence, because the program won't > run for more than a few moments, or not at all. Yes I know, I assumed that would be obvious, particularly with efence which allocates 2 pages per malloc request. It's part of the reason why I have 1+ gigs of swap space on this machine, and will probably go to 2 or more the next time I reorganize the disks. > I'd expect GDB to be one of the programs where this approach is not > very practical, what with all the malloc'ing that goes on there. In terms of GCC, it tended to be obstacks that caused the most grief, because they didn't work well with malloc checkers. Fortunately, obstacks are mostly a thing of the past these days in the compiler. > In fact, we have such an elusive memory problem right now in Emacs > development sources. It's been a couple of months since it was first > reported, and all of the Emacs developers have been hunting it ever > since, including lots of special-purpose code that was written > specifically for catching that bug. The bug is still at large... Bummer.... -- Michael Meissner, Red Hat, Inc. PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886, USA Work: meissner@redhat.com phone: +1 978-486-9304 Non-work: meissner@spectacle-pond.org fax: +1 978-692-4482 >From cgf@cygnus.com Mon Nov 13 08:25:00 2000 From: "Christopher G. Faylor" To: Jim Blandy Cc: gdb@sources.redhat.com Subject: Re: alloca is bad?
Date: Mon, 13 Nov 2000 08:25:00 -0000 Message-id: <20001113112515.C7424@redhat.com> References: <20001109212032.A26464@redhat.com> <200011101039.FAA29788@indy.delorie.com> <20001110111651.A19503@redhat.com> <3A0CD8C1.426EA933@cygnus.com> <20001111130703.A24382@redhat.com> X-SW-Source: 2000-11/msg00125.html Content-length: 570 On Mon, Nov 13, 2000 at 10:52:18AM -0500, Jim Blandy wrote: >>I don't think that the fact that you can cut yourself with a knife >>means that you eliminate all sharp objects from your premises. > >I wish you folks would be more straightforward. Now I have to go >re-purchase all that cutlery. It's not just cutlery. For a while, I actually had eliminated all electrical appliances after I saw the consequences of a lightning strike to a tree. Eventually, I decided that this was foolish, so I'm now allowing my family to use flashlights and transistor radios. cgf >From nsd@redhat.com Mon Nov 13 08:45:00 2000 From: Nick Duffek To: gdb@sources.redhat.com Subject: [RFA] Move alloca(0) to wait_for_inferior() from registers_changed() Date: Mon, 13 Nov 2000 08:45:00 -0000 Message-id: <200011131651.eADGp7F16742@rtl.cygnus.com> References: <3A0FE1FC.BBA6C43F@cygnus.com> X-SW-Source: 2000-11/msg00126.html Content-length: 2686 On 13-Nov-2000, Andrew Cagney wrote: >Nick Duffek wrote: >> - Moves an alloca (0) call from registers_changed() to >> wait_for_inferior(), so that the documentation can say: >This is a separate problem and not part of the patch. Okay. The appended patch moves an alloca (0) call from registers_changed() in regcache.c to wait_for_inferior() in infrun.c. The purpose of alloca (0) is to garbage-collect space freed by libiberty's alloca implementation. It is important that this garbage collection occur at regular intervals when the stack is shallow. At the moment, regcache.c has this comment preceding the alloca (0) call: /* Force cleanup of any alloca areas if using C alloca instead of a builtin alloca.
This particular call is used to clean up areas allocated by low level target code which may build up during lengthy interactions between gdb and the target before gdb gives control to the user (ie watchpoints). */ It's not obvious that registers_changed() gets called regularly or at shallow stack depths during interactions between gdb and the target. But it is obvious that wait_for_inferior() loops regularly at shallow stack depths during those interactions. Therefore, it makes more sense to put the alloca (0) in the wait_for_inferior() loop than in registers_changed(). ChangeLog entries: * infrun.c (wait_for_inferior): Call alloca (0). * regcache.c (registers_changed): Move alloca (0) call to wait_for_inferior() loop. Okay to apply? Nick [patch follows] Index: gdb/infrun.c =================================================================== diff -up gdb/infrun.c gdb/infrun.c --- gdb/infrun.c Mon Nov 13 01:25:58 2000 +++ gdb/infrun.c Mon Nov 13 01:23:55 2000 @@ -1282,6 +1282,9 @@ wait_for_inferior (void) while (1) { + /* Garbage-collect libiberty's alloca() emulation. */ + alloca (0); + if (target_wait_hook) ecs->pid = target_wait_hook (ecs->waiton_pid, ecs->wp); else Index: gdb/regcache.c =================================================================== diff -up gdb/regcache.c gdb/regcache.c --- gdb/regcache.c Mon Nov 13 01:26:00 2000 +++ gdb/regcache.c Mon Nov 13 01:24:18 2000 @@ -295,13 +295,6 @@ registers_changed (void) registers_pid = -1; - /* Force cleanup of any alloca areas if using C alloca instead of - a builtin alloca. This particular call is used to clean up - areas allocated by low level target code which may build up - during lengthy interactions between gdb and the target before - gdb gives control to the user (ie watchpoints). 
*/ - alloca (0); - for (i = 0; i < ARCH_NUM_REGS; i++) register_valid[i] = 0; >From fnasser@cygnus.com Mon Nov 13 09:08:00 2000 From: Fernando Nasser To: gdb@sources.redhat.com Subject: RFC: CLI files naming Date: Mon, 13 Nov 2000 09:08:00 -0000 Message-id: <3A101FD5.C38FD29D@cygnus.com> X-SW-Source: 2000-11/msg00127.html Content-length: 1144 As part of the separation of the command line interface code from the gdb core (libgdb), I will need to create files related to this interpreter. We have placed MI related files in the gdb/mi subdirectory, the TUI related files in the gdb/tui and the Insight files in the gdb/gdbtk one. Of course, CLI files should go into the gdb/cli subdir. However, for historical reasons, the MI and Insight files got prefixes in their names. The reason was that they previously resided in the gdb directory itself and this was a way to group them together. So, do I follow the tradition and name files like: gdb/cli/cli-cmds.c or do I eliminate the redundancy and go for shorter names like: gdb/cli/cmds.c Note that the long names have an advantage when debugging. The MI, for instance, has a file called mi-main.c. If it was just main.c (of course we could choose another name) it would clash with the gdb/main.c when we tried to set a breakpoint at main.c:36 for instance. Any preferences? -- Fernando Nasser Red Hat Canada Ltd.
E-Mail: fnasser@redhat.com 2323 Yonge Street, Suite #300 Toronto, Ontario M4P 2C9 >From larry@smith-house.org Mon Nov 13 09:31:00 2000 From: Larry Smith To: gdb@sourceware.cygnus.com Subject: Re: extending Gdb to display app specific data Date: Mon, 13 Nov 2000 09:31:00 -0000 Message-id: <3A10244D.8689F2C1@smith-house.org> References: <3A0B4885.B2384AE7@netscape.com> <200011101039.FAA29785@indy.delorie.com> <3A0C2791.A33BE9AC@ix.netcom.com> <200011102009.PAA00126@indy.delorie.com> <3A0C8E19.A8A6D355@netscape.com> <20001110191444.26019@cse.cygnus.com> X-SW-Source: 2000-11/msg00128.html Content-length: 504 How tough would it be to simply bolt in a C language interpreter so you could define functions at debug time or by putting them in your .gdbtkrc file or something? cint, or EiC would go a long way toward providing these same features. -- .-. .-. .---. .---. .-..-. | "Bill Gates is just a monocle | |__ / | \| |-< | |-< > / | and a Persian Cat away from `----'`-^-'`-'`-'`-'`-' `-' | being one of the bad guys in a My opinions only. | James Bond movie." -- D Miller >From nsd@redhat.com Mon Nov 13 09:32:00 2000 From: Nick Duffek To: gdb@sources.redhat.com Subject: Re: [RFA] Move alloca(0) to wait_for_inferior() from registers_changed() Date: Mon, 13 Nov 2000 09:32:00 -0000 Message-id: <200011131738.eADHcIQ16791@rtl.cygnus.com> References: <200011131651.eADGp7F16742@rtl.cygnus.com> X-SW-Source: 2000-11/msg00129.html Content-length: 219 On 13-Nov-2000, I wrote: >The appended patch moves an alloca (0) call from registers_changed() in >regcache.c to wait_for_inferior() in infrun.c. Whoops, I should have sent that to gdb-patches. It's there now. Nick >From nsd@redhat.com Mon Nov 13 12:14:00 2000 From: Nick Duffek To: gdb@sources.redhat.com Subject: Re: alloca is bad? 
Date: Mon, 13 Nov 2000 12:14:00 -0000 Message-id: <200011132020.eADKKNZ16954@rtl.cygnus.com> References: <20001112200034.53469@cse.cygnus.com> X-SW-Source: 2000-11/msg00130.html Content-length: 286 On 12-Nov-2000, Michael Meissner wrote: >For malloc overruns, I typically just link in with -lefence or other >tools, and it is fairly obvious where the bug is. For alloca overruns, one could get similar results by recompiling with libiberty's alloca and linking with -lefence. Nick >From meissner@cygnus.com Mon Nov 13 12:21:00 2000 From: Michael Meissner To: Nick Duffek Cc: gdb@sources.redhat.com Subject: Re: alloca is bad? Date: Mon, 13 Nov 2000 12:21:00 -0000 Message-id: <20001113152141.07868@cse.cygnus.com> References: <20001112200034.53469@cse.cygnus.com> <200011132020.eADKKNZ16954@rtl.cygnus.com> X-SW-Source: 2000-11/msg00131.html Content-length: 820 On Mon, Nov 13, 2000 at 03:20:23PM -0500, Nick Duffek wrote: > On 12-Nov-2000, Michael Meissner wrote: > > >For malloc overruns, I typically just link in with -lefence or other > >tools, and it is fairly obvious where the bug is. > > For alloca overruns, one could get similar results by recompiling with > libiberty's alloca and linking with -lefence. That won't work, since alloca is a builtin provided by GCC. If GCC is detected as the compiler at configuration time, then alloca is not even built into libiberty. On systems like Linux and *BSD, there is typically no other compiler except GCC. -- Michael Meissner, Red Hat, Inc. 
PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886, USA Work: meissner@redhat.com phone: +1 978-486-9304 Non-work: meissner@spectacle-pond.org fax: +1 978-692-4482 >From nsd@redhat.com Mon Nov 13 12:31:00 2000 From: Nick Duffek To: gdb@sources.redhat.com Subject: Re: [RFA] alloca coding standard Date: Mon, 13 Nov 2000 12:31:00 -0000 Message-id: <200011132036.eADKaev16965@rtl.cygnus.com> References: <200011130654.eAD6sjV16467@rtl.cygnus.com> X-SW-Source: 2000-11/msg00132.html Content-length: 2901 Thanks for the comments. Here's an updated patch that I hope addresses the concerns raised. In particular: On 13-Nov-2000, Eli Zaretskii wrote: >I suggest an index entry here, e.g. "@findex alloca usage". Agreed. I also added a @cindex entry. >I'd suggest to say how much is a "page", at least for a couple of >popular architectures. I changed it to "1 page or 512 bytes, whichever is larger" so that developers needn't know the page size. >This should be `alloca', not `alloc'. Whoops, thanks. >Finally, perhaps something should be said about functions that could >potentially be called recursively I added a cautionary sentence about recursive functions. On 13-Nov-2000, Andrew Cagney wrote: >I'd suggest first allowing for the discussion which is clarifying all >the issues to settle down and only after that start looking at something >recording guidelines for alloca. After the discussion settles down, I'll resubmit the patch if it still seems applicable. On 13-Nov-2000, Fernando Nasser wrote: >What user command? The GUI and other script languages call random functions >in GDB. And their command loops may even be in another process! Good point. I reworded the documentation to cover more interfaces and to be less specific about where they call alloca (0).
Nick [patch follows] Index: gdb/doc/gdbint.texinfo =================================================================== diff -up gdb/doc/gdbint.texinfo gdb/doc/gdbint.texinfo --- gdb/doc/gdbint.texinfo Mon Nov 13 15:08:57 2000 +++ gdb/doc/gdbint.texinfo Mon Nov 13 15:08:06 2000 @@ -2854,6 +2854,31 @@ visible to random source files. All static functions must be declared in a block near the top of the source file. +@subsection Alloca +@findex alloca +@cindex alloca + +@code{alloca} may be used for allocating up to one page or 512 bytes of +memory per function call, whichever is larger. + +The size limit is a compromise between the benefits of @code{alloca} and +the risk of stack overflow, which can happen on systems that have +fixed-length stack regions or bounds on frame sizes. + +In deeply recursive functions, a smaller limit may be necessary to avoid +overflowing the stack. + +For systems that use libiberty's @code{alloca} emulation, it is +important that @code{alloca} be called when the stack is shallow to +garbage-collect freed space. Toward that end, each interface to the +@value{GDBN} core --- for example, GDBTK, libgdb, and the text console +--- should ensure that @code{alloca (0)} is called periodically. In +addition, @value{GDBN} calls @code{alloca (0)} once per inferior wait. + +Because @value{GDBN} and other GNU programs use @code{alloca}, they are +not portable to systems that neither provide a native @code{alloca} nor +support libiberty's emulation of @code{alloca}. + @subsection Clean Design In addition to getting the syntax right, there's the little question of >From jtc@redback.com Mon Nov 13 12:34:00 2000 From: jtc@redback.com (J.T. 
Conklin) To: cgf@cygnus.com Cc: gdb@sourceware.cygnus.com Subject: wince, dcache, and memory region attributes Date: Mon, 13 Nov 2000 12:34:00 -0000 Message-id: <5msnov4nnm.fsf@jtc.redback.com> X-SW-Source: 2000-11/msg00133.html Content-length: 1050 The dcache is usually disabled when GDB starts, but the wince target enables it with a call to set_dcache_state(1). I am trying to get a bit of background about why the wince target does this. In my experience with other embedded systems/OSs, a GDB user generally wouldn't want GDB caching memory because of memory mapped I/O devices. Is there something about Win-CE or the typical Win-CE developer that is different? The reason I ask is that after the memory region attribute code is committed, a GDB user will be able to define regions of memory and specify the cache / nocache attribute separately for each one. But when a user does not define regions, or accesses are made to memory that is not within a defined region, the "default" set of attributes are used. At present, the default attributes have nocache set. Will this make things difficult for Win-CE users? I'd rather not provide knobs that change the default memory attributes depending on what targets are linked into GDB. --jtc -- J.T. Conklin RedBack Networks >From kevinb@cygnus.com Mon Nov 13 12:58:00 2000 From: Kevin Buettner To: gdb@sources.redhat.com Subject: Re: [RFA] alloca coding standard Date: Mon, 13 Nov 2000 12:58:00 -0000 Message-id: <1001113205812.ZM8470@ocotillo.lan> References: <200011130654.eAD6sjV16467@rtl.cygnus.com> <200011132036.eADKaev16965@rtl.cygnus.com> X-SW-Source: 2000-11/msg00134.html Content-length: 692 On Nov 13, 3:36pm, Nick Duffek wrote: > >I'd suggest to say how much is a "page", at least for a couple of > >popular architectures. > > I changed it to "1 page or 512 bytes, whichever is larger" so that > developers needn't know the page size. I think we ought to just pick a number (in bytes) and not mention pages at all.
The last time I checked, the default page size on Linux/IA-64 was
16KB. It's possible to recompile the kernel to use 64KB pages.

My point is that it would be possible for someone to submit a patch
using alloca() which meets your proposed guidelines (on the
architecture upon which the patch was developed), but which isn't
reasonable for generic code.

Kevin

>From nsd@redhat.com Mon Nov 13 13:19:00 2000
From: Nick Duffek
To: kevinb@cygnus.com
Cc: gdb@sources.redhat.com
Subject: Re: [RFA] alloca coding standard
Date: Mon, 13 Nov 2000 13:19:00 -0000
Message-id: <200011132124.eADLOnc17012@rtl.cygnus.com>
References: <1001113205812.ZM8470@ocotillo.lan>
X-SW-Source: 2000-11/msg00135.html
Content-length: 234

On 13-Nov-2000, Kevin Buettner wrote:

>I think we ought to just pick a number (in bytes) and not mention
>pages at all.

I agree. I'll rework the patch to specify 512 bytes or whatever number
the group thinks is reasonable.

Nick

>From nsd@redhat.com Mon Nov 13 13:24:00 2000
From: Nick Duffek
To: meissner@cygnus.com
Cc: gdb@sources.redhat.com
Subject: Re: alloca is bad?
Date: Mon, 13 Nov 2000 13:24:00 -0000
Message-id: <200011132129.eADLTfJ17017@rtl.cygnus.com>
References: <20001113152141.07868@cse.cygnus.com>
X-SW-Source: 2000-11/msg00136.html
Content-length: 454

On 13-Nov-2000, Michael Meissner wrote:

>That won't work, since alloca is a builtin provided by GCC.

That can be disabled by compiling with -fno-builtin.

>If GCC is detected as the compiler at configuration time, then alloca is
>not even built into libiberty.

True, but that's easy to circumvent.

Using -lefence or other debugging mallocs to debug alloca overruns isn't
as easy as a simple relink, but it doesn't seem fundamentally difficult.

Nick

>From cgf@cygnus.com Mon Nov 13 13:36:00 2000
From: Chris Faylor
To: "J.T.
Conklin"
Cc: gdb@sources.redhat.com
Subject: Re: wince, dcache, and memory region attributes
Date: Mon, 13 Nov 2000 13:36:00 -0000
Message-id: <20001113163559.A9623@redhat.com>
References: <5msnov4nnm.fsf@jtc.redback.com>
X-SW-Source: 2000-11/msg00137.html
Content-length: 2406

Hi,

On Mon, Nov 13, 2000 at 12:34:05PM -0800, J.T. Conklin wrote:
>The dcache is usually disabled when GDB starts, but the wince target
>enables it with a call to set_dcache_state(1).
>
>I am trying to get a bit of background about why the wince target does
>this. In my experience with other embedded systems/OSs, a GDB user
>generally wouldn't want GDB caching memory because of memory mapped
>I/O devices. Is there something about Win-CE or the typical Win-CE
>developer that is different?

It was primarily for speed considerations. It can be turned off
without affecting functionality, obviously.

Windows CE is essentially a stripped down version of Windows. My port
of gdb to use a Windows CE device just allows you to debug applications
on the device. It doesn't allow debugging of the OS itself where
considerations like memory mapped I/O might be an issue.

However, memory mapped file I/O is available to Windows CE
applications, so maybe there is an issue there. I have no idea how
common it would be to use this on Windows CE.

>The reason I ask is that after the memory region attribute code is
>committed, a GDB user will be able to define regions of memory and
>specify the cache / nocache attribute separately for each one. But
>when a user does not define regions, or accesses are made to memory
>that is not within a defined region, the "default" set of attributes
>is used. At present, the default attributes have nocache set.
>
>Will this make things difficult for Win-CE users? I'd rather not
>provide knobs that change the default memory attributes depending on
>what targets are linked into GDB.

I just turned on caching to ameliorate the inordinate lag engendered
when communicating with a Windows CE device using Microsoft's
protocols.

There are optimizations that could be made to the way that gdb
communicates with the hand-held device that could speed things up. So,
if it isn't possible to keep things the way they are now, then turning
off caching will only have a minor, correctable effect. I can offer
many helpful suggestions to anyone who wants to speed things up. :-)

I'm not aware of anyone actually using gdb to debug Windows CE
applications. It would be sort of interesting to see if they started
showing up here, complaining, if things slowed down...

So, please just do whatever makes sense for gdb and don't worry about
Windows CE.

cgf

>From Fabrice_Gautier@sdesigns.com Mon Nov 13 14:58:00 2000
From: Fabrice Gautier
To: "GDB (E-mail)"
Subject: Load additionnal symbols
Date: Mon, 13 Nov 2000 14:58:00 -0000
Message-id:
X-SW-Source: 2000-11/msg00138.html
Content-length: 388

Hi,

Is there a way to load additional symbols with gdb?

My situation is that I have an eCos program loaded with the RedBoot
monitor. When I debug, I have my program's symbols, but I would like
to add the monitor's symbols to gdb's knowledge.

If I use the symbol command, the original table is deleted. I would
like to have both tables.

Thanks.

--
Fabrice Gautier
fabrice_gautier@sdesigns.com