From: Fredrik Tolf <fredrik@dolda2000.cjb.net>
To: gdb@sources.redhat.com
Subject: Re: Checking function calls
Date: Thu, 05 Dec 2002 15:14:00 -0000
Message-ID: <1039130097.2343.52.camel@pc7>
In-Reply-To: <200212050451.gB54ppB31800@duracef.shout.net>
On Thu, 2002-12-05 at 05:51, Michael Elizabeth Chastain wrote:
> Hi Fredrik,
>
> I'm throwing out a bunch of ideas here, take whatever looks useful
> and discard the rest.
>
> > Therefore, the failure has to be that a called
> > function doesn't restore EBX correctly, on rare occasions, right?
>
> I have seen this happen in a mixed programming environment,
> with a Cygwin program that used a Windows DLL. The Windows DLL
> had subtly different calling conventions where it did not preserve
> %ebx, %esi, and %edi across function calls. Perhaps you have some
> kind of third party library in your program which has a similar
> compatibility issue?
The only libraries are libc, libpthread, libdl and libpam. In the
affected function, only libc and libpthread are used. Therefore I don't
think it's calling convention incompatibility, unless they in turn call
functions in third party libraries, which I find very unlikely.
>
> > My question is thus: Is there any way of debugging this with GDB? Can I
> > make GDB check that EBX is the same before and after every function call
> > from that frame in this thread to isolate the failing function? The
> > frame never exits (until the program exits, that is), if that helps.
>
> You could set a bunch of conditional breakpoints with "break if %ebx !=
> saved_ebx", where you add code to your program to initialize saved_ebx.
> Or you could say "break if %ebx < 0x1000" or some convenient constant.
>
That would, of course, be a good thing. It's only that I'd have to do
that after every single function call... That would take some time.
Maybe I'll do it, anyway. I was actually thinking of doing something
like that, but with code instead, and making the thread SIGSTOP itself
when EBX is invalid.
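
A minimal sketch of that in-code check, under some assumptions: rather
than reading EBX directly (which would need compiler-specific inline
assembly), it keeps a shadow copy of the register-held pointer in memory
and compares the two after each call. The names check_preserved and
CALL_CHECKED are hypothetical and are not part of the program discussed
here. Note that on a POSIX-conforming threads implementation a stop
signal stops the whole process, not just the calling thread.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the live value still matches the
 * shadow copy taken before the call, 0 (after logging) otherwise. */
static int check_preserved(const void *shadow, const void *live,
                           const char *where)
{
    assert(where != NULL);
    if (shadow == live)
        return 1;
    fprintf(stderr, "value clobbered across %s: expected %p, got %p\n",
            where, shadow, live);
    return 0;
}

/* Hypothetical wrapper: copy `var` into a volatile object (which the
 * compiler must keep in memory), run `call`, then compare. On a
 * mismatch, stop so that a debugger can be attached. */
#define CALL_CHECKED(var, call)                                     \
    do {                                                            \
        void *volatile shadow_ = (void *)(var);                     \
        call;                                                       \
        if (!check_preserved(shadow_, (void *)(var), #call))        \
            raise(SIGSTOP);                                         \
    } while (0)
```

Usage would look like CALL_CHECKED(next, some_function(arg)); around
each suspect call in the affected frame. This still means wrapping every
call by hand, but the macro keeps each site to one line.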
> You could also try forcing your variable to be on the stack instead of a
> register. Remove the "register" attribute from the declaration of "next"
> if you have one. Then add a "do_nothing(&next)" call to your function,
> to force "next" to be on the stack instead of in a register. If the
> symptoms go away then it's more likely to really be a register clobber.
That just doesn't feel like a very elegant solution, though. And this
bug does actually surface in another function as well, only even more
seldom. There it also affects a variable stored in EBX, but it gets
set to 0 instead. So I would prefer actually solving the problem, so
that it doesn't show up anywhere else. I have noticed no similarities
between the two places where the bug shows itself.
> > At first I was expecting that another thread somehow gets there and
> > modifies the storage memory of next.
>
> I still suspect this. It's more likely that memory gets clobbered rather
> than a register value.
>
But next isn't stored in memory at any place, so it cannot be that.
> Perhaps you need a function that locks the whole list and walks it for
> a sanity check, without deleting anything?
>
I always check the list with gdb when the program crashes, and it's
always correct. That's why I think that it's impossible that next is
loaded when the list is in an unstable state. The only times I actually
set the next element, I always set it to NULL or a pointer returned by
malloc(). If the list were to be made unstable by a buggy function
somewhere, it would have to be restored again by the same function (since
it's always consistent when I look at it), and I just don't see that
happening.
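
For reference, the in-program sanity walk suggested above might look
roughly like the sketch below. Everything in it is an assumption made
for illustration: the struct layout, the existence of a single
list-wide lock (the real code uses per-element mutexes), the name
list_sanity_check, and the 0x1000 threshold, which is borrowed from the
earlier "some convenient constant" suggestion.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type; the real structure has more fields. */
struct node {
    struct node *next;
    pthread_mutex_t mutex;
};

/* Walk the list under a list-wide lock without modifying anything.
 * Returns the number of elements, or -1 if a link looks implausible:
 * a pointer into the first page, a misaligned pointer, or more than
 * max_nodes elements (which suggests a cycle). */
static long list_sanity_check(struct node *list,
                              pthread_mutex_t *list_lock,
                              size_t max_nodes)
{
    long count = 0;
    struct node *cur;

    assert(list_lock != NULL);
    pthread_mutex_lock(list_lock);
    for (cur = list; cur != NULL; cur = cur->next) {
        /* Validate the pointer before dereferencing it. */
        if ((uintptr_t)cur < 0x1000 ||
            (uintptr_t)cur % sizeof(void *) != 0 ||
            (size_t)count >= max_nodes) {
            count = -1;
            break;
        }
        count++;
    }
    pthread_mutex_unlock(list_lock);
    return count;
}
```

Calling such a function at a few points in the running program would
catch a list that is briefly inconsistent and later repaired, which
inspecting it in gdb after the crash cannot show.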
> Here is another wild lead: if, somehow, a block gets freed and then
> you read it, many implementations of malloc keep housekeeping information
> in the first word or two of a freed block. That would explain why the
> value is always 0x10 to 0x30 (that could be block size, especially if it is
> rounded up to a multiple of 4 or 8) and why only 1-2 words are clobbered.
> If you manage your blocks with malloc/free, you could try turning on any
> malloc debugging facilities that you have.
I also suspected that something like that might happen, and therefore I
lock the elements one element ahead of the block I'm currently looking
at, so that the current block and the next are always locked. That's why
I have:
    for(cur = list; cur != NULL; cur = next)
    {
        if((next = cur->next) != NULL)
            pthread_mutex_lock(&next->mutex);
        ...
    }
That is also a reason why the next variable has to be clobbered at some
later point, since pthread_mutex_lock succeeds on it. The program always
crashes on the line "if((next = cur->next) != NULL)", since it segfaults
when it looks up cur->next, i.e. at that point cur has already been set
to the invalid next by the loop's increment step. Therefore, when the
program crashes, next and cur are equal, and I cannot see which element
it was at before.
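
One way to keep the previous element visible at the time of the crash is
to record it in a volatile local, which the compiler has to keep in the
stack frame rather than in a register. The sketch below is a
self-contained version of the loop with that addition. The struct
layout, the name walk_locked, the locking of the first element before
the loop, and the unlock at the end of each iteration are assumptions
standing in for the parts elided by "..." above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical element type; the real structure has more fields. */
struct node {
    struct node *next;
    pthread_mutex_t mutex;
};

/* Hand-over-hand traversal: the current element and its successor are
 * both locked while the current one is processed. Returns the number
 * of elements visited. */
static int walk_locked(struct node *list)
{
    struct node *cur, *next;
    struct node *volatile prev = NULL;  /* lives in memory, not a register */
    int visited = 0;

    if (list != NULL)
        pthread_mutex_lock(&list->mutex);
    for (cur = list; cur != NULL; cur = next) {
        if ((next = cur->next) != NULL)
            pthread_mutex_lock(&next->mutex);
        assert(cur != next);            /* an element must not link to itself */
        /* ... work on cur would go here ... */
        visited++;
        pthread_mutex_unlock(&cur->mutex);
        prev = cur;                     /* last good element, for the debugger */
    }
    (void)prev;
    return visited;
}
```

If cur is invalid when cur->next is read, "print prev" in that frame
would then show the last element that was processed successfully.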
By the way, if you want to look at the code, it's available at
http://sourceforge.net/projects/dcprod/. I don't know if it's the latest
version, though.