Subject: Re: Checking function calls
From: Fredrik Tolf
To: Michael Elizabeth Chastain
Cc: gdb@sources.redhat.com
Date: Fri, 06 Dec 2002 11:31:00 -0000

On Fri, 2002-12-06 at 18:08, Michael Elizabeth Chastain wrote:
> > I know, I didn't plan ahead well enough when I started writing it, and
> > now I'm stuck with either this, or a large rewrite.
>
> When I run into this kind of problem, I like to step back -- way back --
> get away from computers for a day or two and think about it.
>
> I think there is no easy way out, that you actually are stuck with a
> large rewrite. There are just too many pthread_mutex_lock's flying
> around.

I'm beginning to believe that, too. Maybe I have just been too optimistic.

> For instance:
>
>   client.c:findtransfer() does not have any locks.
>
>   in client.c:freesharecache(), there is code:
>
>     if (cache->parent != NULL)
>     {
>       pthread_mutex_lock(&cache->parent->mutex);
>       ...
>     }
>
>   in general, it's unsafe to test a member and then acquire the lock,
>   because someone else can delete cache->parent between the "if" statement
>   and the acquisition of the lock.

Here, however, that isn't possible: all deletions from that list go
through the freesharecache function, and deleting a parent first loops
through, locks, and deletes all of its children. Since one of the
children is apparently locked, the deletion won't get any further. I
suspect it might deadlock, though.

> I recommend finding a textbook on multi-threaded programming that covers
> "how to write thread-safe lists". From your package, it looks like
> you are in it to learn, so you could step way back from the code and
> learn some theory at this point.

Yeah, when I began writing this program, I did not have much experience
with multithreading. That's why there are far too few mutexes in the
program. Still, I don't think that's the reason for this bug. The loop
in which it crashes is quite thread-safe.

> Another alternative is to use one big mutex for the whole list.

That is precisely what I have wanted to implement for a long time. It's
just that using it everywhere it should be used would require an
enormous rewrite.

> The drawback is that walking the list locks the whole list against
> addition and deletion. If your list walker is just "print status
> information" then that is fine. If your list walker does some
> long-lived network operation at each node then it is not fine.

I have, however, made sure that doesn't happen by using only nonblocking
I/O.

Once again, though, I don't think thread-unsafety is what causes this
bug. But I've added checks to that loop now, so I should discover it
sooner or later.

Thank you very much for all your help.
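For reference, here is a minimal sketch of the "one big mutex for the whole list" approach, assuming POSIX threads. All names (struct node, struct locked_list, list_add, list_remove, list_walk, list_destroy) are hypothetical and not taken from client.c. It illustrates the point raised about freesharecache(): the membership test and the unlink happen while holding the same lock, so no other thread can free a node between the check and the removal.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical node type; not the real share-cache structure. */
struct node {
    int id;
    struct node *next;
};

/* One big mutex guards the head pointer and every node's next field. */
struct locked_list {
    pthread_mutex_t lock;
    struct node *head;
};

void list_init(struct locked_list *l)
{
    assert(l != NULL);
    pthread_mutex_init(&l->lock, NULL);
    l->head = NULL;
}

/* Push a node at the head; returns 0 on success, -1 if malloc fails. */
int list_add(struct locked_list *l, int id)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->id = id;
    pthread_mutex_lock(&l->lock);
    n->next = l->head;
    l->head = n;
    pthread_mutex_unlock(&l->lock);
    return 0;
}

/* Find and unlink under one lock, so no other thread can free the node
 * between "is it there?" and "remove it". Returns 0 if found, else -1. */
int list_remove(struct locked_list *l, int id)
{
    struct node **pp;
    struct node *victim = NULL;

    pthread_mutex_lock(&l->lock);
    for (pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            victim = *pp;
            *pp = victim->next;
            break;
        }
    }
    pthread_mutex_unlock(&l->lock);

    if (victim == NULL)
        return -1;
    free(victim); /* already unlinked, so no other thread can reach it */
    return 0;
}

/* Visit every node with the lock held and return the node count.
 * visit may be NULL; it must not block or call back into the list. */
size_t list_walk(struct locked_list *l,
                 void (*visit)(const struct node *, void *), void *arg)
{
    size_t count = 0;
    pthread_mutex_lock(&l->lock);
    for (const struct node *n = l->head; n != NULL; n = n->next) {
        if (visit != NULL)
            visit(n, arg);
        count++;
    }
    pthread_mutex_unlock(&l->lock);
    return count;
}

/* Detach all nodes under the lock, then free them; call only once no
 * other thread still uses the list. */
void list_destroy(struct locked_list *l)
{
    struct node *n;

    pthread_mutex_lock(&l->lock);
    n = l->head;
    l->head = NULL;
    pthread_mutex_unlock(&l->lock);

    while (n != NULL) {
        struct node *next = n->next;
        free(n);
        n = next;
    }
    pthread_mutex_destroy(&l->lock);
}
```

The trade-off is the one described above: list_walk() holds the lock for the whole traversal, so additions and deletions wait until it finishes. A visit callback must therefore neither block nor call list_add()/list_remove(). Relocking a default (non-recursive) mutex from the same thread is undefined and typically deadlocks.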