From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 15679 invoked by alias); 24 Oct 2007 22:33:36 -0000
Received: (qmail 15666 invoked by uid 22791); 24 Oct 2007 22:33:35 -0000
X-Spam-Check-By: sourceware.org
Received: from waste.org (HELO waste.org) (66.93.16.53)
    by sourceware.org (qpsmtpd/0.31) with ESMTP; Wed, 24 Oct 2007 22:33:31 +0000
Received: from waste.org (localhost [127.0.0.1])
    by waste.org (8.13.8/8.13.8/Debian-3) with ESMTP id l9OMWrLR007796
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
    Wed, 24 Oct 2007 17:32:53 -0500
Received: (from oxymoron@localhost)
    by waste.org (8.13.8/8.13.8/Submit) id l9OMWoEx007792;
    Wed, 24 Oct 2007 17:32:50 -0500
Date: Wed, 24 Oct 2007 22:33:00 -0000
From: Matt Mackall
To: Grant Likely
Cc: linuxppc-embedded@ozlabs.org, gdb@sourceware.org
Subject: Re: Apparent kernel bug with GDB on ppc405
Message-ID: <20071024223250.GI19691@waste.org>
References: <20071024194640.GB19691@waste.org> <20071024204215.GC19691@waste.org> <20071024215421.GF19691@waste.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.13 (2006-08-11)
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-owner@sourceware.org
X-SW-Source: 2007-10/txt/msg00219.txt.bz2

On Wed, Oct 24, 2007 at 04:27:52PM -0600, Grant Likely wrote:
> On 10/24/07, Matt Mackall wrote:
> > On Wed, Oct 24, 2007 at 03:42:16PM -0500, Matt Mackall wrote:
> > > On Wed, Oct 24, 2007 at 02:28:14PM -0600, Grant Likely wrote:
> > > > On 10/24/07, Matt Mackall wrote:
> > > > > I'm trying to debug a trivial statically-linked hello world program on
> > > > > a Xilinx PPC 405 and I'm seeing the following behavior:
> > > > >
> > > > >
> > > > >
> > > > > Any suggestions?
> > > >
> > > > http://thread.gmane.org/gmane.linux.ports.ppc.embedded/11202
> > > >
> > > > I was fighting with a similar problem almost 2 years ago. Looks like
> > > > it might be related. At some point the problem seemed to go away and
> > > > I determined what the root cause was. :-(
> > > >
> > > > I haven't been using gdb lately, so I don't know if it's the same
> > > > problem. Nobody I had talked to had seen the issue on other 405
> > > > platforms. It could very well be something virtex-specific.
> > >
> > > Could be the same problem, but I'm seeing only your symptom 3 so far.
> > >
> > > I've tried throwing some larger hammers at the problem. Flushing all
> > > of the dcache and icache (flush_dcache_all and
> > > flush_instruction_cache) isn't helping. But printk(".") does!
> >
> > Well there was one remaining cache - the TLB. This patch seems to make
> > things work, but don't ask me why:
> >
> > --- include/asm-ppc/cacheflush.h (revision 10439)
> > +++ include/asm-ppc/cacheflush.h (working copy)
> > @@ -11,6 +11,7 @@
> >  #define _PPC_CACHEFLUSH_H
> >
> >  #include
> > +#include
> >
> >  /*
> >   * No cache flushing is required when address mappings are
> > @@ -35,10 +36,23 @@
> >  extern void flush_icache_user_range(struct vm_area_struct *vma,
> >  		struct page *page, unsigned long addr, int len);
> >
> >  #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
> >  do { memcpy(dst, src, len); \
> >       flush_icache_user_range(vma, page, vaddr, len); \
> > +     _tlbia(); \
> >  } while (0)
>
> Hmmm; thinking out loud here...
>
> - so tlbia invalidates all TLB entries
> - When gdb inserts a breakpoint the .text pages are marked as read
> only, so the kernel does a copy on write so that gdb can modify the
> instruction. The kernel also updates the page tables so that the test
> process now uses the new page.
> - This means that there are now 2 pages for that one section of
> executable code; the original and the one with the breakpoint.
> - However, the program is still in memory, and there is probably
> already a TLB entry pointing to the original page for that range of
> addresses.
>
> Could it be that the kernel page tables are getting updated to the new
> page; but active set of TLB entries is not getting updated?
>
> If so, then printk(".") probably solves the problem simply because it
> touches enough pages in its execution path that the old TLB entry gets
> overwritten? There are only 64 TLB entries afterall.
>
> Thoughts?

Not completely implausible, but a) why isn't this seen on basically
every machine with software TLB? b) why does -local- GDB, which is
presumably doing much less work than gdbserver + network stack, not
fail?

-- 
Mathematics is the supreme nostalgia of our time.
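
The stale-TLB hypothesis quoted above can be illustrated with a toy
model. This is only a sketch under stated assumptions, not the ppc405
kernel code: the page table array, the round-robin replacement policy,
the frame numbers, and every name used here (translate,
tlb_invalidate_all, lookup_after_cow, and so on) are hypothetical. Only
the 64-entry TLB size is taken from the thread.

```c
#include <string.h>

#define TLB_ENTRIES 64   /* TLB size mentioned in the thread */
#define NUM_VPAGES  256  /* toy address space size (assumption) */

/* Toy page table: virtual page number -> physical frame number. */
static int page_table[NUM_VPAGES];

/* Toy software-managed TLB entry. */
struct tlb_entry { int valid; int vpn; int pfn; };
static struct tlb_entry tlb[TLB_ENTRIES];
static int next_victim;  /* round-robin replacement (assumption) */

/* Stand-in for a "flush everything" operation such as _tlbia(). */
static void tlb_invalidate_all(void)
{
    memset(tlb, 0, sizeof tlb);
    next_victim = 0;
}

/* Translate a virtual page. A TLB hit never consults the page table,
 * which is why a stale entry can outlive a page table update. */
static int translate(int vpn)
{
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return tlb[i].pfn;

    /* Miss: reload from the page table into the next victim slot. */
    struct tlb_entry *e = &tlb[next_victim];
    next_victim = (next_victim + 1) % TLB_ENTRIES;
    e->valid = 1;
    e->vpn = vpn;
    e->pfn = page_table[vpn];
    return e->pfn;
}

/* Cache a mapping, remap the page as a copy-on-write would, optionally
 * flush the TLB, optionally touch churn_pages unrelated pages (at most
 * 64 in this toy), then report which frame the page resolves to. */
int lookup_after_cow(int flush, int churn_pages)
{
    const int vpn = 5, old_pfn = 100, new_pfn = 200;

    for (int i = 0; i < NUM_VPAGES; i++)
        page_table[i] = 1000 + i;
    tlb_invalidate_all();

    page_table[vpn] = old_pfn;
    translate(vpn);              /* TLB now caches vpn -> old_pfn */

    page_table[vpn] = new_pfn;   /* page table updated by the COW */
    if (flush)
        tlb_invalidate_all();

    for (int i = 0; i < churn_pages; i++)
        translate(10 + i);       /* unrelated misses evict old entries */

    return translate(vpn);
}
```

Under these assumptions, lookup_after_cow(0, 0) returns the old frame
100, lookup_after_cow(1, 0) returns the new frame 200, and
lookup_after_cow(0, 64) also returns 200 because 64 unrelated misses
evict the stale entry. That last case corresponds to the effect
attributed to printk(".") in the quoted text. The model says nothing
about the two objections raised in the reply.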