From: Pedro Alves
To: gdb-patches@sourceware.org
Cc: Doug Evans
Subject: Re: [RFA] Use data cache for stack accesses
Date: Wed, 26 Aug 2009 22:45:00 -0000
Message-Id: <200908262108.49085.pedro@codesourcery.com>

On Wednesday 26 August 2009 17:29:40, Doug Evans wrote:
> On Tue, Aug 25, 2009 at 11:44 AM, Pedro Alves wrote:
>
> > I worry about new stale cache issues in non-stop mode.
> > [...]
> > It appears that (at least in non-stop or if any thread is running)
> > the cache should only be live for the duration of a "high level
> > operation" --- that is, for a "backtrace", or a "print", etc.
> > Did you consider this?
>
> It wasn't clear how to handle non-stop/etc. mode so I left that for
> the next iteration.
> If only having the data live across a high level operation works for
> you, it works for me.

Well, I'm not sure either, but it's much better to discuss it upfront
than to introduce subtle, hard-to-reproduce bugs induced by GDB itself.

Reading things from memory while threads are running is always racy ---
if we want an accurate snapshot of the inferior's memory, we *have* to
stop all threads temporarily.  If we don't (stop all threads), then even
if the dcache gets stale while we do a series of micro memory reads
(that logically are part of a bigger, higher level operation ---
extracting a backtrace, reading a structure from memory, etc.), it's
OK: we can just pretend the memory changed only after we did the reads,
instead of before.  If we restart the higher level operation, we
shouldn't hit the stale cache, and that seems good enough.

Writes are more dangerous, though, since the dcache writes back cache
lines in chunks.  That means there's a risk that the dcache undoes
changes the inferior had meanwhile made to other parts of the cache
line --- parts that higher layers of GDB code were not trying to write
to (it's effectively a read-modify-write, hence racy).  This is a more
general non-stop mode problem, not strictly related to the dcache:
e.g., ptrace writes to memory do a word-sized read-modify-write ---
clearly a smaller window than a 64-byte-wide cache line, and small in
absolute terms, but still a risk.  The only cure for this is to stop
all threads momentarily...
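To make that write-back hazard concrete, here's a minimal,
self-contained simulation (the names are made up for illustration;
this is not the dcache's actual code):

/* Simulate a debugger-side cache line racing with a running
   inferior thread.  */
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 64

static unsigned char inferior_mem[LINE_SIZE]; /* the "real" memory */
static unsigned char dcache_line[LINE_SIZE];  /* debugger's cached copy */

int
main (void)
{
  /* The debugger fills its cache line from inferior memory.  */
  memcpy (dcache_line, inferior_mem, LINE_SIZE);

  /* Meanwhile, a running inferior thread stores to byte 10.  */
  inferior_mem[10] = 0xaa;

  /* The debugger writes byte 3 through the cache, then flushes the
     whole line back: a 64-byte read-modify-write.  */
  dcache_line[3] = 0x55;
  memcpy (inferior_mem, dcache_line, LINE_SIZE);

  /* The inferior's own store to byte 10 has been silently undone.  */
  printf ("byte 10 after write-back: 0x%02x (the thread wrote 0xaa)\n",
          inferior_mem[10]);
  return 0;
}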
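The ptrace case is the same pattern in miniature.  Roughly (a hedged
sketch for Linux; poke_byte is a hypothetical helper, not what GDB
actually calls, and it assumes a little-endian target):

#include <sys/ptrace.h>
#include <sys/types.h>
#include <errno.h>

/* Write a single byte into the inferior PID's memory at ADDR.  ptrace
   can only transfer whole words, so we read the containing word, patch
   one byte, and write the word back; if a running thread stores to one
   of the other bytes of that word in between, its store is silently
   undone.  */
int
poke_byte (pid_t pid, unsigned long addr, unsigned char value)
{
  unsigned long word_addr = addr & ~(sizeof (long) - 1);
  long word;

  errno = 0;
  word = ptrace (PTRACE_PEEKDATA, pid, (void *) word_addr, NULL);
  if (word == -1 && errno != 0)
    return -1;

  /* Patch the target byte within the word (little-endian layout).  */
  ((unsigned char *) &word)[addr - word_addr] = value;

  return ptrace (PTRACE_POKEDATA, pid, (void *) word_addr,
                 (void *) word) == -1 ? -1 : 0;
}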
We're shifting the caching decision to the intent of the transfer, but
we still cache whole lines (larger than the read_stack request) --- do
we still have rare border cases where the cache line can cover more
than stack memory, hence still leading to incorrect results?  (See the
sketch in the P.S. below for what I mean.)  Probably a very rare
problem in practice.  Even less problematic if the cache is only live
across high level operations (essentially getting rid of most of the
volatile memory problem).

Which brings me to how much of the improvement you are seeing comes
from chunking vs. caching:

> > Did you post numbers showing off the improvements from
> > having the cache on?  E.g., when doing foo, with cache off,
> > I get NNN memory reads, while with cache on, we get only
> > nnn reads.  I'd be curious to have some backing behind
> > "This improves remote performance significantly".
>
> For a typical gdb/gdbserver connection here a backtrace of 256 levels
> went from 48 seconds (average over 6 tries) to 4 seconds (average over
> 6 tries).

Nice!  Were all those single runs each started from a cold cache, or
are you starting from a cold cache and then issuing 6 backtraces in a
row?  I mean, how sparse were those 6 tries?  Should one read that as
48,48,48,48,48,48 vs 20,1,1,1,1,1 (some improvement due to chunking,
and a large improvement due to caching on subsequent repeats of the
command); or as 48,48,48,48,48,48 vs 4,4,4,4,4,4 (a large improvement
due to chunking --- caching not actually measured)?

-- 
Pedro Alves
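P.S.  To spell out the border case above: with, say, 64-byte lines, a
read_stack request gets rounded out to line boundaries, so the cached
line can extend past the stack region.  A toy sketch of the rounding
(made-up constants and addresses, not the dcache's actual layout code):

#include <stdio.h>

#define LINE_SIZE 64UL /* illustrative line size */

int
main (void)
{
  /* Hypothetical numbers: the stack region starts at stack_low, and a
     16-byte read_stack request lands near its edge.  */
  unsigned long stack_low = 0xbffff030;
  unsigned long req_addr = 0xbffff038;
  unsigned long req_len = 16;

  /* Round the request out to whole cache lines, as a line-based cache
     must.  */
  unsigned long line_start = req_addr & ~(LINE_SIZE - 1);
  unsigned long line_end = (req_addr + req_len + LINE_SIZE - 1)
                           & ~(LINE_SIZE - 1);

  printf ("requested: [0x%lx, 0x%lx)\n", req_addr, req_addr + req_len);
  printf ("cached:    [0x%lx, 0x%lx)\n", line_start, line_end);
  if (line_start < stack_low)
    printf ("the line covers 0x%lx bytes below the stack region\n",
            stack_low - line_start);
  return 0;
}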