From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-28947-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 29589 invoked by alias); 26 Jun 2007 17:22:31 -0000
Received: (qmail 29578 invoked by uid 22791); 26 Jun 2007 17:22:29 -0000
X-Spam-Check-By: sourceware.org
Received: from shell4.bayarea.net (HELO shell4.bayarea.net) (209.128.82.1)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 26 Jun 2007 17:22:27 +0000
Received: (qmail 8175 invoked from network); 26 Jun 2007 10:22:23 -0700
Received: from 209-128-106-254.bayarea.net (HELO ?192.168.20.7?) (209.128.106.254)   by shell4.bayarea.net with SMTP; 26 Jun 2007 10:22:23 -0700
Message-ID: <46814B4C.7080302@eagercon.com>
Date: Tue, 26 Jun 2007 17:22:00 -0000
From: Michael Eager <eager@eagercon.com>
User-Agent: Thunderbird 1.5.0.9 (X11/20070102)
MIME-Version: 1.0
To: Jim Blandy <jimb@codesourcery.com>
CC:  gdb@sources.redhat.com
Subject: Re: Non-uniform address spaces
References: <467D4AE3.7020505@eagercon.com>	<20070623212557.GB3448@caradoc.them.org>	<467D9503.9060804@eagercon.com> <m33b0gsz9s.fsf@codesourcery.com>	<46800482.4020700@eagercon.com> <m3zm2n6ejt.fsf@codesourcery.com>	<46801FDD.4020408@eagercon.com> <m3ejjz65co.fsf@codesourcery.com>	<468047F0.7060207@eagercon.com> <m3y7i6hcxt.fsf@codesourcery.com>
In-Reply-To: <m3y7i6hcxt.fsf@codesourcery.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb.sourceware.org>
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2007-06/txt/msg00305.txt.bz2

Jim Blandy wrote:

>> The compiler certainly can identify that an array or other data
>> is shared, to use UPC's terminology.  From there, the target code
>> would need to perform some magic to figure out where the address
>> actually pointed to.
> 
> Certainly, an ABI informs the interpretation of the debugging info.
> Do you have specific ideas yet on how to convey this information?

A hook (registered in gdb_arch) would name a target routine
to do the translation.   When GDB sees a shared pointer, it
will call this target-dependent translation routine.

What isn't clear to me is where to call the hook.  Suggestions
about where to look would be welcome.
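
A minimal sketch of what such a hook might look like, in C.  Every
name here (core_addr, shared_location, arch_hooks, translate_shared,
resolve_shared_pointer) is hypothetical and is not GDB's actual
gdbarch interface; the example target routine assumes the thread
number sits in the upper 32 bits of the shared pointer, which is
only an illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for GDB's CORE_ADDR; assumed to be 64 bits wide here.  */
typedef uint64_t core_addr;

/* Hypothetical result of translating a shared pointer: the thread
   that owns the data and the address within that thread's memory.  */
struct shared_location
{
  unsigned int thread;
  core_addr local_addr;
};

/* Hypothetical hook type for the target-dependent translation.  */
typedef struct shared_location (*translate_shared_ftype) (core_addr shared);

/* Hypothetical per-architecture record holding the hook.  */
struct arch_hooks
{
  translate_shared_ftype translate_shared;  /* NULL: no shared pointers.  */
};

/* Example target routine.  Assumed encoding: thread number in the
   upper 32 bits, local offset in the lower 32 bits.  */
struct shared_location
example_translate_shared (core_addr shared)
{
  struct shared_location loc;

  loc.thread = (unsigned int) (shared >> 32);
  loc.local_addr = shared & 0xffffffffu;
  return loc;
}

/* Generic code: call the hook when the target supplies one, otherwise
   treat the value as an ordinary linear address on thread 0.  */
struct shared_location
resolve_shared_pointer (const struct arch_hooks *arch, core_addr shared)
{
  struct shared_location loc = { 0, shared };

  assert (arch != NULL);
  if (arch->translate_shared != NULL)
    loc = arch->translate_shared (shared);
  return loc;
}
```

The open question above still stands: this shows only the shape of
the hook, not where in GDB resolve_shared_pointer would be called.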

>> There are other places where an address is incremented, such as
>> in displaying memory contents.  I doubt that the code knows
>> what it is displaying, only to display n words starting at
>> x address in z format.  This would probably result in incorrect
>> results if the data spanned from one processor/thread to another.
>> (At least at a first approximation, this may well be an acceptable
>> restriction.)
> 
> Certainly code for printing distributed objects will need to
> understand how to traverse them properly; I see this as parallel to
> the indexing/pointer arithmetic requirements.  Hopefully we can design
> one interface that serves both purposes nicely.

Perhaps.  I haven't looked in this code for a long time, but
my impression is that knowledge about what is being printed
gets discarded pretty early, leaving only a pointer, a count,
and a format.
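
To illustrate why a bare (pointer, count, format) triple is not
enough, here is a small sketch in C.  It assumes the array elements
are dealt out round-robin across threads, as with UPC's default
block size of 1; the names element_home, locate_element and
naive_dump_logical_index are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of where one array element lives.  */
struct element_home
{
  unsigned int thread;   /* Thread that holds the element.  */
  size_t local_index;    /* Index within that thread's local piece.  */
};

/* Assumed layout: element I of the distributed array lives on thread
   I % NTHREADS, at local index I / NTHREADS.  */
struct element_home
locate_element (size_t i, unsigned int nthreads)
{
  struct element_home home;

  assert (nthreads > 0);
  home.thread = (unsigned int) (i % nthreads);
  home.local_index = i / nthreads;
  return home;
}

/* A dump that just steps through thread 0's memory word by word reads
   local index K on its K-th step.  Under the layout above that word is
   logical element K * NTHREADS, not element K.  */
size_t
naive_dump_logical_index (size_t k, unsigned int nthreads)
{
  return k * (size_t) nthreads;
}
```

With four threads, the second word a naive dump prints from thread 0
is logical element 4, while the real element 1 lives on thread 1.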

>> Symtab code would need a hook which converted the ELF
>> <section,offset> into a <processor,thread,offset> for shared
>> objects.  Again, that would require target-dependent magic.
> 
> Hmm.  GDB's internal representation for debugging information stores
> actual addresses, not <section, offset> pairs.  After reading the
> information, we call objfile_relocate to turn the values read from the
> debugging information into real addresses.  It seems to me that that
> code should be doing this job already.

Perhaps.  I'll look at that.  How does this work for TLS now?

> How does code get loaded in your system?  Does a single module get
> loaded multiple times?

On a system which has shared memory (not UPC 'shared' but memory which
is accessed by all processors/threads) the code image is simply loaded.
Data areas are duplicated for thread-specific data, similar to TLS.
On multiprocessor systems which have independent memories, a target
agent loads the processors with the executable.

> In GDB, each objfile represents a specific loading of a library or
> executable.  The information is riddled with real addresses.  If a
> single file is loaded N times, you'll need N objfiles, and the
> debugging information will be duplicated.

Likely not a real problem.  The code image is linear and addresses
don't need to be translated.  Addresses in the code are relative to
either global data or thread-specific data.  They aren't NUMA
addresses.

> In the long run, I think GDB should change to represent debugging
> information in a loading-independent way, so that multiple instances
> of the same library can share the same data.  In a sense, you'd have a
> big structure that just holds data parsed from the file, and then a
> bunch of little structures saying, "I'm an instance of THAT, loaded at
> THIS address."
> 
> This would enable multi-process debugging, and might also allow us to
> avoid re-reading debugging info for shared libraries every time they
> get loaded.

This would address my comment above, that GDB converts from a
symbolic form to an address too early.

>> One problem may be that it may not be clear whether one has a
>> pointer to a linear code space or to a distributed NUMA data space.
>> It might be reasonable to model the linear code space as a 64-bit
>> CORE_ADDR, with the top half zero, while a NUMA address has non-zero
>> values in the top half.  (I don't know if there might be alias
>> problems, where zero might be valid for the top half of a NUMA address.)
> 
> I think this isn't going to be a problem, but it's hard to tell.  Can
> you think of a specific case where we wouldn't be able to tell which
> we have?

Only if the <processor,thread> component of a NUMA address can be
zero, so that the NUMA address looks like a linear address.
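
The alias can be shown with a short C sketch.  The bit layout
(processor in bits 63..48, thread in bits 47..32, offset in the low
32 bits) and the names numa_pack, looks_linear and numa_pack_biased
are assumptions made for illustration only.  The biased variant is
one conceivable way to avoid the alias, not something proposed in
this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for GDB's CORE_ADDR; assumed to be 64 bits wide here.  */
typedef uint64_t core_addr;

/* Assumed layout: processor in bits 63..48, thread in bits 47..32,
   offset in bits 31..0.  */
core_addr
numa_pack (uint16_t processor, uint16_t thread, uint32_t offset)
{
  return ((core_addr) processor << 48)
         | ((core_addr) thread << 32)
         | (core_addr) offset;
}

/* The proposed test: a zero top half means a linear code address.  */
bool
looks_linear (core_addr addr)
{
  return (addr >> 32) == 0;
}

/* Hypothetical workaround: store thread + 1 so that the top half of a
   NUMA address is never zero.  Costs one thread number.  */
core_addr
numa_pack_biased (uint16_t processor, uint16_t thread, uint32_t offset)
{
  assert (thread < UINT16_MAX);
  return numa_pack (processor, (uint16_t) (thread + 1), offset);
}
```

Here numa_pack (0, 0, 0x4000) is indistinguishable from the linear
address 0x4000, which is exactly the case described above.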

>> I'd be very happy figuring out where to put a hook which allowed me
>> to translate a NUMA CORE_ADDR into a physical address, setting the
>> thread appropriately.  A bit of a kludge, but probably workable.
> 
> CORE_ADDR should be capable of addressing all memory on the system.  I
> think you'll make a lot of trouble for yourself if you don't follow
> that rule.

The NUMA address has to be translated into a physical address somewhere.
Perhaps lower in the access routines is better.

-- 
Michael Eager	 eager@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077