From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-69167-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 32147 invoked by alias); 31 Dec 2009 11:51:06 -0000
Received: (qmail 32130 invoked by uid 22791); 31 Dec 2009 11:51:03 -0000
X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 	tests=AWL,BAYES_05,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 31 Dec 2009 11:50:56 +0000
Received: (qmail 10546 invoked from network); 31 Dec 2009 11:50:53 -0000
Received: from unknown (HELO orlando) (pedro@127.0.0.2)   by mail.codesourcery.com with ESMTPA; 31 Dec 2009 11:50:53 -0000
From: Pedro Alves <pedro@codesourcery.com>
To: Vladimir Prus <vladimir@codesourcery.com>
Subject: Re: [MI] core awareness
Date: Thu, 31 Dec 2009 11:51:00 -0000
User-Agent: KMail/1.9.10
Cc: gdb-patches@sourceware.org
References: <200912171533.21699.pedro@codesourcery.com> <200912211748.18682.vladimir@codesourcery.com>
In-Reply-To: <200912211748.18682.vladimir@codesourcery.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;   charset="us-ascii"
Content-Transfer-Encoding: 7bit
Message-Id: <200912311150.57763.pedro@codesourcery.com>
X-IsSubscribed: yes
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2009-12/txt/msg00465.txt.bz2

On Monday 21 December 2009 14:48:18, Vladimir Prus wrote:
> On Thursday 17 December 2009 18:33:21 Pedro Alves wrote:
> 
> > > +@item qXfer:threads:read::@var{offset},@var{length}
> > > +@anchor{qXfer threads read}
> > > +Access the list of threads on target.  @xref{Thread List Format}.  The
> > > +annex part of the generic @samp{qXfer} packet must be empty
> > > +(@pxref{qXfer read}).
> > > +
> > > +This packet is not probed by default; the remote stub must request it,
> > 
> > What's the advantage of this, rather than probing it?  I would
> > see it nice for GDB to know in advance that the target can
> > report core data, but that's not that reporting support
> > for this packet means, since the "core" field is optional.
> 
> Every other qXfer packet works this way. And it does not seem like much burden
> on the stub to announce that this packet is available.

True.  I'm seeing a proliferation of qSupported additions (with
the tracepoint support going in), and I get the feeling that
qSupported will end up being the largest packet a stub has
to receive/send.  Maybe it's an unfounded concern.

> > > diff --git a/gdb/features/osdata.dtd b/gdb/features/osdata.dtd
> > > index a04506b..67de837 100644
> > > --- a/gdb/features/osdata.dtd
> > > +++ b/gdb/features/osdata.dtd
> > > @@ -10,7 +10,10 @@
> > >  <!ATTLIST osdata version CDATA #FIXED "1.0">
> > >  <!ATTLIST osdata type CDATA #REQUIRED>
> > >  
> > > -<!ELEMENT item (column*)>
> > > +<!ELEMENT item (column*, threads?)>
> > >  
> > >  <!ELEMENT column (#PCDATA)>
> > >  <!ATTLIST column name CDATA #REQUIRED>
> > > +
> > > +<!ELEMENT threads (item*)>
> > > +
> > 
> > This dtd change isn't right.  "osdata" isn't specific to
> > processes, it is just a generic table.  A "threads" element
> > doesn't fit in, it wouldn't make sense for anything other
> > than listing processes.  Two alternative solutions:
> > 
> > - Allow multiple sub-table elements in the osdata schema, and
> >   allow a "name" attribute on this element.  In the listing
> >   processes case, you'd have a sub-table whose type/name
> >   attribute would be "threads".  You should change
> >   info_osdata_command to descend into subtables somehow.
> > 
> > - Don't nest "threads" within processes osdata.  Treat it as
> >   a different request / table (e.g., <osdata type="threads">).
> >   That is, on the CLI, "info os processes" works as is, unmodified, and,
> >   you'd add support for "info os threads".  No schema changes
> >   are necessary with this option.  At the MI implementation level, you'd
> >   do two requests to get the same data: get_osdata("processes"), and
> >   get_osdata("threads").
> 
> I have implemented the second approach.

Thanks.

> 
> 
> > > +             if (core != -1)
> > > +               {
> > > +                 char s[11];
> > > +                 sprintf (s, "%u", core);
> > > +
> > > +                 if (count == allocated) 
> > > +                   core_numbers = realloc (core_numbers, 
> > > +                                           sizeof (int ) * (allocated *= 2));
> > 
> > No space after int.  Please move the 'allocated *= 2' in its own
> > statement.  My GM hat tells me that I should point out that there are a
> > bunch of hardcoded buffer sizes in the patch, that the GNU conventions
> > tells us we should avoid.
> 
> I have replaced '11' with  sizeof ("4294967295") + 1.

Thanks.  Though, sizeof ("4294967295") already includes the
size for the \0, so you've replaced 11 with 12.

> It is surely possible, but the original motivation here is mostly embedded
> systems, so I'd prefer to have gdbserver-only version checked in, and then
> anybody interested in native linux can fill the bits for linux.

You're adding linux-nat.c:linux_nat_core_of_thread already, so we're
mostly there ...  I assume it's mostly a matter of copy/paste.  I'll
probably do it myself, for the sake of keeping the linux-nat.c and
linux-low.c osdata code in sync.

> 
> > > +  char filename[sizeof ("/proc//task//stat") 
> > 
> > s,//,/
> 
> I don't think so, actually. The final path will look like
> 
> 	/proc/NNN/task/MMM/stat
> 
> so it will indeed have two slashes between 'proc' and 'task'

Oh, it looked like a typo, sorry.  I see now.

> 
> > > +       case RECURSE_OPT:
> > > +         if (strcmp (optarg, "1") != 0)
> > > +           error ("only '1' is permitted value for the '--recurse' option");
> > 
> > Seems strange not to allow '--recurse 0'.  
> 
> I've made it allowed.
> 
> > What future values are you thinking of?  
> 
> '2', 'inf', for example.
> 
> > Wouldn't it make more sense to call this
> > "--threads"?  We may add other level-1 sublists to thread-groups
> > in the future, and _not_ want --recurse 1 to always show
> > them, no?
> 
> I would say we rather want them. If frontend wishes to grab entire
> thread group hierarchy, he should be able to pass --recurse 1 or
> in future '--recurse inf' to get absolutely everything.

Fine.

(((For the record, in case I wasn't clear, here's what I
was thinking about: listing the threads is an
expensive operation.  At some point we may add
one or more sub-trees directly descendent of processes other
than threads.  If the frontend is interested in one or more of
these cheap-requesting subtrees, but not in expensive-threads,
it has no way, because then `--recurse 1' will always fetch
threads.)))

> 
> > > +      VEC_safe_push (int, ids, inf);
> > >      }
> > > -  else if (available)
> > > +  qsort (VEC_address (int, ids), 
> > > +        VEC_length (int, ids),
> > > +        sizeof (int), compare_positive_ints);
> > > +
> > > +  back_to = make_cleanup ((void (*)(void *))VEC_OP (int, free), &ids);
> > 
> > Please don't cast the function's type: instead write a
> > small function that has the proper cleanup prototype, and
> > call free there.
> 
> Is casting function type going to cause practical problems on some targets?

It's certainly bad practice.  There used to be a make_cleanup_func
typedef <http://sourceware.org/ml/gdb-patches/2000-q1/msg00791.html>,
<PR7196> that was eliminated years ago.  Your cast would go a step
even further back than that dubious old typedef.

> 
> > > +      /* Update the core associated with a thread when we process stop
> > > +        event in that thread.  */
> > > +      info = find_thread_ptid (ptid);
> > > +      if (info && info->private)
> > > +       info->private->core = stop_reply->core;
> > 
> > Consider all-stop, multi-threaded applications:
> > 
> >  What about the core of the other threads?
> >  Shouldn't they all be invalidated?
> >  Won't then be stale?
> > 
> > (see comment below)
> 
> Yes, it will be stale. However, it's approximation only, and the only two
> places where cores number is used is *stopped response (which uses core of
> the stopped thread, updated above) and -list-thread-groups (which updates
> this information anyway). It does not seem right to additionally update
> this even if nothing will use it.

Not update, _invalidate_: tag that the core is unknown and needs fetching
from the target somehow (say, adding a info->private->core_p field?
Or setting it to info->private->core = -2?).   The invalidation could be
done on resume instead of on stop (e.g., linux-nat.c:resume_callback).  I
believe it is bad design practice to design an API and assume
what it's callers do with it.  We can certainly come up with other
target_core_of_thread callers in the near future, and at that point,
we'll hit this issue.  But all that lead me to the real question I have in
my mind: how will we update the core of the thread, when we implement what
I just described?  Will we make remote_core_of_thread update all
threads, or come up with a special packet to get at the core of
a thread?  Probably the former is okay (albeit, slow, although
done only at most once), but if not, I'd rather consider the
packet upfront, than leave a hole in the design --- if we
considered such packet, the case for qXfer:threads would get
smaller (it becomes only justifiable due to supposedly fewer
packet roundtrips to list all threads and their extra info).

> > > +    /* Return the core that thread PTID is on, or -1 if such information
> > > +       is not available. For a stopped thread, this is supposed to return
> > > +       the core the thread was last running on.  For running threads, it
> > > +       should return one of the cores that the thread was running between
> > > +       the call to this function and return -- and if it was running on
> > > +       several cores, any other may be returned.  */
> > > +    int (*to_core_of_thread) (struct target_ops *, ptid_t ptid);
> > 
> > I guess that on some targets, it will be impossible to know which core
> > a running thread was running on.  Should it be simply documented as
> > undefined instead?
> 
> I am not sure I understand. On such target, this method will return -1.

So, covered by the "-1 if such information is not available"?
I was assuming that meant "not available now or ever, period.  No
need to request again, I'll always return -1".

> > Other:
> > 
> >  - did you consider being able to list the available
> >    cores?  A candidate for "info os cores", I guess.
> 
> Unfortunately, I do not know a suitable interface to get that information.
> Parsing /proc/cpuinfo is not really attractive.

We have more code that parses /proc/foo already though,
e.g., linux-nat.c:pid_is_stopped, and I don't suppose it would
be much worse than the new parsing of /proc/pid/task/lwp/stat
to get at the core for a thread, though.

> I attach a revision with yours and Eli's comments addressed, expect as
> mentioned above, and a delta.

Thanks.  Patch comments inline below.


On Monday 21 December 2009 14:48:18, Vladimir Prus wrote:
> @@ -32269,15 +32424,34 @@ An example document is:
>      <column name="pid">1</column>
>      <column name="user">root</column>
>      <column name="command">/sbin/init</column>
> +    <column name="cores">1,2,3</column>
> +    <threads>
> +      <item>
> +      <column name="tid">12</column>
> +      <column name="core">3</column>
> +      </item>
> +    </threads>
>    </item>
>  </osdata>

I think this part of the documentation is now stale?

> +{
> +    int *d = b;
> +    while (++b != e)
> +        if (*d != *b)
> +            *++d = *b;
> +    return ++d;

Bad space/tab/indentation.

> +/* Given PID, iterates over all threads in that process.
> +
> +   Information about each thread, in a format suitable for qXfer:osdata:thread
> +   is printed to BUFFER, if it's not NULL.  The buffer will not be either
> +   initialized, or finished, or have '\0' written to it.  Caller is responsible
> +   for such things.

About:

 "The buffer will not be either initialized, or finished, or have '\0' written to it.
 Caller is responsible for such things."

Do you mean?:

 "The buffer is assumed to be already initialized, and the caller is responsible for
 finishing and appending '\0' to it."

?

The function inits the buffer, finishes the buffer and appends \0 to it, so
I'm confused by that comment.


> +
> +   The list of cores that threads are running on is assigned to *CORES, if it
> +   is not NULL.  If not cores are found, *CORES will be set to NULL.

Typo: "If no cores".  A sentence saying who is responsible for releasing *CORES
would be nice.

> +
> +static void
> +list_threads (int pid, struct buffer *buffer, char **cores)
> +{
> +  int count = 0;
> +  int allocated = 10;
> +  int *core_numbers = xmalloc (sizeof (int) * allocated);
> +  char pathname[128];
> +  DIR *dir;
> +  struct dirent *dp;
> +  struct stat statbuf;
> +
> +  sprintf (pathname, "/proc/%d/task", pid);
> +  if (stat (pathname, &statbuf) == 0 && S_ISDIR (statbuf.st_mode))
> +    {
> +      dir = opendir (pathname);
> +      if (!dir)
> +       return;
> +
> +      while ((dp = readdir (dir)) != NULL)
> +       {
> +         unsigned long lwp = strtoul (dp->d_name, NULL, 10);
> +
> +         if (lwp != 0)
> +           {
> +             unsigned core = linux_core_of_thread (ptid_build (pid, lwp, 0));
> +
> +             if (core != -1)
> +               {
> +                 char s[sizeof ("4294967295") + 1];
> +                 sprintf (s, "%u", core);
> +
> +                 if (count == allocated) {

'{' in new line, reindent block afterwards.

> +                   allocated *= 2;
> +                   core_numbers = realloc (core_numbers,
> +                                           sizeof (int ) * allocated);

Still has spurious space in '(int )'.

> +

Spurious new line.

> +                     }
> +                 core_numbers[count++] = core;
> +                 if (buffer)
> +                   buffer_xml_printf (buffer,
> +                                      "<item>"
> +                                      "<column name=\"pid\">%d</column>"
> +                                      "<column name=\"tid\">%s</column>"
> +                                      "<column name=\"core\">%s</column>"
> +                                      "</item>", pid, dp->d_name, s);
> +               }
> +             else
> +               {
> +                 if (buffer)
> +                   buffer_xml_printf (buffer,
> +                                      "<item>"
> +                                      "<column name=\"pid\">%d</column>"
> +                                      "<column name=\"tid\">%s</column>"
> +                                      "</item>", pid, dp->d_name);
> +               }
> +           }
> +       }
> +    }
> +
> +  if (cores)
> +    {
> +      *cores = NULL;
> +      if (count > 0)
> +       {
> +         struct buffer buffer2;
> +         int *b;
> +         int *e;
> +         qsort (core_numbers, count, sizeof (int), compare_ints);
> +
> +         /* Remove duplicates. */
> +         b = core_numbers;
> +         e = unique (b, core_numbers + count);
> +
> +         buffer_init (&buffer2);
> +
> +         for (b = core_numbers; b != e; ++b)
> +           {
> +             char number[sizeof ("4294967295") + 1];
> +             sprintf (number, "%u", *b);
> +             buffer_xml_printf (&buffer2, "%s%s",
> +                                (b == core_numbers) ? "" : ",", number);
> +           }
> +         buffer_grow_str0 (&buffer2, "");
> +
> +         *cores = buffer_finish (&buffer2);
> +       }
> +    }
> +}

`core_numbers' is leaking.

> @@ -1604,6 +1616,16 @@ buffer_xml_printf (struct buffer *buffer, const char *format, ...)
>                prev = f + 1;
>              }
>              break;
> +          case 'd':
> +            {
> +              int i = va_arg (ap, char *);

Should be:
           int i = va_arg (ap, int);

> +              char b[sizeof ("4294967295") + 1];

Off-by-one-too-much (sizeof includes \0).

> +
> +              buffer_grow (buffer, prev, f - prev - 1);
> +              sprintf (b, "%d", i);
> +              buffer_grow_str (buffer, b);
> +              prev = f + 1;
> +            }


> +/* Return the core for a thread.  */
> +static int

Could you please add an empty new line between function describing
comment and function definition (here and elsewhere)?  We're trying
to follow that style everywhere in gdb.

> +linux_nat_core_of_thread (struct target_ops *ops, ptid_t ptid)
> +{
> +  struct cleanup *back_to;
> +  char *filename;
> +  FILE *f;
> +  char *content = NULL;
> +  char *p;
> +  char *ts = 0;
> +  int content_read = 0;
> +  int i;
> +  int core;
> +
> +  filename = xstrprintf ("/proc/%d/task/%ld/stat",
> +                        GET_PID (ptid), GET_LWP (ptid));
> +  back_to = make_cleanup (xfree, filename);
> +
> +  f = fopen (filename, "r");
> +  if (!f)
> +    return -1;

No biggie, but might as well run the back_to cleanup before
returning.


>  static int
> -print_one_inferior (struct inferior *inferior, void *arg)
> +print_one_inferior (struct inferior *inferior, void *xdata)
>  {
> -  if (inferior->pid != 0)
> +  struct print_one_inferior_data *top_data = xdata;
> +
> +  if (VEC_length (int, top_data->inferiors) == 0

Might as well use VEC_empty here too, like you've
fixed just a bit below.


> +void
> +mi_cmd_list_thread_groups (char *command, char **argv, int argc)
> +{

(...)

> +  qsort (VEC_address (int, ids),
> +        VEC_length (int, ids),
> +        sizeof (int), compare_positive_ints);

VEC_address (int, ids) can be NULL here.  It is undefined
to pass NULL as array to qsort.  We've had problems
on solaris recently due to this.  There's never a need to
call qsort if the vector is empty (or only if length > 1 if
you prefer) anyway.

> diff --git a/gdb/remote.c b/gdb/remote.c
> index 9fa92fb..e393e47 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
(...)
> +/* The core number that was last seed by process_stop_reply.   */
> +static int last_core = -1;

Is "seed" a typo here?

> +/* The thread that corresponds to last_core.  */
> +static ptid_t thread_of_last_core;

Are the last_core and thread_of_last_core globals needed
for when the target reports the "core" in the stop reply,
but doesn't support qxfer:threads:read?  Should we simply
not care for that, and assume that a stub that wants to
report core info uses the new way to fetch thread info?
I'd prefer not to have these globals, by e.g., creating
info->private on the spot when processing the stop reply.
I understand that you did things this way so
that the presence of info->private means that there's no
need to fetch thread extra info with the old packet, so
I'll probably let this one go, but I'm quite certain
we'll end up using info->private for other
(non qxfer:threads) things in the future anyway, so we'll
need a discriminator anyway.

>@@ -2320,6 +2409,68 @@ remote_threads_info (struct target_ops *ops)
...
> +                         if (!info->private)
> +                           info->private = (struct private_thread_info *)
> +                             xmalloc (sizeof (struct private_thread_info));
> +
> +                         info->private->extra = item->extra;
> +                         item->extra = 0;
> +                         info->private->core = item->core;

I think info->private->extra is leaking when threads are deleted,
because gdb/threads.c simply xfree's ->private.  I guess you're the
lucky first to need a destructor for thread private data.
nto-procfs.c was close, but avoids it by using a poor man's flexible
array member (nto-tdep.h). 

-- 
Pedro Alves