On Thursday 31 December 2009 14:50:57 Pedro Alves wrote:
> On Monday 21 December 2009 14:48:18, Vladimir Prus wrote:
> > On Thursday 17 December 2009 18:33:21 Pedro Alves wrote:
> >
> > > > +@item qXfer:threads:read::@var{offset},@var{length}
> > > > +@anchor{qXfer threads read}
> > > > +Access the list of threads on target. @xref{Thread List Format}. The
> > > > +annex part of the generic @samp{qXfer} packet must be empty
> > > > +(@pxref{qXfer read}).
> > > > +
> > > > +This packet is not probed by default; the remote stub must request it,
> > >
> > > What's the advantage of this, rather than probing it? I would
> > > see it nice for GDB to know in advance that the target can
> > > report core data, but that's not what reporting support
> > > for this packet means, since the "core" field is optional.
> >
> > Every other qXfer packet works this way. And it does not seem like much burden
> > on the stub to announce that this packet is available.
>
> True. I'm seeing a proliferation of qSupported additions (with
> the tracepoint support going in), and I get the feeling that
> qSupported will end up being the largest packet a stub has
> to receive/send. Maybe it's an unfounded concern.
I think we still have some room before we run into packet size limits. Anyway,
this is not a problem that is introduced, or made significantly worse, by this
patch, so I would rather not solve it at present.
> > > > + if (core != -1)
> > > > + {
> > > > + char s[11];
> > > > + sprintf (s, "%u", core);
> > > > +
> > > > + if (count == allocated)
> > > > + core_numbers = realloc (core_numbers,
> > > > + sizeof (int ) * (allocated *= 2));
> > >
> > > No space after int. Please move the 'allocated *= 2' in its own
> > > statement. My GM hat tells me that I should point out that there are a
> > > bunch of hardcoded buffer sizes in the patch, that the GNU conventions
> > > tells us we should avoid.
> >
> > I have replaced '11' with sizeof ("4294967295") + 1.
>
> Thanks. Though, sizeof ("4294967295") already includes the
> size for the \0, so you've replaced 11 with 12.
OK, noted and fixed everywhere.
> > It is surely possible, but the original motivation here is mostly embedded
> > systems, so I'd prefer to have a gdbserver-only version checked in, and then
> > anybody interested in native linux can fill the bits for linux.
>
> You're adding linux-nat.c:linux_nat_core_of_thread already, so we're
> mostly there ... I assume it's mostly a matter of copy/paste. I'll
> probably do it myself, for the sake of keeping the linux-nat.c and
> linux-low.c osdata code in sync.
OK, so I'll skip this part for now.
> > > > + /* Update the core associated with a thread when we process a stop
> > > > + event in that thread. */
> > > > + info = find_thread_ptid (ptid);
> > > > + if (info && info->private)
> > > > + info->private->core = stop_reply->core;
> > >
> > > Consider all-stop, multi-threaded applications:
> > >
> > > What about the core of the other threads?
> > > Shouldn't they all be invalidated?
> > > Won't they then be stale?
> > >
> > > (see comment below)
> >
> > Yes, it will be stale. However, it's approximation only, and the only two
> > places where the core number is used are the *stopped response (which uses the core of
> > the stopped thread, updated above) and -list-thread-groups (which updates
> > this information anyway). It does not seem right to additionally update
> > this even if nothing will use it.
>
> Not update, _invalidate_: tag that the core is unknown and needs fetching
> from the target somehow (say, adding a info->private->core_p field?
> Or setting it to info->private->core = -2?). The invalidation could be
> done on resume instead of on stop (e.g., linux-nat.c:resume_callback). I
> believe it is bad design practice to design an API and assume
> what its callers do with it.
I think it's perfectly valid to optimize the API for the use cases we expect.
Otherwise, it's easy to introduce complexity and pessimization that is not
necessary.
> We can certainly come up with other
> target_core_of_thread callers in the near future, and at that point,
> we'll hit this issue. But all that leads me to the real question I have in
> my mind: how will we update the core of the thread, when we implement what
> I just described? Will we make remote_core_of_thread update all
> threads, or come up with a special packet to get at the core of
> a thread? Probably the former is okay (albeit, slow, although
> done only at most once), but if not, I'd rather consider the
> packet upfront, than leave a hole in the design --- if we
> considered such packet, the case for qXfer:threads would get
> smaller (it becomes only justifiable due to supposedly fewer
> packet roundtrips to list all threads and their extra info).
For a start, I certainly find that a single packet that returns all information
about threads -- which a frontend might be requesting on each step -- is a good
idea regardless of "core awareness". As for invalidation, I think it's reasonable
to fetch information about all threads, simply because the time to get that
information is probably less than the communication latency of a single packet.
So, for the avoidance of doubt, how about this scheme:
- core information of each thread is invalidated when that thread is resumed
  (where resumption is reported by the 'target_resumed' observer)
- remote_core_of_thread checks whether it is asked for the core of the most
  recently stopped thread. If so, it reports it. Otherwise, it refreshes the
  thread list, and then reports the core it has fetched
- all of this applies to all-stop only. In non-stop, we get a stop packet for
  each stopped thread, and therefore remote_core_of_thread never needs to
  fetch information about all threads.
If this sounds OK, I'll implement such a scheme.
> > > > + /* Return the core that thread PTID is on, or -1 if such information
> > > > + is not available. For a stopped thread, this is supposed to return
> > > > + the core the thread was last running on. For running threads, it
> > > > + should return one of the cores that the thread was running between
> > > > + the call to this function and its return -- and if it was running on
> > > > + several cores, any of them may be returned. */
> > > > + int (*to_core_of_thread) (struct target_ops *, ptid_t ptid);
> > >
> > > I guess that on some targets, it will be impossible to know which core
> > > a running thread was running on. Should it be simply documented as
> > > undefined instead?
> >
> > I am not sure I understand. On such a target, this method will return -1.
>
> So, covered by the "-1 if such information is not available"?
> I was assuming that meant "not available now or ever, period. No
> need to request again, I'll always return -1".
I see, I have clarified the wording.
> > > Other:
> > >
> > > - did you consider being able to list the available
> > > cores? A candidate for "info os cores", I guess.
> >
> > Unfortunately, I do not know a suitable interface to get that information.
> > Parsing /proc/cpuinfo is not really attractive.
>
> We have more code that parses /proc/foo already though,
> e.g., linux-nat.c:pid_is_stopped, and I don't suppose it would
> be much worse than the new parsing of /proc/pid/task/lwp/stat
> to get at the core for a thread.
Well, while I presumably could have split /proc/cpuinfo on empty lines
and somehow printed that in 'info os cores', I am unsure whether we can
promise any fields to be present in the output, given that man:proc does
not mention any field except "processor" and "bogomips". So all that
"info os cores" can meaningfully print is the numeric ids of the cores,
and that does not add much to the information already available
to the frontend.
> > I attach a revision with yours and Eli's comments addressed, except as
> > mentioned above, and a delta.
>
> Thanks. Patch comments inline below.
>
>
> On Monday 21 December 2009 14:48:18, Vladimir Prus wrote:
> > @@ -32269,15 +32424,34 @@ An example document is:
> > 1
> > root
> > /sbin/init
> > + 1,2,3
> > +
> > + -
> > + 12
> > + 3
> > +
> > +
> >
> >
>
> I think this part of the documentation is now stale?
Yes, fixed.
> > +/* Given PID, iterates over all threads in that process.
> > +
> > + Information about each thread, in a format suitable for qXfer:osdata:thread
> > + is printed to BUFFER, if it's not NULL. The buffer will not be either
> > + initialized, or finished, or have '\0' written to it. Caller is responsible
> > + for such things.
>
> About:
>
> "The buffer will not be either initialized, or finished, or have '\0' written to it.
> Caller is responsible for such things."
>
> Do you mean?:
>
> "The buffer is assumed to be already initialized, and the caller is responsible for
> finishing and appending '\0' to it."
>
> ?
Yes, this is better wording.
> The function inits the buffer, finishes the buffer and appends \0 to it, so
> I'm confused by that comment.
The function does not init or finish BUFFER. It does init/finish a second buffer, but
that buffer is only used internally. Am I missing something?
> > +
> > + buffer_grow (buffer, prev, f - prev - 1);
> > + sprintf (b, "%d", i);
> > + buffer_grow_str (buffer, b);
> > + prev = f + 1;
> > + }
>
>
> > +/* Return the core for a thread. */
> > +static int
>
> Could you please add an empty new line between function describing
> comment and function definition (here and elsewhere)? We're trying
> to follow that style everywhere in gdb.
Is that for function definitions only? Or also for function declarations
and variable declarations/definitions?
> > diff --git a/gdb/remote.c b/gdb/remote.c
> > index 9fa92fb..e393e47 100644
> > --- a/gdb/remote.c
> > +++ b/gdb/remote.c
> (...)
> > +/* The core number that was last seed by process_stop_reply. */
> > +static int last_core = -1;
>
> Is "seed" a typo here?
Yes.
>
> > +/* The thread that corresponds to last_core. */
> > +static ptid_t thread_of_last_core;
>
> Are the last_core and thread_of_last_core globals needed
> for when the target reports the "core" in the stop reply,
> but doesn't support qxfer:threads:read? Should we simply
> not care for that, and assume that a stub that wants to
> report core info uses the new way to fetch thread info?
> I'd prefer not to have these globals, by e.g., creating
> info->private on the spot when processing the stop reply.
The primary problem is that when we process a stop reply
for a new thread, the thread has not yet been created and added to
GDB's thread table, so it's not possible to set its info->private.
I don't see any way around this. We can, in theory, make
remote_core_of_thread request the list of all threads even when
called for a thread that has just stopped, but that sounds wasteful.
> >@@ -2320,6 +2409,68 @@ remote_threads_info (struct target_ops *ops)
> ...
> > + if (!info->private)
> > + info->private = (struct private_thread_info *)
> > + xmalloc (sizeof (struct private_thread_info));
> > +
> > + info->private->extra = item->extra;
> > + item->extra = 0;
> > + info->private->core = item->core;
>
> I think info->private->extra is leaking when threads are deleted,
> because gdb/threads.c simply xfree's ->private. I guess you're the
> lucky first to need a destructor for thread private data.
> nto-procfs.c was close, but avoids it by using a poor man's flexible
> array member (nto-tdep.h).
Done.
I attach a revised patch.
- Volodya