Mirror of the gdb mailing list
 help / color / mirror / Atom feed
From: Aleksandar Ristovski <aristovski@qnx.com>
To: John Gilmore <gnu@toad.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>,
	Raphael Zulliger	<zulliger@indel.ch>, <gdb@sourceware.org>
Subject: Re: Ensure correct symbol-file when attaching to a (remote) process
Date: Sat, 22 Dec 2012 02:54:00 -0000	[thread overview]
Message-ID: <50D520D8.1080602@qnx.com> (raw)
In-Reply-To: <201212220138.qBM1caIl012256@new.toad.com>

On 12-12-21 08:38 PM, John Gilmore wrote:
>> Interesting timing. I have just posted
>> http://sourceware.org/ml/gdb-patches/2012-12/msg00776.html addressing
>> this issue.
>
> That's a useful patch.  It's smart to read the headers from the
> "symbol_file" bfd (well, in this case the solib bfd used for symbols)
> rather than from the "exec_file" bfd.
>
> But doesn't it read the ELF header twice?  In
> svr4_validate_hdrs_match, it reads it once in the call to
> read_elf_header_from_target, and then reads it again by calling
> read_program_headers_from_target which again calls
> read_elf_header_from_target.  I suspect that you should pass the
> already-read ELF header into each of the read_program_headers
> functions.

Yes, it reads it twice, but our target memory could (and probably 
should) cache this. Note that we are talking about sizeof(Elf32_Ehdr) or 
sizeof(Elf64_Ehdr) which is not a lot.

The way it is proposed in the patch matches corresponding 
read_*_from_bfd functions which do not require passing elf header to 
read phdrs.

This is not to say my proposal is the only way of doing it, what you are 
suggesting is valid approach as well. However, I would prefer to leave 
it as it is in the patch.

>
> Also, in read_elf_header_from_bfd, BFD has probably already read the
> ELF header, when it first opened the file, and has it lying around.
> This code should not be (1) reading it again, nor (2) seeking to file
> offset "0" (not even a #define, not even "0L", but an int constant!)
> to get it.

Elf header is always at 0, so I'm not sure what define would you prefer 
seeing there instead? Elf header is specifically located at offset 0, 
this is by the System V spec. of the elf format. I can put 0L but in 
this case, does it really matter?

>
> Also, it might be simpler in read_program_headers_from_target to just
> translate the ELF header into its internal BFD struct form - then all
> that manual messing with byte orders and field lengths could be
> eliminated.  See how read_program_headers_from_bfd does it.

Note that this is because we read directly from target memory.

>
> You could also just call bfd_get_elf_phdrs (and before that,
> bfd_get_elf_phdr_upper_bound to size the buffer) which would avoid
> spreading details into GDB about these headers and how they are read
> and translated into local byte order.  BFD even has a
> bfd_elf_bfd_from_remote_memory function for creating a BFD by portably
> reading the target's ELF header and phdrs out of memory.

I will have to look at how could bfd be used to read from target memory, 
but the idea is to compare 'raw' data. There is no need to translate 
into internal form - in-memory data needs to match that from the file, 
no cross-endiannes happens here at all (except for, of course, getting 
phdr offset directly from in-memory elf header; I intentionally did not 
want to use bfd's internal representation for that as it corresponds to 
the header read from the file residing on the host, not necessarily 
matching file, which is what we are checking here.

>
> Also, is "validate" the right name for this?  That works for checking
> shared libraries, but it's not as obvious what this function should
> do when e.g. checking the argument to symbol_file, or immediately
> after doing "target attach remote".  If "validation" fails, should
> we refuse to read that symbol file, or refuse to attach to the remote
> target?  What exactly are we validating?  Perhaps "compare_headers"
> or "do_symbols_and_target_match" are better names.

I am not a native English speaker and as such I will accept better 
proposals, but I thought 'validate' would be the right name. We are at 
the point where we are reading symbol file. Symbol file is generic here 
(and 'symbol' somewhat may confuse. The file we found on the host could 
be completely stripped and so only "minimal symbols" (those that can be 
derived from .dynsym/.dynstr) may be available, but that is irrelevant 
to what is being validated here), we are really opening a file on the 
host that corresponds to loaded object in target memory. Therefore, 
attach is only one way of getting into this situation. It may be 
provoked by e.g. noshared ... shared. Regardless of how did we get here, 
we found a file that gdb thinks corresponds to that in the target's 
memory. We are now validating and answering the question: "Is the found 
file valid?". I chose 'validate' as a generic concept, someone may 
decide to implement some other type of validation, e.g. bit-for-bit of 
text segment, someone might decide to open file using target fileio and 
read it all. My proposal is to have check that is practical: it is 
almost always sufficient and a lot faster than doing full segments (i.e. 
exhaustive) comparison. Full segment comparison approach may not include 
anything that could have been modified by the dynamic linker, so we 
would be restricted to full text compare (so still not exhaustive, but 
slightly better than what is in the proposal).

In effect, bfd corresponding to the object that does not pass validation 
gets thrown away and we do not read symbols from it. We had done some 
work on it such as relocating sections, and general bfd work but that's 
about it. This is all host work so I don't think it affects performance 
much. If this is a concern, validation could take place earlier, but I 
don't see a need at the moment.


>
> With those things addressed, it's a good start.  Then, shouldn't this
> call be implemented for non-shared-library symbol files (like
> Mr. Zulliger was hoping for), and for object file formats other than
> ELF?

Not for non-elf formats, but the API is there. It can be implemented (I 
did not plan to do that).

I'm not sure I understand the first part of the question: it doesn't 
matter whether the object is shared or not. What matters is that it is 
an object loaded by the dynamic linker (therefore this validation is 
performed only for dynamic processes that link typically shared objects).

I believe we already have executable comparison in place, but I will 
double check.

>
> Has this code been tested when the target uses a different byte
> order than the host?  Probably not, since it is only currently invoked
> for shared library loads, which usually only happen native.  Is
> there a test case written for it?

This was implemented primarily for remote scenarios as library mismatch 
is a lot more likely in this use-case. I believe I addressed different 
endian-es where applicable. I also put comment on why is bit comparison 
of 'raw' headers fine. We are reading library that resides on the host, 
but is built for the target. Therefore, endianes in the headers must 
match and there is no need to translate it.

Testcase: I haven't written one yet.

>
> Could it be easily extended to check a build-id when one exists?  Or
> would that require different function arguments, or need to happen at
> a different time?

This is stronger than build-id check. Note also that this does not 
presume that any section header (nor section headers in general) be 
present in the loaded image. build-id is not prescribed to be mapped to 
a loadable segment and can not be counted on to be present in targets 
memory. I specifically did not want to require any file operations on 
the target. Notice that data read from the target relies _only_ on data 
that _must_ be present for dynamic linker, and consequently must reside 
in target memory. It does not attempt to use section headers (e.g. we 
strip them for some of our libraries) nor bits that are optional.

>
> Should "target attach" do this kind of validation traffic to and from
> the target?  Or should checking this be a separate step initiated by
> the user?  The target might be in a relatively sensitive state -- or
> the user might not want to wait for this check -- immediately after
> attaching.  We could add yet another GDB setting for automatic or
> manual validation, defaulting to "check", and requiring those who want
> to bypass the check to set that setting.  Is that too much
> complication for the gain?

Traffic should be fairly small. Elf64_Ehdr is 64 bytes. Phdrs may vary 
as their number varies, but from my observations today (on native 
gnu/linux 64 bit) gnulibc has phdrs that are less than 1K in size. This 
is one time work, so I think it won't be detrimental to performance, 
while providing very useful functionality (those who have to deal with 
library mismatches know what I am talking about).

As stated above, this check happens on shared library load. My example 
from the patch is just an illustration of a real scenario (but 
admittedly less common) that can happen when debugging natively.

I do not think there is a need for another option; for example, we 
perform "separate symbol file" CRC calculation without an option to 
disable it (and this *could* be time consuming).


>
> 	John
>


Thanks for looking at the patch and your feedback,

Aleksandar


  reply	other threads:[~2012-12-22  2:54 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-21  6:05 Raphael Zulliger
2012-12-21 16:11 ` Jan Kratochvil
2012-12-21 18:17   ` John Gilmore
2013-01-02 17:57     ` Pedro Alves
2013-01-12 10:59       ` Martin Runge
     [not found]       ` <50EA78FB.3040609@indel.ch>
2013-01-14 18:57         ` Pedro Alves
2013-01-29  3:18       ` Raphael Zulliger
2013-01-29  3:51         ` Raphael Zulliger
2013-02-06 18:47           ` Tom Tromey
2012-12-21 21:12   ` Aleksandar Ristovski
2012-12-22  0:38     ` John Gilmore
2012-12-22  2:54       ` Aleksandar Ristovski [this message]
2012-12-24 20:02         ` Aleksandar Ristovski
2012-12-26  0:47           ` John Gilmore
2012-12-27 20:13             ` Aleksandar Ristovski
2013-01-29  3:18     ` Raphael Zulliger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50D520D8.1080602@qnx.com \
    --to=aristovski@qnx.com \
    --cc=gdb@sourceware.org \
    --cc=gnu@toad.com \
    --cc=jan.kratochvil@redhat.com \
    --cc=zulliger@indel.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox