Subject: Re: RFC: Prevent disassembly beyond symbolic boundaries
From: Tristan Gingold
To: Nicholas Clifton
Cc: binutils@sourceware.org, gdb-patches@sourceware.org
Date: Fri, 19 Jun 2015 16:33:00 -0000
Message-Id: <3F2C1B8E-BFB8-4AE4-BDCE-8B66FC208E4B@adacore.com>
In-Reply-To: <5583FFEE.6060106@redhat.com>
References: <87lhfhynoz.fsf@redhat.com> <3D81F97D-90EA-4769-8381-514BB6E81E3F@adacore.com> <5583FFEE.6060106@redhat.com>
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
X-SW-Source: 2015-06/txt/msg00407.txt.bz2

> On 19 Jun 2015, at 13:41, Nicholas Clifton wrote:
>
> Hi Tristan,
>
>>> This will disassemble as:
>>>
>>> 0000000000000000 :
>>>    0: 24 2f          and $0x2f,%al
>>>    2: 83 0f ba       orl $0xffffffba,(%rdi)
>>>
>>> 0000000000000003 <foo>:
>>>    3: 0f ba e2 03    bt $0x3,%edx
>>>
>>> Note how the instruction decoded at address 0x2 has stolen two bytes
>>> from "foo", but these bytes are also decoded (correctly this time) as
>>> part of the first instruction of foo.
>
>> I am curious. Why do you think it was a problem ?
>
> Strangely enough, this actually causes regressions with the perf tool's
> testsuite:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1054767
>
> What happens is that perf test 21 runs objdump on a binary, *parses* this
> output and compares that to the actual bytes in the binary. Because of the
> overrun feature shown above you actually get more bytes displayed in
> objdump's output than actually exist in the binary and so the perf test
> fails.

I could argue that this is an issue in the perf tool. After all, the objdump
output makes it clear that the pc goes backward.

>> Even if there is a symbol in the middle of an instruction, I’d like
>> to understand what the processor will execute.
>
> Except that even the current displayed disassembly is not what the
> processor would execute. In the example above the processor would execute
> the ORL instruction starting at address 0x2, but it would not continue on
> to execute the BT instruction at address 0x3. Instead it would start
> decoding from address 0x5, whatever instruction that might be…

That’s a very good point!

>> Before the proposed
>> change, it was possible, but after it isn’t easy anymore.
>
> True - but this only matters if the processor would execute from that
> piece of memory. What if the byte(s) are actually data ? (eg a constant
> pool).
> Then it would make more sense to display the bytes as just byte values.

OTOH, if this is a constant pool it is possible that objdump has already
been off track for a while.

> The point being that if there is a symbol that is in the middle of an
> instruction then something hinky is going on. Either the symbol is
> misplaced or the instruction is not really an instruction or else an
> assembly programmer is being extra super clever and hiding data inside
> instructions.

Yes. My scenario was setting a label on a known part of an instruction,
like the offset in a call instruction you want to patch later.
But I agree that both before and after your proposed change, objdump output
is not very readable.

> How about a tweak to the patch then ? What if the -D option (disassemble
> all) disables this feature, and so the disassembled instruction is
> displayed as before, whilst the -d option (disassemble code) leaves it
> enabled. Then if you want to see bytes as instructions you can use the -D
> option (possibly combined with -j), but if you want to see a more likely,
> only real instructions disassembled version, then use the -d option.
> (Obviously the patch would need to be extended with an update to the
> documentation too).

It’s up to you. I don’t insist at all on modifying your change; I was just
curious about the motivation.
And the scenario I had in mind is not really affected by your proposal.

Tristan.
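
For reference, the byte-count mismatch discussed above can be reproduced
with a small model of the quoted listing. This is only a sketch: the bytes
and instruction lengths are copied from the example in this thread, while
the name "start" for the first symbol and the rule of clamping an
instruction at the next symbol are assumptions made for illustration. It
does not call objdump and is not the code from the proposed patch.

```python
# Sketch only. The byte string and instruction lengths are copied from the
# listing quoted in this thread; the symbol name "start" and the clamping
# rule are assumptions made for illustration, not objdump's actual code.

CODE = bytes.fromhex("242f830fbae203")   # the seven bytes shown in the listing
SYMBOLS = {0: "start", 3: "foo"}         # "start" is a hypothetical name
DECODED_LEN = {0: 2, 2: 3, 3: 4}         # decoded length reported at each address


def listing(stop_at_symbols):
    """Return (address, raw bytes) rows, dumping one symbol after another."""
    rows = []
    starts = sorted(SYMBOLS)
    for i, start in enumerate(starts):
        # A symbol's region ends where the next symbol starts.
        end = starts[i + 1] if i + 1 < len(starts) else len(CODE)
        pc = start
        while pc < end:
            n = DECODED_LEN.get(pc, 1)
            if stop_at_symbols:
                # Assumed behaviour: never show bytes past the next symbol.
                n = min(n, end - pc)
            rows.append((pc, CODE[pc:pc + n]))
            pc += n
    return rows


def displayed_byte_count(rows):
    """Total number of bytes a parser would collect from the rows."""
    return sum(len(raw) for _, raw in rows)


if __name__ == "__main__":
    for stop in (False, True):
        rows = listing(stop)
        print("stop_at_symbols=%s" % stop)
        for addr, raw in rows:
            print("  %x: %s" % (addr, " ".join("%02x" % b for b in raw)))
        print("  displayed %d bytes, section holds %d"
              % (displayed_byte_count(rows), len(CODE)))
```

Without clamping, the model lists 9 bytes for a 7-byte section, because the
two bytes at 0x3 and 0x4 appear both in the ORL row and in the BT row. With
clamping the counts agree, at the price of showing only the first byte of
the instruction at 0x2.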