From: William Ferreira
Date: Tue, 17 Dec 2024 18:36:26 -0300
Subject: Re: [PATCH] [gdb] Create script to convert old tests into Dwarf::assemble calls.
To: Guinevere Larsen
Cc: gdb-patches@sourceware.org
References: <20241205135550.9320-1-wqferr@gmail.com>

I've addressed all of your concerns and I will send the next version of
the patch soon. One comment of yours in particular made me rethink a
comment section:

> According to the comment above, this should not send actual_value, but
> instead the string "referenced". Is this an oversight here or (more
> likely) in the comment?
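To make the bookkeeping in question concrete, here is a minimal sketch of
the idea. It is not the code from the patch: SketchDIE, SketchParser,
add_die and the "_label" suffix are hypothetical names made up for
illustration, and only the standard library is assumed. Referenced DIEs
are recorded as offsets in a set held in an annotated instance variable,
and labels are later created only for those offsets. No string literal
"referenced" is involved anywhere.

```python
from typing import Annotated, Optional


class SketchDIE:
    """Hypothetical stand-in for a parsed DIE (not the patch's DWARFDIE)."""

    def __init__(self, offset: int, name: str) -> None:
        self.offset = offset
        self.name = name
        # Stays None unless some other DIE refers to this one.
        self.tcl_label: Optional[str] = None


class SketchParser:
    """Records references as a set of offsets, then labels only those DIEs."""

    def __init__(self) -> None:
        self.offset_to_die: dict[int, SketchDIE] = {}
        # The annotation text is metadata a tool may choose to display.
        self.referenced_offsets: Annotated[
            set[int], "Offsets of DIEs referenced by some attribute."
        ] = set()

    def add_die(self, die: SketchDIE, refs: tuple[int, ...] = ()) -> None:
        """Store the DIE and remember the offsets its attributes point at."""
        self.offset_to_die[die.offset] = die
        self.referenced_offsets.update(refs)

    def create_necessary_labels(self) -> None:
        """Give a label to every DIE that was referenced at least once."""
        for offset in sorted(self.referenced_offsets):
            die = self.offset_to_die[offset]
            die.tcl_label = f"{die.name}_label"


parser = SketchParser()
parser.add_die(SketchDIE(0x10, "int"))
parser.add_die(SketchDIE(0x20, "counter"), refs=(0x10,))
parser.create_necessary_labels()
```

In this sketch the DIE at 0x10 ends up with a label because the DIE at
0x20 refers to it, while the DIE at 0x20 itself stays unlabeled.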
What I meant the comment to convey was that the code keeps track of
which DIEs were referenced, not that the string literal "referenced" has
a special meaning. I've since added more comments in that region
explaining how this is done (with an instance variable) and annotated
said instance variable. Annotations are comments that are visible in
some IDEs' hover feature.

On Tue, Dec 17, 2024 at 1:43 PM Guinevere Larsen wrote:
>
> Thanks for working on this amazing script! This is really fancy, and I
> really like what you got going
>
> I have some specific feedback on some lines, but in general I really
> like where the script is going!
> On 12/5/24 10:55 AM, William Ferreira wrote:
> > PR testsuite/32261 requests a script that could convert old .S-based
> > tests (that were made before dwarf.exp existed) into the new
> > Dwarf::assemble calls in Tcl. This commit is an initial implementation
> > of such a script. Python was chosen for convenience, and only relies on
> > a single external library.
> >
> > Usage: the script operates not on .S files, but on ELF files with DWARF
> > information. To convert an old test, one must run said test via
> > `make check-gdb TESTS=testname` in their build directory. This will
> > produce, as a side effect, an ELF file the test used as an inferior, at
> > gdb/testsuite/outputs/testname/testname. This ELF file is this script's
> > input.
> >
> > Reliability: not counting the limitations listed below, the script seems
> > functional enough to be worthy of discussion in the mailing list. I have
> > been testing it with different tests that already use Dwarf::assemble,
> > to see if the script can produce a similar call to it. Indeed, in the
> > cases that I've tested (including some more complex ones, marked with an
> > asterisk below) it is able to produce comparable output to the original
> > exp file. Of course, it can't reproduce the complex code *before* the
> > Dwarf::assemble call.
> > Values calculated then are simply inlined.
> >
> > The following .exp files have been tried in this way and their outputs
> > highly resemble the original:
> > - gdb.dwarf2/dynarr-ptr
> > - gdb.dwarf2/void-type
> > - gdb.dwarf2/ada-thick-pointer
> > - gdb.dwarf2/atomic-type
> > - gdb.dwarf2/dw2-entry-points (*)
> >
> > The following .exp files DO NOT WORK with this script:
> > - gdb.dwarf2/cu-no-addrs
> >   - aranges not supported.
> >   - Compile unit hi_pc and low_pc hardcoded, prone to user error
> >     due to forgetting to replace with variables.
> >
> > The following .exp files work, with caveats addressed in the limitations
> > section below.
> > - gdb.dwarf2/cpp-linkage-name
> >   - Works correctly except for one attribute of the form SPECIAL_expr.
> > - gdb.dwarf2/dw2-unusual-field-names
> >   - Same as above, with two instances of SPECIAL_expr.
> > - gdb.dwarf2/implptr-optimized-out
> >   - Same as above, with two instances of SPECIAL_expr.
> >
> > Currently the script has the following known limitations:
> > - It does not support Tcl outside a small boilerplate and
> >   Dwarf::assemble calls.
> I don't think we need to support anything other than creating the
> dwarf::assemble stuff. Everything else is usual boilerplate from the
> testsuite, so we're pretty familiar with it (or at least have plenty of
> things to double-check) while dwarf assembler isn't as common and much
> more complex.
> > - It does not support line tables.
> > - It does not use $srcfile and other variables in the call to
> >   Dwarf::assemble (since it can't know where it is safe to substitute).
> > - It does not support "fission" type DWARFs (in fact I still have no
> >   clue what those are).
> > - It does not support cu {label LABEL} {} CUs, mostly because I couldn't
> >   find the information using pyelftools.
> > - It sometimes outputs empty CUs at the start and end of the call.
> >   This might be a problem with my machine, but I've checked with DWARF
> >   dumps and they are indeed in the input ELF files generated by
> >   `make check-gdb`.
> > - It does not support attributes with the forms DW_FORM_block* and
> >   DW_FORM_exprloc. This is mostly not a concern of the difficulty of
> >   the implementation, but of it being an incomplete feature and, thus,
> >   more susceptible to users forgetting to correct its mistakes or
> >   unfinished values (please see discussion started by Guinevere at
> >   comment 23 https://sourceware.org/bugzilla/show_bug.cgi?id=32261#c23).
> >   The incompleteness of this feature is easy to demonstrate: any call to
> >   gdb_target_symbol, a common tool used in these attributes, needs a
> >   symbol name that is erased after compilation. There is no way to guess
> >   where that address being referenced in a DW_OP_addr comes from, and it
> >   can't be hard coded since it can change depending on the machine
> >   compiling it.
> >
> > Please bring up any further shortcomings this script may have with your
> > expectations.
> >
> > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32261
> > ---
> >  gdb/testsuite/lib/asm_to_dwarf_assembler.py | 830 ++++++++++++++++++++
> >  1 file changed, 830 insertions(+)
> >  create mode 100644 gdb/testsuite/lib/asm_to_dwarf_assembler.py
> >
> > diff --git a/gdb/testsuite/lib/asm_to_dwarf_assembler.py b/gdb/testsuite/lib/asm_to_dwarf_assembler.py
> > new file mode 100644
> > index 00000000000..ea86c19c805
> > --- /dev/null
> > +++ b/gdb/testsuite/lib/asm_to_dwarf_assembler.py
> > @@ -0,0 +1,830 @@
> > +# Copyright 2024 Free Software Foundation, Inc.
> > +
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 3 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program. If not, see .
> > +
> > +# Due to the pyelftools dependency, this script requires Python version
> > +# 3.10 or greater to run.
> > +
> > +"""A utility to convert ELF files with DWARF info to Dwarf::assemble code.
> > +
> > +Usage:
> > +    python ./asm_to_dwarf_assembler.py
> > +
> > +Dependencies:
> > +    Python >= 3.10
> > +    pyelftools >= 0.31
> > +
> > +Notes:
> > +- Line tables are not currently supported.
> > +- Non-contiguous subprograms are not currently supported.
> > +- If you want to use $srcfile or similar, you must edit the references to the
> > +  file name manually, including DW_AT_name attributes on compile units.
> > +- If run with binaries generated by make check-gdb, it may include an
> > +  additional compile_unit before and after the actual compile units. This is
> > +  an artifact of the normal compilation process, as these CUs are indeed in
> > +  the generated DWARF in some cases.
> > +"""
> > +
> > +import sys
> > +import argparse
> > +import elftools
> > +import re
> > +
> > +from io import BytesIO, IOBase
> > +from typing import Any, Optional, Iterable, List, Union, Annotated
> > +from functools import cache
> > +from dataclasses import dataclass
> > +from copy import copy
> > +from datetime import datetime
> > +from logging import getLogger
> > +
> > +from elftools import dwarf, elf
> > +from elftools.elf.elffile import ELFFile
> > +from elftools.dwarf.compileunit import CompileUnit as RawCompileUnit
> > +from elftools.dwarf.die import DIE as RawDIE, AttributeValue
> > +from elftools.common.exceptions import DWARFError
> > +
> > +
> > +logger = getLogger(__file__)
> > +
> > +
> > +EXPR_ATTRIBUTE_FORMS = [
> > +    "DW_FORM_exprloc",
> > +    "DW_FORM_block",
> > +    "DW_FORM_block1",
> > +    "DW_FORM_block2",
> > +    "DW_FORM_block4",
> > +]
> Since these aren't supported, does it make sense to keep them here?
> > +
> > +
> > +@dataclass
> > +class DWARFOperation:
> > +    """Values of the form DW_OP_."""
> > +    suffix: Annotated[str, 'Part of the name that succeeds "DW_OP_".']
> > +    code: Annotated[int, "Numeric representation of the OP."]
> > +    num_operands: Annotated[int, "Number of operands this operator expects."]
> > +
> > +
> > +dwarf_operations: Annotated[
> > +    dict[int, DWARFOperation],
> > +    "Table of all DWARF operations as per table 7.9 of the DWARF5 spec."
> > +] = {}
> > +def _register_op(suffix: str, code: int, num_operands: int) -> None:
> > +    assert code not in dwarf_operations, "Duplicate operation code found."
> > +    dwarf_operations[code] = DWARFOperation(suffix, code, num_operands)
> > +
> > +def _register_all_ops() -> None:
> > +    # Op codes 0x01 and 0x02 are reserved.
> > +
> > +    _register_op("addr", 0x03, 1)
> > +
> > +    # Op codes 0x04 and 0x05 are reserved.
> > +
> > +    _register_op("deref", 0x06, 0)
> > +
> > +    # Op code 0x07 is reserved.
> > +
> > +    _register_op("const1u", 0x08, 1)
> > +    _register_op("const1s", 0x09, 1)
> > +    _register_op("const2u", 0x0a, 1)
> > +    _register_op("const2s", 0x0b, 1)
> > +    _register_op("const4u", 0x0c, 1)
> > +    _register_op("const4s", 0x0d, 1)
> > +    _register_op("const8u", 0x0e, 1)
> > +    _register_op("const8s", 0x0f, 1)
> > +    _register_op("constu", 0x10, 1)
> > +    _register_op("consts", 0x11, 1)
> > +
> > +    _register_op("dup", 0x12, 0)
> > +    _register_op("drop", 0x13, 0)
> > +    _register_op("over", 0x14, 0)
> > +    _register_op("pick", 0x15, 1)
> > +    _register_op("swap", 0x16, 0)
> > +    _register_op("rot", 0x17, 0)
> > +    _register_op("xderef", 0x18, 0)
> > +
> > +    _register_op("abs", 0x19, 0)
> > +    _register_op("and", 0x1a, 0)
> > +    _register_op("div", 0x1b, 0)
> > +    _register_op("minus", 0x1c, 0)
> > +    _register_op("mod", 0x1d, 0)
> > +    _register_op("mul", 0x1e, 0)
> > +    _register_op("neg", 0x1f, 0)
> > +    _register_op("not", 0x20, 0)
> > +    _register_op("or", 0x21, 0)
> > +    _register_op("plus", 0x22, 0)
> > +    _register_op("plus_uconst", 0x23, 1)
> > +    _register_op("shl", 0x24, 0)
> > +    _register_op("shr", 0x25, 0)
> > +    _register_op("shra", 0x26, 0)
> > +    _register_op("xor", 0x27, 0)
> > +    _register_op("bra", 0x28, 1)
> > +    _register_op("eq", 0x29, 0)
> > +    _register_op("ge", 0x2a, 0)
> > +    _register_op("gt", 0x2b, 0)
> > +    _register_op("le", 0x2c, 0)
> > +    _register_op("lt", 0x2d, 0)
> > +    _register_op("ne", 0x2e, 0)
> > +    _register_op("skip", 0x2f, 0)
> > +
> > +    for lit_nr in range(32):
> > +        _register_op(f"lit{lit_nr}", 0x30 + lit_nr, 0)
> > +
> > +    for reg_nr in range(32):
> > +        _register_op(f"reg{reg_nr}", 0x50 + reg_nr, 0)
> > +
> > +    for breg_nr in range(32):
> > +        _register_op(f"breg{breg_nr}", 0x70 + breg_nr, 0)
> > +
> > +    _register_op("regx", 0x90, 1)
> > +    _register_op("fbregx", 0x91, 1)
> > +    _register_op("bregx", 0x92, 2)
> > +    _register_op("piece", 0x93, 1)
> > +    _register_op("deref_size", 0x94, 1)
> > +    _register_op("xderef_size", 0x95, 1)
> > +    _register_op("nop", 0x96, 0)
> > +    _register_op("push_object_address", 0x97, 0)
> > +
> > +    _register_op("call2", 0x98, 1)
> > +    _register_op("call4", 0x99, 1)
> > +    _register_op("call_ref", 0x9a, 1)
> > +    _register_op("form_tls_address", 0x9b, 0)
> > +    _register_op("call_frame_cfa", 0x9c, 0)
> > +
> > +    _register_op("bit_piece", 0x9d, 2)
> > +    _register_op("implicit_value", 0x9e, 2)
> > +    _register_op("stack_value", 0x9f, 0)
> > +    _register_op("implicit_pointer", 0xa0, 2)
> > +    _register_op("addrx", 0xa1, 1)
> > +    _register_op("constx", 0xa2, 1)
> > +
> > +    _register_op("entry_value", 0xa3, 2)
> > +    _register_op("const_type", 0xa4, 3)
> > +    _register_op("regval_type", 0xa5, 2)
> > +    _register_op("deref_type", 0xa6, 2)
> > +    _register_op("xderef_type", 0xa7, 2)
> > +
> > +    _register_op("convert", 0xa8, 1)
> > +    _register_op("reinterpret", 0xa9, 1)
> > +
> > +
> > +_register_all_ops()
> Similar question to here, if I understood later code correctly. Do we
> want to keep this if we're not doing location expressions?
> > +
> > +
> > +copyright_notice_template = """
> > +# Copyright 2024-{current_year} Free Software Foundation, Inc.
> If the script is adding copyright, it will be for a new file, and so
> copyright will always start on the year when the script is run. So it
> can just be {current_year}
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 3 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program. If not, see .
> > +""".lstrip()  # Strip to remove initial \n, but not the last one.
> > +
> > +# Workaround for my editor not to freak out over unclosed braces.
> > +lbrace, rbrace = "{", "}"
> > +
> > +@cache
> > +def get_indent_str(indent_count: int) -> str:
> > +    """Get whitespace string to prepend to another for indenting."""
> > +    indent = (indent_count // 2) * "\t"
> > +    if indent_count % 2 == 1:
> > +        indent += " "
> > +    return indent
> > +
> > +
> > +def indent(line: str, indent_count: int) -> str:
> > +    """Indent line by indent_count levels."""
> > +    return get_indent_str(indent_count) + line
> > +
> > +
> > +def labelify_str(s: str) -> str:
> > +    """Make s appropriate for a label name."""
> > +    # Replace "*" with the literal word "ptr".
> > +    s = s.replace("*", "ptr")
> > +
> > +    # Replace any non-"word" characters by "_".
> > +    s = re.sub(r"\W", "_", s)
> > +
> > +    # Remove consecutive "_"s.
> > +    s = re.sub(r"__+", "_", s)
> > +
> > +    return s
> > +
> > +
> > +class DWARFAttribute:
> > +    """Storage unit for a single DWARF attribute.
> > +
> > +    All its values are strings that are usually passed on
> > +    directly to format. The exceptions to this are attributes
> > +    with int values with DW_FORM_ref4 or DW_FORM_ref_addr form.
> > +    Their values are interpreted as the global offset of the DIE
> > +    being referenced, which are looked up dynamically to fetch
> > +    their labels.
> > +    """
> > +    def __init__(
> > +        self,
> > +        die_offset: int,
> > +        name: str,
> > +        value: str | bytes | int | bool,
> > +        form=None,
> > +    ):
> > +        self.die_offset = die_offset
> > +        self.name = name
> > +        self.value = value
> > +        self.form = form
> > +
> > +    def _format_expr_value(self) -> str:
> > +        # This code is left here in case this is feature is desired in the
> > +        # future. It is partly functional, being able to decode operations
> > +        # but not operands due to their variable length in bytes.
> > +
> > +        # self.form = "SPECIAL_expr"
> > +        # lines = [lbrace]
> > +        # curr_idx = 0
> > +        # while curr_idx < len(self.value):
> > +        #     curr_op_code = self.value[curr_idx]
> > +        #
> > +        #     try:
> > +        #         curr_op: DWARFOperation = dwarf_operations[curr_op_code]
> > +        #     except KeyError as err:
> > +        #         raise KeyError(
> > +        #             f"Unknown op: {curr_op_code}, item {curr_idx} of"
> > +        #             f" attribute {self.name}, DIE at offset"
> > +        #             f" {self.die_offset}."
> > +        #         ) from err
> > +        #     # Empty arg here so it inserts a space before the first argument,
> > +        #     # only if there is indeed a first argument.
> > +        #     args = [""]
> > +        #     for arg_nr in range(curr_op.num_operands):
> > +        #         curr_idx += 1
> > +        #         # TODO decode bytes according to specific operation.
> > +        #         # The following line only reads the next BYTE of the sequence,
> > +        #         # and needs to be adjusted to read the correct number of
> > +        #         # bytes.
> > +        #         # args.append(str(self.value[curr_idx]))
> > +        #     op_line = "DW_OP_" + curr_op.suffix + " ".join(args)
> > +        #     lines.append(op_line)
> > +        #     curr_idx += 1
> > +        #
> > +        # lines.append(rbrace)
> > +        # return "\n".join(lines)
> > +        self.form = "SPECIAL_expr"
> > +        return "{ TODO: Fill expr list }"
> Maybe, rather than "TODO" you could add "MANUAL" or something, to make
> it clear that this is a manual process and expected to be so? (naming is
> hard, but I think todo implies that this will be improved soon, and I
> don't think that'd be the case?)
> > +
> > +    def _needs_escaping(self, str_value: str) -> bool:
> > +        charset = set(str_value)
> > +        return bool(charset.intersection({"{", "}", " ", "\t"}))
> > +
> > +    def _format_str(self, str_value: str) -> str:
> > +        if self._needs_escaping(str_value):
> > +            escaped_str = str(str_value)
> > +            # Replace single escape (which is itself escaped because of regex)
> > +            # with a double escape (which doesn't mean anything to regex so
> > +            # it doesn't need escaping).
> > +            escaped_str = re.sub(r"\\", r"\\", escaped_str)
> > +            escaped_str = re.sub("([{}])", r"\\\1", escaped_str)
> > +            return "{" + escaped_str + "}"
> > +        else:
> > +            return str_value
> > +
> > +    def _format_value(
> > +        self,
> > +        offset_die_lookup: dict[int, "DWARFDIE"],
> > +        indent_count: int = 0
> > +    ) -> str:
> > +        if self.form in EXPR_ATTRIBUTE_FORMS:
> > +            return self._format_expr_value()
> > +        elif isinstance(self.value, bool):
> > +            return str(int(self.value))
> > +        elif isinstance(self.value, int):
> > +            if self.form == "DW_FORM_ref4":
> > +                # ref4-style referencing label.
> > +                die: "DWARFDIE" = offset_die_lookup[self.value]
> > +                return ":$" + die.tcl_label
> > +            elif self.form == "DW_FORM_ref_addr":
> > +                # ref_addr-style referencing label.
> > +                die: "DWARFDIE" = offset_die_lookup[self.value]
> > +                return "%$" + die.tcl_label
> > +            else:
> > +                return str(self.value)
> > +        elif isinstance(self.value, bytes):
> > +            return self._format_str(self.value.decode("ascii"))
> > +        elif isinstance(self.value, str):
> > +            return self._format_str(self.value)
> > +        else:
> > +            raise NotImplementedError(
> > +                "Unknown data type: " + str(type(self.value))
> > +            )
> > +
> > +    def format(
> > +        self,
> > +        offset_die_lookup: dict[int, "DWARFDIE"],
> > +        indent_count: int = 0
> > +    ) -> str:
> > +        """Format the attribute in the form {name value form}.
> > +
> > +        If form is DW_FORM_exprloc or DW_FORM_block, see next section on
> > +        DWARFOperations.
> > +
> > +        If it isn't, value is formatted as follows:
> > +            If bool, use "1" if True, "0" if False.
> > +            If int:
> > +                If form is DW_FORM_ref4, use ":$label" where label is the
> > +                    tcl_label of the DWARFDIE at offset "value".
> > +                If form is DW_FORM_ref_addr, use "%$label" where label is
> > +                    the tcl_label of the DWARFDIE at offset "value".
> > +                Else, use value directly.
> > +            If bytes, use value.decode("ascii")
> > +            If str, use value directly.
> > +            Any other type results in a NotImplementedError being raised.
> > +
> > +        Regarding DW_FORM_exprloc and DW_FORM_block:
> > +            The form is replaced with SPECIAL_expr.
> > +            The entries in the value are interpreted and decoded using the
> > +            dwarf_operations dictionary, and replaced with their names where
> > +            applicable.
> > +        """
> > +        s = lbrace
> > +        s += self.name + " "
> > +        s += self._format_value(offset_die_lookup)
> > +
> > +        # Only explicitly state form if it's not a reference.
> > +        if self.form not in [None, "DW_FORM_ref4", "DW_FORM_ref_addr"]:
> > +            s += " " + self.form
> > +
> > +        s += rbrace
> > +        return indent(s, indent_count)
> > +
> > +
> > +class DWARFDIE:
> > +    """This script's parsed version of a RawDIE."""
> > +    def __init__(
> > +        self,
> > +        offset: int,
> > +        tag: str,
> > +        attrs: dict[str, DWARFAttribute],
> > +        tcl_label: Optional[str] = None
> > +    ):
> > +        self.offset = offset
> > +        self.tag = tag
> > +        self.attrs = copy(attrs)
> > +        self.children = []
> > +        self.tcl_label = tcl_label
>
> I think it would be worth a comment here or in format lines, regarding
> what tcl_label is actually about.
>
> it might not be self-evident what these are, especially for people
> unfamiliar with the assembler.
>
> > +
> > +    def format_lines(
> > +        self,
> > +        offset_die_lookup: dict[int, "DWARFDIE"],
> > +        indent_count: int = 0
> > +    ) -> list[str]:
> > +        """Get the list of lines that represent this DIE in Dwarf assembler."""
> > +        die_lines = []
> > +
> > +        # Prepend label to first line, if it's set.
> > +        if self.tcl_label:
> > +            first_line_start = self.tcl_label + ": "
> > +        else:
> > +            first_line_start = ""
> > +
> > +        # First line, including label.
> > +        first_line = indent(
> > +            first_line_start + self.tag + " " + lbrace,
> > +            indent_count
> > +        )
> > +        die_lines.append(first_line)
> > +
> > +        # Format attributes, if any.
> > +        if self.attrs:
> > +            for attr_name, attr in self.attrs.items():
> > +                attr_line = attr.format(
> > +                    offset_die_lookup,
> > +                    indent_count=indent_count+1
> > +                )
> > +                die_lines.append(attr_line)
> > +            die_lines.append(indent(rbrace, indent_count))
> > +        else:
> > +            # Don't create a new line, just append and immediately close the
> > +            # brace on the last line.
> > +            die_lines[-1] += rbrace
> > +
> > +        # Format children, if any.
> > +        if self.children:
> > +            # Only open a new brace if there are any children for the
> > +            # current DIE.
> > +            die_lines[-1] += " " + lbrace
> > +            for child in self.children:
> > +                child_lines = child.format_lines(
> > +                    offset_die_lookup,
> > +                    indent_count=indent_count+1
> > +                )
> > +                die_lines.extend(child_lines)
> > +            die_lines.append(indent(rbrace, indent_count))
> > +
> > +        return die_lines
> > +
> > +    def format(
> > +        self,
> > +        offset_die_lookup: dict[int, "DWARFDIE"],
> > +        indent_count: int = 0
> > +    ) -> str:
> > +        """Join result from format_lines into a single str."""
> > +        return "\n".join(self.format_lines(offset_die_lookup, indent_count))
> > +
> > +    def name(self) -> Optional[str]:
> > +        """Get DW_AT_name (if present) decoded as ASCII."""
> > +        raw_value = self.attrs.get("DW_AT_name")
> > +        if raw_value is None:
> > +            return None
> > +        else:
> > +            return raw_value.value.decode("ascii")
> > +
> > +    def type_name(self) -> str:
> > +        """Name of Dwarf tag, with the "DW_TAG_" prefix removed."""
> > +        return re.sub("DW_TAG_", "", self.tag)
> > +
> > +class DWARFCompileUnit(DWARFDIE):
> > +    """Wrapper subclass for CU DIEs.
> > +
> > +    This is necessary due to the special format CUs take in Dwarf::assemble.
> > +
> > +    Instead of simply:
> > +        DW_TAG_compile_unit {
> > +
> > +        } {
> > +
> > +        }
> > +
> > +    CUs are formatted as:
> > +        cu { } {
> > +            DW_TAG_compile_unit {
> > +
> > +            } {
> > +
> > +            }
> > +        }
> > +    """
> > +
> > +    # Default value for parameter dwarf_version defined in dwarf.exp line 1552.
> > +    default_dwarf_version = 4
> > +
> > +    # Default value for parameter is_fission defined in dwarf.exp line 1556.
> > +    # Currently not implemented, see comment below.
> > +    # default_is_fission = False
> > +
> > +    # Tag that signifies a DIE is a compile unit.
> > +    compile_unit_tag = "DW_TAG_compile_unit"
> > +
> > +    def __init__(
> > +        self,
> > +        raw_die: RawDIE,
> > +        raw_cu: RawCompileUnit,
> > +        attrs: dict[str, DWARFAttribute],
> > +    ):
> > +        """Initialize additional instance variables for CU encoding.
> > +
> > +        The additional instance variables are:
> > +        - is_64_bit: Optional[bool]
> > +            default None
> > +            Whether this CU is 64 bit or not.
> > +        - dwarf_version: int
> > +            default DWARFCompileUnit.default_dwarf_version
> > +            Version of DWARF this CU is using.
> > +        - addr_size: Optional[int]
> > +            default None
> > +            Size of an address in bytes.
> > +
> > +        These variables are used to configure the first parameter of the cu
> > +        proc (which contains calls to the compile_unit proc in the body of
> > +        Dwarf::assemble).
> > +        """
> > +        super().__init__(
> > +            raw_die.offset,
> > +            DWARFCompileUnit.compile_unit_tag,
> > +            attrs
> > +        )
> > +        self.raw_cu = raw_cu
> > +        self.is_64_bit: Optional[bool] = None
> > +        self.dwarf_version: int = raw_cu.header.get(
> > +            "version",
> > +            DWARFCompileUnit.default_dwarf_version
> > +        )
> > +        self.addr_size: Optional[int] = raw_cu.header["address_size"]
> > +
> > +        # Fission is not currently implemented because I don't know where to
> > +        # fetch this information from.
> > +        # self.is_fission: bool = self.default_is_fission
> > +
> > +        # CU labels are not currently implemented because I haven't found where
> > +        # pyelftools exposes this information.
> > +        # self.cu_label: Optional[str] = None
> > +
> > +    def format_lines(
> > +        self,
> > +        offset_die_lookup: dict[int, DWARFDIE],
> > +        indent_count: int = 0,
> > +    ) -> list[str]:
> > +        lines = []
> > +        lines.append(self._get_header(indent_count))
> > +        inner_lines = super().format_lines(
> > +            offset_die_lookup,
> > +            indent_count + 1
> > +        )
> > +        lines += inner_lines
> > +        lines.append(indent(rbrace, indent_count))
> > +        return lines
> > +
> > +    def _get_header(self, indent_count: int = 0) -> str:
> > +        """Assemble the first line of the surrounding 'cu {} {}' proc call."""
> > +        header = indent("cu " + lbrace, indent_count)
> > +        cu_params = []
> > +
> > +        if self.is_64_bit is not None:
> > +            # Convert from True/False to 1/0.
> > +            param_value = int(self.is_64_bit)
> > +            cu_params += ["is_64", str(param_value)]
> > +
> > +        if self.dwarf_version != DWARFCompileUnit.default_dwarf_version:
> > +            cu_params += ["version", str(self.dwarf_version)]
> > +
> > +        if self.addr_size is not None:
> > +            cu_params += ["addr_size", str(self.addr_size)]
> > +
> > +        # Fission is not currently implemented, see comment above.
> > +        # if self.is_fission != DWARFCompileUnit.default_is_fission:
> > +        #     # Same as is_64_bit conversion, True/False -> 1/0.
> > +        #     param_value = int(self.is_fission)
> > +        #     cu_params += ["fission", str(param_value)]
> > +
> > +        # CU labels are not currently implemented, see commend above.
> > +        # if self.cu_label is not None:
> > +        #     cu_params += ["label", self.cu_label]
> > +
> > +        if cu_params:
> > +            header += " ".join(cu_params)
> > +
> > +        header += rbrace + " " + lbrace
> > +        return header
> > +
> > +class DWARFParser:
> > +    """Converter from pyelftools's DWARF representation to this script's."""
> > +
> > +    def __init__(self, elf_file: IOBase):
> > +        """Init parser with file opened in binary mode.
> > +
> > +        File can be closed after this function is called.
> > +        """
> > +        self.raw_data = BytesIO(elf_file.read())
> > +        self.elf_data = ELFFile(self.raw_data)
> > +        self.dwarf_info = self.elf_data.get_dwarf_info()
> > +        self.offset_to_die: dict[int, DWARFDIE] = {}
> > +        self.label_to_die: dict[str, DWARFDIE] = {}
> > +        self.referenced_offsets: set[int] = set()
> > +        self.raw_cu_list: list[RawCompileUnit] = []
> > +        self.top_level_dies: list[DWARFDIE] = []
> > +        self.subprograms: list[DWARFDIE] = []
> > +        self.taken_labels: set[str] = set()
> > +
> > +        self._read_cus()
> > +        self._create_necessary_labels()
> > +
> > +    def _read_cus(self):
> I think I'd personally rename this to _read_all_cus or something like
> that. It is easy to miss the S at the end and typo the function name or
> get confused.
> > +        """Populate self.raw_cu_list with all CUs in self.dwarf_info."""
> > +        for cu in self.dwarf_info.iter_CUs():
> > +            self._read_cu(cu)
> > +
> > +    def _read_cu(self, raw_cu: RawCompileUnit):
> > +        """Read a compile_unit into self.cu_list."""
> > +        self.raw_cu_list.append(raw_cu)
> > +        for raw_die in raw_cu.iter_DIEs():
> > +            if not raw_die.is_null():
> > +                self._parse_die(raw_die)
> > +
> > +    def _parse_die(self, raw_die: RawDIE) -> DWARFDIE:
> > +        """Process a single DIE and add it to offset_to_die.
> > +
> > +        Look for DW_FORM_ref4 and DWD_FORM_ref_addr form attributes and replace
> > +        them with the global offset of the referenced DIE, and marking the
> > +        referenced DIE as "referenced".
This will be used later to ass= ign and > > + use labels. > > + > > + In case the DIE is a top-level DIE, add it to self.top_level_d= ies. > > + > > + In case the DIE is a subprogram, add it to self.subprograms an= d call > > + self._use_vars_for_low_and_high_pc_attr with it. > > + """ > > + processed_attrs =3D {} > > + attr_value: AttributeValue > > + for attr_name, attr_value in raw_die.attributes.items(): > > + actual_value =3D attr_value.value > > + if attr_value.form in ("DW_FORM_ref4", "DW_FORM_ref_addr")= : > > + referenced_die =3D raw_die.get_DIE_from_attribute(attr= _name) > > + actual_value =3D referenced_die.offset > > + self.referenced_offsets.add(referenced_die.offset) > > + > > + processed_attrs[attr_name] =3D DWARFAttribute( > > + raw_die.offset, > > + attr_name, > > + actual_value, > According to the comment above, this should not send actual_value, but > instead the string "referenced". Is this an oversight here or (more > likely) in the comment? > > + attr_value.form > > + ) > > + > > + if raw_die.tag =3D=3D DWARFCompileUnit.compile_unit_tag: > > + # FIXME: This isn't pretty, as it relies on global parser = state. > > + # It also only works under the assumption CUs can't be nes= ted. > > + die_cu =3D self.raw_cu_list[-1] > > We try and not get code into GDB that has a FIXME, so this has to be > resolved before going in. > > Would it make sense to call _parse_die with the CU that the DIE comes fro= m? > > > + processed_die =3D DWARFCompileUnit(raw_die, die_cu, proces= sed_attrs) > > + else: > > + processed_die =3D DWARFDIE( > > + raw_die.offset, > > + raw_die.tag, > > + processed_attrs, > > + None > > + ) > > + > > + if raw_die.get_parent() is None: > > + # Top level DIE > > + self.top_level_dies.append(processed_die) > > + else: > > + # Setting the parent here assumes the parent was already p= rocessed > > + # prior to this DIE being found. > > + # As far as I'm aware, this is always true in DWARF. 
> > +            processed_parent = self.offset_to_die[raw_die.get_parent().offset]
> > +            processed_parent.children.append(processed_die)
> > +
> > +        if processed_die.tag == "DW_TAG_subprogram":
> > +            self.subprograms.append(processed_die)
> > +            self._use_vars_for_low_and_high_pc_attr(processed_die)
> > +
> > +        self.offset_to_die[processed_die.offset] = processed_die
> > +        return processed_die
> > +
> > +    def _create_necessary_labels(self):
> > +        """Create labels to DIEs that were referenced by others."""
> > +        for offset in self.referenced_offsets:
> > +            die = self.offset_to_die[offset]
> > +            self._create_label_for_die(die)
> > +
> > +    def _use_vars_for_low_and_high_pc_attr(self, subprogram: DWARFDIE) -> None:
> > +        """Replace existing PC attributes with Tcl variables.
> > +
> > +        If DW_AT_low_pc exists for this DIE, replace it with accessing the
> > +        variable whose name is given by self.subprogram_start_var(subprogram).
> > +
> > +        If DW_AT_high_pc exists for this DIE, replace it with accessing the
> > +        variable whose name is given by self.subprogram_end_var(subprogram).
> > +        """
> > +        low_pc_attr_name = "DW_AT_low_pc"
> > +        if low_pc_attr_name in subprogram.attrs:
> > +            start = self.subprogram_start_var(subprogram)
> > +            subprogram.attrs[low_pc_attr_name].value = start
> > +
> > +        high_pc_attr_name = "DW_AT_high_pc"
> > +        if high_pc_attr_name in subprogram.attrs:
> > +            end = self.subprogram_end_var(subprogram)
> > +            subprogram.attrs[high_pc_attr_name].value = end
> > +
> > +    def _create_label_for_die(self, die: DWARFDIE) -> None:
> > +        """Set tcl_label to a unique string among other DIEs for this parser.
> > +
> > +        As a first attempt, use labelify(die.name()). If the DIE does not have
> > +        a name, use labelify(die.type_name()).
> > +
> > +        If the chosen initial label is already taken, try again appending "_2".
> > +        While the attempt is still taken, try again replacing it with "_3", then
> > +        "_4", and so on.
> > +
> > +        This function also creates an entry on self.label_to_die.
> > +        """
> > +        if die.tcl_label is not None:
> > +            return
> > +
> > +        label = labelify_str(die.name() or die.type_name())
> > +
> > +        # Deduplicate label in case of collision
> > +        if label in self.taken_labels:
> > +            suffix_nr = 2
> > +
> > +            # Walrus operator to prevent writing the assembled label_suffix
> > +            # string literal twice. This could be rewritten by copying the
> > +            # string literal to the line after the end of the while loop,
> > +            # but I deemed it would be too frail in case one of them needs
> > +            # to be changed and the other is forgotten.
> > +            while (new_label := f"{label}_{suffix_nr}") in self.taken_labels:
> > +                suffix_nr += 1
> > +            label = new_label
> > +
> > +        die.tcl_label = label
> > +        self.label_to_die[label] = die
> > +        self.taken_labels.add(label)
> > +
> > +    def subprogram_start_var(self, subprogram: DWARFDIE) -> str:
> > +        """Name of the Tcl variable that holds the low PC for a subprogram."""
> > +        return f"${subprogram.name()}_start"
> > +
> > +    def subprogram_end_var(self, subprogram: DWARFDIE) -> str:
> > +        """Name of the Tcl variable that holds the high PC for a subprogram."""
> > +        return f"${subprogram.name()}_end"
> > +
> > +    def all_labels(self) -> set[str]:
> > +        """Get a copy of the set of all labels known to the parser so far."""
> > +        return copy(self.taken_labels)
> > +
> > +
> > +class DWARFAssemblerGenerator:
> > +    """Class that generates Dwarf::assemble code out of a DWARFParser."""
> > +
> > +    def __init__(self, dwarf_parser: DWARFParser, output=sys.stdout):
> > +        self.dwarf_parser = dwarf_parser
> > +        self.output = output
> > +
> > +    def emit(self, line: str, indent_count: int) -> None:
> > +        """Print a single line indented indent_count times to self.output.
> > +
> > +        If line is empty, it will always print an empty line, even with nonzero
> > +        indent_count.
> > +        """
> > +        if line:
> > +            line = get_indent_str(indent_count) + line
> > +        print(line, file=self.output)
> > +
> > +    def generate_die(self, die: DWARFDIE, indent_count: int):
> > +        """Generate the lines that represent a DIE."""
> > +        die_lines = die.format(self.dwarf_parser.offset_to_die, indent_count)
> > +        self.emit(die_lines, 0)
> > +
> > +    def generate(self):
> > +        indent_count = 0
> > +
> > +        year = datetime.now().year
> > +        copyright_notice = copyright_notice_template.format(current_year=year)
> > +        for copyright_line in copyright_notice.split("\n"):
> > +            self.emit(copyright_line, indent_count)
> > +
> > +        self.emit("load_lib dwarf.exp", indent_count)
> > +        self.emit("require dwarf2_support", indent_count)
> > +        self.emit("set asm_file [standard_output_file $srcfile2]", indent_count)
> > +
> > +        self.emit("", 0)
> > +        self.emit("# TODO: call prepare_for_testing", indent_count)
> > +
> > +        self.emit("", 0)
> > +        self.emit("Dwarf::assemble $asm_file " + lbrace, indent_count)
> Honestly, you can just have "Dwarf::assemble $srcfile2" I think. no need
> to set an asm_file
> > +
> > +        # Begin Dwarf::assemble body.
> > +        indent_count += 1
> > +        self.emit("global srcdir subdir srcfile", indent_count)
> > +
> > +        all_labels = self.dwarf_parser.all_labels()
> > +        self.emit("declare_labels " + " ".join(all_labels), indent_count)
> > +
> > +        self.emit("", 0)
> > +        for subprogram in self.dwarf_parser.subprograms:
> > +            self.emit(f"get_func_info {subprogram.name()}", indent_count)
> > +
> > +        for die in self.dwarf_parser.top_level_dies:
> > +            self.generate_die(die, indent_count)
> > +
> > +        # TODO: line table, if it's within scope (it probably isn't).
> > +
> > +        # End Dwarf::assemble body.
> > +        indent_count -= 1
> > +        self.emit(rbrace, indent_count)
> > +
> > +def main(argv):
> > +    try:
> > +        filename = argv[1]
> > +    except IndexError:
> > +        print("Usage:")
> > +        print("python ./asm_to_dwarf_assembler.py ")
> > +        sys.exit(1)
> > +
> > +    try:
> > +        with open(filename, "rb") as elf_file:
> > +            parser = DWARFParser(elf_file)
> > +    except Exception as e:
> > +        print("Error parsing ELF file. Does it contain DWARF information?")
> > +        print(str(e))
> > +        sys.exit(2)
> > +    generator = DWARFAssemblerGenerator(parser)
> > +    generator.generate()
> > +
> > +if __name__ == "__main__":
> > +    main(sys.argv)
> >
> --
> Cheers,
> Guinevere Larsen
> She/Her/Hers
>
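
On the FIXME in _parse_die: one way to drop the self.raw_cu_list[-1] lookup might be to pass the owning CU down as an argument, as suggested in the review. Below is only a minimal sketch of that idea, not the patch itself. FakeDIE, FakeCU, SketchParser and offset_to_cu are hypothetical stand-ins invented for illustration; they are not pyelftools classes and not names from the script.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDIE:
    """Hypothetical stand-in for a pyelftools DIE (illustration only)."""
    offset: int
    tag: str


@dataclass
class FakeCU:
    """Hypothetical stand-in for a pyelftools compile unit (illustration only)."""
    name: str
    dies: list[FakeDIE] = field(default_factory=list)


class SketchParser:
    """Reduced parser that records which CU each DIE was read from."""

    def __init__(self) -> None:
        # Hypothetical attribute: maps a DIE offset to the CU it came from.
        self.offset_to_cu: dict[int, FakeCU] = {}

    def read_all_cus(self, cus: list[FakeCU]) -> None:
        for cu in cus:
            self._read_cu(cu)

    def _read_cu(self, cu: FakeCU) -> None:
        for die in cu.dies:
            # The CU travels with the DIE, so _parse_die never has to
            # consult parser-wide state to find out where the DIE lives.
            self._parse_die(die, cu)

    def _parse_die(self, die: FakeDIE, cu: FakeCU) -> None:
        self.offset_to_cu[die.offset] = cu
```

With this shape the caller already holds the CU while iterating over its DIEs, so the assumption that the most recently appended CU is the right one is no longer needed.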