Date: Tue, 14 May 2019 13:49:00 -0000
From: Philipp Rudo
To: Thomas Caputi
Cc: gdb@sourceware.org, Florian Weimer, Dave Anderson
Subject: Re: Linux kernel debugging and other features
Message-Id: <20190514094921.3f08044a@laptop-ibm>

Hi Tom,

that really sounds like a great piece of code you have there. A little while back I worked on a similar project, focused on dump debugging on s390 [1]. Unfortunately the project never made it beyond the prototype stage...

My approach back then was to include everything directly in gdb, without using any python scripts whatsoever. All in all it worked quite well and allowed walking the task_structs and converting them to gdb threads, creating backtraces for kernel threads, and loading kernel modules as 'shared libraries'. However, there were also some drawbacks.
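For illustration, the task walk mentioned above boils down to following the kernel's circular list_head chain from init_task back around to itself. The sketch below is a standalone python model of that traversal, not the actual gdb implementation; in a real script the ListHead/Task stand-ins would be gdb.Value reads plus a container_of()-style offset computation.

```python
# Standalone sketch of walking the kernel's circular task list.
# In a real gdb script these objects would come from gdb.Value lookups
# (e.g. gdb.parse_and_eval("init_task")); plain classes stand in here.

class ListHead:
    def __init__(self, owner=None):
        # 'owner' stands in for container_of(): with gdb.Value you would
        # instead subtract offsetof(task_struct, tasks) from the address.
        self.owner = owner
        self.next = self  # an empty list_head points at itself

class Task:
    def __init__(self, pid, comm):
        self.pid, self.comm = pid, comm
        self.tasks = ListHead(self)  # embedded list_head, as in task_struct

def walk_tasks(init_task):
    """Yield every task, following tasks.next until we circle back."""
    t = init_task
    while True:
        yield t
        t = t.tasks.next.owner
        if t is init_task:
            return

# Wire up a tiny example list: swapper -> systemd -> kthreadd -> back.
init = Task(0, "swapper")
all_tasks = [init, Task(1, "systemd"), Task(2, "kthreadd")]
for a, b in zip(all_tasks, all_tasks[1:] + all_tasks[:1]):
    a.tasks.next = b.tasks

print([(t.pid, t.comm) for t in walk_tasks(init)])
```

Each task found this way can then be registered with gdb as a thread, which is what makes 'thread'/'bt' work on kernel tasks.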
As it was built directly into gdb, only ELF dumps could be inspected. The virtual address translation was also a little shaky, as addresses in gdb are represented by plain ulongs without any information about which address space they belong to. Furthermore, I concentrated on post-mortem dumps, exactly because of the problems you describe below...

Anyway, I'm thrilled to have a look at your code and see how you solve those problems.

I'm not sure Florian's suggestion to add the code to crash is the right way to go. Crash is based on gdb 7.6, so it's already quite old and probably doesn't have all the functionality you need. Nevertheless, I've CCed Dave Anderson, the crash maintainer. I guess he's interested in this discussion as well.

Back when I was working on my project I didn't have the impression that there were any fundamental objections to adding the feature to gdb. It basically comes down to how your implementation solves the problems. So as a next step, I'd suggest you simply post your patches to the gdb-patches list and we take the discussion from there.

Thanks
Philipp

[1] https://sourceware.org/ml/gdb-patches/2018-03/msg00243.html

On Mon, 13 May 2019 16:19:49 -0400
Thomas Caputi wrote:

> Hello gdb,
>
> My name is Tom Caputi and I am a developer for ZFS on Linux. Recently,
> I had the opportunity to work with some members of Delphix (another
> major ZFS on Linux contributor) to build some debugging tools.
>
> When we started working on this project we were surprised to see how
> close gdb was to supporting kernel debugging natively. For live
> systems, we were able to use the kernel's vmlinux file from the dbgsym
> package (after mucking around for a bit with KASLR offsets) along with
> /proc/kcore as a core file to inspect just about any non-local
> variable on the system.
>
> For inspecting post-mortem kdumps we found that Jeff Mahoney had
> already been working on this
> (https://github.com/jeffmahoney/crash-python).
> Kdump files are
> compressed and use a different on-disk format from regular core files,
> but he was able to create a new "kdump" target type to support that.
> His work also included code that allowed us to load the symbols for
> kernel modules with their correct offsets.
>
> Jeff also has a python script that was able to parse out Linux's list
> of task_struct structures (which represent all the threads on the
> system) and hand them to gdb. This allowed us to switch threads and
> view stack traces with function arguments just as we could when using
> gdb to debug a userspace program.
>
> On top of all of this, members of the Delphix team were able to put
> together some code that allows custom gdb sub-commands (written in
> python) to be piped together, comparable to the way commands can be
> piped together in bash. By doing this we were able to put together a
> few relatively simple commands to get some really powerful debugging
> output.
>
> Currently, all of this is still in the proof-of-concept stage, but I
> think both Datto (my company) and Delphix would like to look at the
> next steps to get these improvements integrated upstream and
> stabilized. We think these could be a huge improvement over the
> current situation of debugging code in the Linux kernel. However,
> there are some sticky bits that we would like to discuss if the gdb
> community is interested in these changes:
>
> 1) The kdumpfile support currently requires a few custom patches to
> gdb that allow a user to create a custom target in python. The
> kdumpfile target is then implemented as a python module that calls out
> to libkdumpfile (written in C). I'm not sure if this is the desired
> implementation of this feature. If it is not, could we get some
> pointers on how we could add this support to gdb?
>
> 2) The /proc/kcore file *looks* like a core file, but it is constantly
> changing underneath us as the live system changes.
> When debugging code
> we had issues where values that should have been changing were cached
> and appeared to remain static. We were able to reduce the gdb cache
> size to 2 bytes (I think) by running 'set stack-cache off; set
> code-cache off; set dcache size 1; set dcache line-size 2', but this
> still results in (at least) the last variable you inspected being
> cached until you look at something else. Is there a way we can
> completely disable the dcache?
>
> 3) We aren't 100% sure where all of the new code belongs. The
> ZFS-specific debugging commands we can definitely keep in the ZFS
> repository, but the sub-command piping infrastructure could be useful
> to anyone using gdb. We're also not really sure where the scripts that
> parse out kernel structures (for things like threads and per-cpu
> variables) should end up.
>
> Please let us know if you are interested in any of these changes and
> what some good next steps would be.
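The sub-command piping described in the quoted mail could, conceptually, look something like the sketch below. To be clear, this is a hypothetical model, not the Delphix code: each sub-command is a filter over an iterator of records, and a pipeline is just function composition. Inside gdb each stage would be registered as a gdb.Command; plain functions keep the example self-contained.

```python
# Hypothetical sketch of pipeable debugger sub-commands: a chain like
# "threads | grep z | count" becomes ordinary function composition over
# iterators of records. The thread names below are made-up sample data.

def threads(_input=None):
    # Stand-in for a command that lists kernel threads as (comm, pid).
    yield from [("zfs_iput_taskq", 612), ("kworker/0:1", 35),
                ("z_wr_iss", 640), ("sshd", 1021)]

def grep(pattern):
    # Filter stage: keep only records whose name contains the pattern.
    def stage(records):
        yield from (r for r in records if pattern in r[0])
    return stage

def count(records):
    # Terminal stage: reduce the stream to a single count.
    yield sum(1 for _ in records)

def pipeline(*stages):
    # Feed each stage's output into the next, like a shell pipe.
    records = None
    for stage in stages:
        records = stage(records)
    return records

print(list(pipeline(threads, grep("z"), count)))  # two "z*" threads above
```

Because every stage consumes and produces structured Python objects rather than text, downstream stages can filter on fields (pid, state, address) instead of re-parsing command output, which is where this scheme beats piping gdb's textual output through the shell.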
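On the KASLR offsets mentioned for the live /proc/kcore case: the usual trick is to compare the runtime address of a well-known symbol such as _stext (from /proc/kallsyms, readable as root) with its link-time address in vmlinux (e.g. from 'nm vmlinux'), and hand gdb the difference. A hedged sketch with made-up example addresses:

```python
# Sketch of computing the KASLR slide. The addresses below are invented
# example values; in practice the runtime one comes from /proc/kallsyms
# and the link-time one from the vmlinux symbol table.

def kaslr_slide(kallsyms_lines, linktime_stext):
    """Find _stext in kallsyms-style lines and return the slide."""
    for line in kallsyms_lines:
        addr, _type, name = line.split()[:3]
        if name == "_stext":
            return int(addr, 16) - linktime_stext
    raise LookupError("_stext not found in kallsyms")

sample = [
    "ffffffff9b0001e0 T secondary_startup_64",  # made-up sample lines
    "ffffffff9b000000 T _stext",
]
slide = kaslr_slide(sample, 0xffffffff81000000)
print(hex(slide))

# Recent gdb can then apply the slide when loading symbols, roughly:
#   (gdb) add-symbol-file vmlinux -o <slide>
# before pointing it at /proc/kcore.
```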