* Multi-threaded dwarf parsing
From: Simon Marchi @ 2016-02-24  2:45 UTC
To: gdb
Cc: tromey

Hi all,

When debugging large programs, simply loading the binary in gdb can take a
significant amount of time.  I was wondering if the DWARF parsing (building
partial and/or full symtabs, I suppose) could be a good candidate for
parallelization.  I did some quick checks to determine that, at least when
reading from my SSD drive, the operation is not IO-bound.  Also, according
to my limited understanding of the DWARF format, it seems like compilation
unit DIEs are entities that could be processed independently.  These two
facts, if we assume they are true, suggest that there is good potential
for a performance gain here.

I couldn't find anything on the mailing list about this; please point out
any discussion I might have missed.

I found (and it was a very good surprise) this branch by Tom Tromey:

  https://github.com/tromey/gdb/tree/threaded-dwarf-reader

According to his description (from https://github.com/tromey/gdb/wiki):
"I think it doesn't help any real-world case".  I'd like to ask you
directly, Tom: now that you debug Firefox (i.e. a quite large program)
daily with gdb, are you still of the same opinion?  Of course, I'm also
interested in what others have to say about this.  Is it something that
would have value, do you think?

Also, since not so long ago, LLDB does it.  Apparently, it "can
drastically incrase the speed of loading debug info" (sic).  If it's good
for LLDB, I don't see why it wouldn't be good for GDB.
Ref: http://blog.llvm.org/2015/10/llvm-weekly-95-oct-26th-2015.html

So, in a word, are there any gotchas or good reasons not to take this path?

Simon
* Re: Multi-threaded dwarf parsing
From: Pedro Alves @ 2016-02-24 11:06 UTC
To: Simon Marchi, gdb, Tom Tromey

[Updated Tom's address]

On 02/24/2016 02:45 AM, Simon Marchi wrote:
> When debugging large programs, simply loading the binary in gdb can take
> a significant amount of time.  I was wondering if the dwarf parsing
> (building partial and/or full symtabs, I suppose) could be a good
> candidate for parallelization.
[...]
> I'd like to ask you directly, Tom: now that you debug Firefox (i.e. a
> quite large program) daily with gdb, are you still of the same opinion?
> Of course, I'm also interested in what others have to say about that.
> Is it something that would have value, you think?

Making GDB load debug info faster, and making it take advantage of the
multiple cores in most host machines nowadays, definitely adds value.

(I'd also like to get threads into GDB for other reasons, so this would
be a good trojan.  Oh, whoops, did I say that out loud? :-) )

> So, in a word, are there any gotchas or good reasons not to take this
> path?

The obvious gotchas are of course all the globals, and coming up with
fine enough locking granularity that threads actually do run in parallel.

Thanks,
Pedro Alves
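Pedro's locking-granularity point can be made concrete with a small
sketch.  This is not gdb code -- `comp_unit`, `scan_cu`, and
`scan_all_cus` are invented stand-ins -- but it shows the shape that
lets threads actually run in parallel: the shared-state mutex is held
only to publish a result, never around the CPU-bound scan itself.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one compilation unit's DWARF data.
struct comp_unit { int id; };

// Hypothetical CPU-bound scan of one CU (illustrative only).
static std::string scan_cu (const comp_unit &cu)
{
  return "psymtab-" + std::to_string (cu.id);
}

// Fine-grained locking: work distribution uses a lock-free counter,
// and the mutex is held only while publishing a result, so the
// expensive scan_cu calls overlap across threads.
std::vector<std::string>
scan_all_cus (const std::vector<comp_unit> &cus, int nthreads)
{
  std::vector<std::string> results;
  std::mutex results_lock;
  std::atomic<size_t> next (0);

  auto worker = [&] ()
    {
      for (;;)
        {
          size_t i = next.fetch_add (1);
          if (i >= cus.size ())
            break;
          std::string r = scan_cu (cus[i]);  // no lock held here
          std::lock_guard<std::mutex> guard (results_lock);
          results.push_back (std::move (r));
        }
    };

  std::vector<std::thread> threads;
  for (int t = 0; t < nthreads; ++t)
    threads.emplace_back (worker);
  for (auto &th : threads)
    th.join ();
  return results;
}
```

Holding one big lock around the whole loop body instead would compile and
run correctly, but would serialize the workers -- which is exactly the
"threads actually do run in parallel" trap Pedro mentions.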
* Re: Multi-threaded dwarf parsing
From: Tom Tromey @ 2016-02-24 15:30 UTC
To: Pedro Alves
Cc: Simon Marchi, gdb, Tom Tromey

>> I'd like to ask you directly, Tom: now that you debug Firefox (i.e. a
>> quite large program) daily with gdb, are you still of the same opinion?

It's been a while since I thought about that branch.

I think it helps some scenarios, but maybe not as many as you'd like.
In fact, I think it doesn't help two of the three most typical ways I
debug Firefox.  (I realize this may not apply directly to your idea of
reading each CU independently; this is just the state of that branch.)

1. Run Firefox, then attach.

   Here it is pretty normal for the attach to interrupt Firefox somewhere
   in libxul.so -- the largest library (so much larger that it is the
   only one that causes a noticeable pause at gdb startup).

   But, it seems to me that stopping somewhere in libxul.so should
   probably cause its debuginfo to be read.

2. Start gdb, set a breakpoint, then run Firefox.

   Here debuginfo for every library must be read in order to set the
   breakpoint correctly.

The third scenario, which would be helped, is:

3. Start gdb, run Firefox, and try to reproduce a crash.  In this
   situation gdb could read the debuginfo in the background and
   everything would work nicely.

That said, I think my branch might have helped a tiny bit with scenario
#1, because it prioritized the largest files when reading debuginfo.
So, libxul.so would generally be read a bit earlier than it is now.

Reading each CU independently seems like a good idea to me.  I think it
will stumble into various problems inside gdb, but I'd guess they are
all surmountable with enough work.

I think this could help with scenario #1.  The ideal situation here
would be to read just the CU (or CUs?) covering the stop address, then
lazily read more as needed for types and such.

I suppose it could also help #2 if enough parallelism is there to be
had, though I'm a bit skeptical.

>> So, in a word, are there any gotchas or good reasons not to take this
>> path?

Pedro> The obvious gotchas are of course all the globals, and coming up with
Pedro> fine enough locking granularity that threads actually do run in parallel.

I think the gotcha situation got worse since I wrote my patch.

Now the DWARF reader can call into the type-printing system, which it
didn't before.  It wasn't clear to me that this was safe.  ISTR there
was some other change along these lines -- the DWARF reader calling out
to some gdb module that it previously did not -- but I can't remember
what it was any more.

The DWARF reader also has many more modes (debug_types, dwz, dwo/dwp)
than it did back then.  So, this will require some careful auditing.

FWIW my threading patches were written during my time at Red Hat, and so
you can use any part of that series without needing any paperwork from me.

Tom
* Re: Multi-threaded dwarf parsing
From: Simon Marchi @ 2016-02-24 16:43 UTC
To: Tom Tromey
Cc: Pedro Alves, gdb

On 2016-02-24 10:30, Tom Tromey wrote:
> It's been a while since I thought about that branch.
>
> I think it helps some scenarios, but maybe not as many as you'd like.
[...]
> Reading each CU independently seems like a good idea to me.  I think it
> will stumble into various problems inside gdb, but I'd guess they are
> all surmountable with enough work.

Indeed, we probably had different, but not incompatible, ideas of
"threaded".

Just to make sure I understand correctly: instead of blocking on the
psymtab creation at startup (in elf_symfile_read), you offload that to
worker threads and carry on.  If you happen to need the information and
it's not ready yet, then the main code has to block until the
corresponding task is complete (dwarf2_require_psymtabs).  However, in
each worker thread, each objfile is still processed sequentially.  So if
you are waiting for libxul.so's debug info to be ready (such as in #1),
it won't be ready any faster.  Is that right?

My view of the parallelism was that when reading an objfile's debug
info, the main thread would offload chunks of work (a chunk == a CU) to
the worker threads, but wait for all of them to be done before
continuing.  So it would still block on the psymtab creation, but it
would block for a shorter time (divided by the number of threads/cores,
in an ideal world).  It's just replacing a serial algorithm with a
parallel one, but it would be mostly transparent to the rest of gdb.

I hadn't thought of reading the info in the background, but I like the
fact that it can get the user to a prompt faster.  And I think these two
forms of parallelism are not mutually exclusive: we could very well read
CUs in parallel, in the background.

> I think this could help with scenario #1.  The ideal situation here
> would be to read just the CU (or CUs?) covering the stop address; then
> lazily read more as needed for types and such.
>
> I suppose it could also help #2 if enough parallelism is there to be
> had, though I'm a bit skeptical.

I think that reading CUs in parallel would help pretty much any use case
where you are waiting for psymtabs to be created; it could reduce that
wait time.

> I think the gotcha situation got worse since I wrote my patch.
[...]
> The DWARF reader also has many more modes (debug_types, dwz, dwo/dwp)
> than it did back then.  So, this will require some careful auditing.

Yes, I'm sure the reality is way more complicated than the image I have
in my head at the moment :).

> FWIW my threading patches were written during my time at Red Hat and so
> you can use any part of that series without needing any paperwork from
> me.

Great, thanks!

Simon
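Simon's fork/join model -- offload one task per CU, then block until all
are done, so the rest of gdb sees the same result as the serial reader --
might look roughly like this.  `scan_cu` and both function names are
hypothetical stand-ins, not gdb's real reader:

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical per-CU result (e.g. number of psymbols found); stands in
// for real DWARF scanning work.
static int scan_cu (int cu_index) { return cu_index * 2; }

// Serial version: what gdb effectively does today.
int read_psymtabs_serial (int ncus)
{
  int total = 0;
  for (int i = 0; i < ncus; ++i)
    total += scan_cu (i);
  return total;
}

// Fork/join version: one task per CU, then block until all are done.
// Callers get the same answer, so the change stays transparent.
int read_psymtabs_parallel (int ncus)
{
  std::vector<std::future<int>> tasks;
  for (int i = 0; i < ncus; ++i)
    tasks.push_back (std::async (std::launch::async, scan_cu, i));

  int total = 0;
  for (auto &t : tasks)
    total += t.get ();  // join point: block here, before returning
  return total;
}
```

A real implementation would cap concurrency with a worker pool rather
than spawning one task per CU, since a large objfile can contain
thousands of CUs.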
* Re: Multi-threaded dwarf parsing
From: Tom Tromey @ 2016-02-24 19:50 UTC
To: Simon Marchi
Cc: Tom Tromey, Pedro Alves, gdb

Simon> Just to make sure I understand correctly: instead of blocking on
Simon> the psymtabs creation at startup (in elf_symfile_read), you
Simon> offload that to worker threads and carry on.  If you happen to
Simon> need the information and it's not ready yet, then the main code
Simon> will have to block until the corresponding task is complete
Simon> (dwarf2_require_psymtabs).

That's correct.

Simon> However, in each worker thread, each objfile is still processed
Simon> sequentially.  So if you are waiting for libxul.so's debug info
Simon> to be ready (such as in #1), it won't be ready any faster.  Is
Simon> that right?

Yes, each task constructs the psymtabs for an entire objfile.

Simon> My view of the parallelism was that when reading an objfile's
Simon> debug info, the main thread would offload chunks of work (a chunk
Simon> == a CU) to the worker threads, but wait for all of them to be
Simon> done before continuing.
[...]

Yeah.  This sounds doable in the abstract, though of course details
matter.  The DWARF reader has a lot of per-objfile state that would have
to be split up (ideally) or locked.  And there is stuff like buildsym.h,
which is full of globals for no good reason.

Simon> I hadn't thought of reading the info in the background, but I
Simon> like the fact that it can get the user to a prompt faster.  And I
Simon> think these two forms of parallelism are not mutually exclusive,
Simon> we could very well read CUs in parallel, in the background.

I agree.

Tom
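The background-read scheme discussed above -- kick off the work at load
time, return to the prompt immediately, and block only when the
information is first required -- maps naturally onto a future.  The
class and member names below are illustrative, not gdb's actual
interface:

```cpp
#include <cassert>
#include <future>
#include <string>

// Minimal model of the threaded-dwarf-reader branch's approach.  The
// std::string result stands in for a real psymtab table.
class objfile_psymtabs
{
public:
  // Called from symbol loading: start the work and return at once.
  void start_background_read (std::string name)
  {
    m_future = std::async (std::launch::async,
                           [name] () { return "psymtabs for " + name; });
  }

  // Called by any code that needs psymtabs: block until they exist.
  // Plays the role of dwarf2_require_psymtabs in the branch.
  const std::string &require_psymtabs ()
  {
    if (!m_done)
      {
        m_value = m_future.get ();  // blocks if the task is still running
        m_done = true;
      }
    return m_value;
  }

private:
  std::future<std::string> m_future;
  std::string m_value;
  bool m_done = false;
};
```

The first `require_psymtabs` call pays whatever is left of the read; every
later call returns the cached result immediately, which is why a user who
never touches a library's symbols never waits for it.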
* Re: Multi-threaded dwarf parsing
From: Jan Kratochvil @ 2016-02-24 20:25 UTC
To: Simon Marchi
Cc: Tom Tromey, Pedro Alves, gdb

On Wed, 24 Feb 2016 17:43:03 +0100, Simon Marchi wrote:
> instead of blocking on the psymtabs creation at startup
[...]
> then the main code will have to block until the corresponding task is
> complete (dwarf2_require_psymtabs).

If your concern really is psymtabs, then use Tom's .gdb_index:
gdb/contrib/gdb-add-index.sh

With .gdb_index GDB still has startup performance problems during full
CU expansions, that is, building struct symtab and struct symbol.  That
happens with C++ inferiors, which have very interlinked CUs, so
expanding one CU means GDB expanding 100+ CUs due to the inter-type
dependencies, which cannot be left opaque in such cases.  And each C++
CU is usually very large...

Jan
* Re: Multi-threaded dwarf parsing
From: Simon Marchi @ 2016-02-24 20:37 UTC
To: Jan Kratochvil
Cc: Tom Tromey, Pedro Alves, gdb

On 2016-02-24 15:25, Jan Kratochvil wrote:
> If really your concern are psymtabs then use Tom's .gdb_index:
> gdb/contrib/gdb-add-index.sh
>
> With .gdb_index GDB still has startup performance problems during full
> CU expansions, that is struct symtab and struct symbol.  That happens
> with C++ inferiors which have very interlinked CUs and thus expanding
> one CU means for GDB expanding 100+ CUs due to the inter-type
> dependencies which cannot be left opaque in such cases.  And as each
> C++ CU is usually very large...

What can cause CUs to be interlinked with each other?
* Re: Multi-threaded dwarf parsing
From: Jan Kratochvil @ 2016-02-24 21:28 UTC
To: Simon Marchi
Cc: Tom Tromey, Pedro Alves, gdb

On Wed, 24 Feb 2016 21:37:24 +0100, Simon Marchi wrote:
> What can cause CUs to be interlinked with each other?

I did not remember; from what I am checking now, it is due to dwz:
  https://sourceware.org/git/?p=dwz.git;a=blob;f=dwz.c
That is a DWARF size reduction tool (by DWARF optimization, not by any
compression).

All the CUs get queued there due to its DW_AT_import:
  process_imported_unit_die () -> maybe_queue_comp_unit ()

Without dwz I could not reproduce the queueing problem.  IIRC there was
some, but I admit I may not remember it right.

BTW expanding one CU is also not cheap; just its .debug_info part can be
around 1 MB:

  readelf -wi libwebkitgtk-1.0.so.0.5.2.debug \
    | grep '^ *<0>' \
    | perl -lne 'BEGIN{$l=0;} /^\s*<0><([0-9a-f]+)>/ or die;
                 $x=eval "0x$1"; print(($x-$l)." ".$_); $l=$x;' \
    | sort -nr

But that is a sub-second delay, not much of a real problem.

Jan
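The queueing behaviour Jan describes -- expanding one CU queues every CU
it imports, transitively -- is essentially a worklist over DW_AT_import
edges.  A toy model, where CU numbers and the imports map are invented
for illustration:

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <vector>

// Expanding `start` pulls in every CU reachable through import edges,
// mirroring the process_imported_unit_die -> maybe_queue_comp_unit
// chain Jan points at in dwz-compressed debug info.
std::set<int>
expand_cu (int start, const std::map<int, std::vector<int>> &imports)
{
  std::set<int> expanded;
  std::queue<int> worklist;
  worklist.push (start);

  while (!worklist.empty ())
    {
      int cu = worklist.front ();
      worklist.pop ();
      if (!expanded.insert (cu).second)
        continue;  // already expanded, don't queue its imports again
      auto it = imports.find (cu);
      if (it != imports.end ())
        for (int dep : it->second)
          worklist.push (dep);
    }
  return expanded;
}
```

With heavily interlinked C++ CUs, the reachable set from a single start
CU is large -- which is the "expanding one CU means expanding 100+ CUs"
effect, even when each individual expansion is cheap.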
* Re: Multi-threaded dwarf parsing
From: Pedro Alves @ 2016-02-24 21:10 UTC
To: Jan Kratochvil, Simon Marchi
Cc: Tom Tromey, gdb

On 02/24/2016 08:25 PM, Jan Kratochvil wrote:
> If really your concern are psymtabs then use Tom's .gdb_index:
> gdb/contrib/gdb-add-index.sh

I think the index isn't so helpful if the big thing that takes a while
to read/load is what you're changing in an edit/compile/debug cycle.

Also, that script actually relies on gdb to read the debug info, intern
it, and spit out the index.  So if gdb reads DWARF faster, then index
generation itself becomes faster too.

> With .gdb_index GDB still has startup performance problems during full
> CU expansions, that is struct symtab and struct symbol.  That happens
> with C++ inferiors which have very interlinked CUs and thus expanding
> one CU means for GDB expanding 100+ CUs due to the inter-type
> dependencies which cannot be left opaque in such cases.  And as each
> C++ CU is usually very large...

Sounds like something that could be sped up by reading CUs in parallel.

Thanks,
Pedro Alves
* Re: Multi-threaded dwarf parsing
From: Jan Kratochvil @ 2016-02-24 21:22 UTC
To: Pedro Alves
Cc: Simon Marchi, Tom Tromey, gdb

On Wed, 24 Feb 2016 22:10:46 +0100, Pedro Alves wrote:
> On 02/24/2016 08:25 PM, Jan Kratochvil wrote:
> > If really your concern are psymtabs then use Tom's .gdb_index:
> > gdb/contrib/gdb-add-index.sh
>
> I think the index isn't so helpful if the big thing that takes a while
> to read/load is what you're changing in an edit/compile/debug cycle.

I found it useful even during edit/compile/debug cycles.  If one
modifies an .h file, the compilation step takes up to a few minutes
anyway, so that is a non-interactive step.  Moreover it is done only
once; one may then debug it multiple times, etc.

> Sounds like something that could be sped up by reading CUs in parallel.

Yes; going to discuss it in another mail.

Jan
* Re: Multi-threaded dwarf parsing
From: Tom Tromey @ 2016-02-25  3:31 UTC
To: Jan Kratochvil
Cc: Simon Marchi, Tom Tromey, Pedro Alves, gdb

Jan> With .gdb_index GDB still has startup performance problems during
Jan> full CU expansions, that is struct symtab and struct symbol.

My branch "lazily-read-function-bodies" addressed this issue.  It
changed CU expansion to skip reading function bodies until needed.  This
was good for a decent speedup; my notes say ~40%.  I didn't finish this
branch, though -- it still needed a bit of work to expand a function
when a by-address lookup was done.

It's possible, but harder, to go even farther than this -- that is,
unify symtabs and psymtabs and make CU expansion completely lazy.  At
one point I had a rather complicated plan for this.

For what it's worth, in my current debugging, I do notice psymtab
reading, but I never notice CU expansion.  I'm not sure if I'm just
lucky or if it's because the CU expansion problem is exacerbated by dwz,
which I'm of course not using during development.

Tom
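The idea behind a branch like lazily-read-function-bodies can be
sketched as a lazily-expanded handle: CU expansion stores a closure that
knows how to read a function's body, and the expensive parse happens
only on first access.  Everything here is illustrative, not the branch's
actual code:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// One function symbol whose body (blocks, locals, line info) is read
// on demand rather than at CU expansion time.
class lazy_function
{
public:
  explicit lazy_function (std::function<std::string ()> reader)
    : m_reader (std::move (reader)) {}

  // First call runs the stored reader (the expensive DWARF parse);
  // later calls return the cached result.
  const std::string &body ()
  {
    if (!m_body)
      m_body.reset (new std::string (m_reader ()));
    return *m_body;
  }

  bool expanded () const { return m_body != nullptr; }

private:
  std::function<std::string ()> m_reader;
  std::unique_ptr<std::string> m_body;
};
```

The speedup comes from how many functions a debugging session never
touches: their readers are stored but never invoked.  The unfinished part
Tom mentions -- by-address lookup -- is harder precisely because finding
which lazy body covers an address may itself require expanding bodies.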