* [rfc] split up symtab.h
@ 2002-10-08 13:14 David Carlton
2002-10-08 13:54 ` Kevin Buettner
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: David Carlton @ 2002-10-08 13:14 UTC (permalink / raw)
To: gdb-patches; +Cc: Jim Blandy, Elena Zannoni
I'm sick of having to recompile half of GDB every time I touch
symtab.h. There's lots of different things in that file; it's
included in 137 different places (counting only the gdb directory, not
gdb/mi, etc.), but there's no one thing that it defines that is used
in more than 71 places, and a lot of things that it defines are used
in a lot fewer places than that.
Here's what is defined in symtab.h, together with the number of .c
files that each construct is mentioned in. (I've produced this
information with a naive use of grep/cut/uniq, so it could be off: the
main problem is that, say, every use of 'struct blockvector' counts as
a use of 'struct block'. Then again, what file would use 'struct
blockvector' without using 'struct block'?)
struct general_symbol_info (1)
struct minimal_symbol (71)
struct blockvector (13)
struct block (38)
struct range_list (2)
struct alias_list (2)
struct symbol (62)
struct partial_symbol (10)
struct sourcevector (0)
struct linetable_entry (6)
struct linetable (9)
struct source (0)
struct section_offsets (20)
struct symtab (65)
struct partial_symtab (16)
struct symtab_and_line (39)
struct symtabs_and_lines (9)
enum exception_event_kind (4)
struct exception_event_record (4)
struct symbol_search (1)
There are, of course, many function declarations in there as well.
Of course, these numbers are still pretty large, but having my
compilation numbers cut from 137 files to 38 or 62 files or whatever
would help a lot. (And yes, I realize that 137 isn't accurate: many
of the files are target-specific, so I'm not really recompiling 137
files every time I touch symtab.h.)
I haven't generated complete correlation data; some of what I have
generated is pretty interesting, though. For example, while it's not
surprising that every file that mentions 'struct partial_symbol' also
mentions 'struct symbol', I was a little surprised to see that 41
files mention both 'struct symbol' and 'struct minimal_symbol', 21
files mention only the former, and 30 files mention only the latter.
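As a quick sanity check on these overlap figures, here is a minimal C sketch. The struct and function names are made up for illustration; only the three counts come from the paragraph above. It checks that the 41/21/30 breakdown agrees with the per-struct totals listed earlier (62 for 'struct symbol', 71 for 'struct minimal_symbol').

```c
#include <assert.h>

/* Illustrative only: none of these names are part of GDB.  */
struct overlap_counts
{
  int both;         /* files mentioning both constructs */
  int only_first;   /* files mentioning only the first construct */
  int only_second;  /* files mentioning only the second construct */
};

/* Files mentioning the first construct at all.  */
static int
total_first (struct overlap_counts c)
{
  assert (c.both >= 0 && c.only_first >= 0);
  return c.both + c.only_first;
}

/* Files mentioning the second construct at all.  */
static int
total_second (struct overlap_counts c)
{
  assert (c.both >= 0 && c.only_second >= 0);
  return c.both + c.only_second;
}

/* Files mentioning at least one of the two constructs.  */
static int
total_either (struct overlap_counts c)
{
  return c.both + c.only_first + c.only_second;
}
```

With both = 41, only_first = 21 ('struct symbol') and only_second = 30 ('struct minimal_symbol'), the totals come out to 62, 71 and 92 respectively, so splitting the two structs into separate headers would matter for the 51 files that mention only one of them.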
Here's one possible way to split things up into new files:
gensym.h:
  struct general_symbol_info (1)

minsyms.h:
  struct minimal_symbol (71)

block.h:
  struct blockvector (13)
  struct block (38)

symbol.h:
  struct range_list (2)
  struct alias_list (2)
  struct symbol (62)
  namespace_enum (5)
  enum address_class (4)

psymbol.h:
  struct partial_symbol (10)

linetable.h:
  struct linetable_entry (6)
  struct linetable (9)

symtab.h:
  struct symtab (65)

psymtab.h:
  struct partial_symtab (16)

sal.h:
  struct symtab_and_line (39)
  struct symtabs_and_lines (9)

exception.h:
  enum exception_event_kind (4)
  struct exception_event_record (4)

section.h:
  struct section_offsets (20)

Move to symtab.c:
  struct symbol_search (1)

Delete entirely (where are these used?):
  struct sourcevector (0)
  struct source (0)

I haven't looked at where function declarations go; I expect that, in
practice, it'll be pretty obvious which ones go where. (Though there
are a few weirdos; where does {set_}main_name go?)
Anyways, if anybody else is similarly annoyed with symtab.h then I'll
try to split things up and make a more concrete RFA in a bit.
David Carlton
carlton@math.stanford.edu
^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: [rfc] split up symtab.h
2002-10-08 13:14 [rfc] split up symtab.h David Carlton
@ 2002-10-08 13:54 ` Kevin Buettner
2002-10-08 15:05 ` David Carlton
2002-10-08 15:11 ` Michael Snyder
2002-10-18 14:12 ` Elena Zannoni
2 siblings, 1 reply; 7+ messages in thread
From: Kevin Buettner @ 2002-10-08 13:54 UTC (permalink / raw)
To: David Carlton, gdb-patches; +Cc: Jim Blandy, Elena Zannoni

On Oct 8, 1:14pm, David Carlton wrote:

> I'm sick of having to recompile half of GDB every time I touch
> symtab.h. There's lots of different things in that file; it's
> included in 137 different places (counting only the gdb directory, not
> gdb/mi, etc.), but there's no one thing that it defines that is used
> in more than 71 places, and a lot of things that it defines are used
> in a lot fewer places than that.
[...]
> Anyways, if anybody else is similarly annoyed with symtab.h then I'll
> try to split things up and make a more concrete RFA in a bit.

Aside from the build time issue, are there other reasons why
splitting up symtab.h is desirable?

Here are several reasons for not splitting it:

1) The list of includes for many .c files will (I suspect) grow
   quite a bit. If it turns out that you'll be replacing one
   #include statement with five or six (per source file), I
   can't really see that making the split was an advantage.

2) One could argue that modifying symtab.h *should* be a heavyweight
   operation. I.e., you're modifying something that's at the very
   heart of gdb and you need to take great care.

3) Makefile.in maintenance becomes harder due to the larger
   number of header files.

I should note that I don't find any of the above reasons to be
overly compelling. I just think that we need a better reason for
making such a split than the build time consideration.

Kevin

* Re: [rfc] split up symtab.h
2002-10-08 13:54 ` Kevin Buettner
@ 2002-10-08 15:05 ` David Carlton
0 siblings, 0 replies; 7+ messages in thread
From: David Carlton @ 2002-10-08 15:05 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, Jim Blandy, Elena Zannoni

On Tue, 8 Oct 2002 13:54:46 -0700, Kevin Buettner <kevinb@redhat.com> said:

> Aside from the build time issue, are there other reasons why
> splitting up symtab.h is desirable?

Honestly, I have a hard time answering this question: it's easy
enough to consider practical considerations, but the philosophical
side of things isn't so clear to me.

I guess that, if you pinned me down, I'd say that I start with a
default assumption that an include file should normally correspond to
a single construct, which typically means a single structure. But I'm
not sure I really believe that: I haven't thought enough about its
implications.

To look at it another way: why should 'struct minimal_symbol' and
'struct linetable' both be in the same header file? The best answer
that I can come up with is:

* 'struct symtab' has a member that is a pointer to a 'struct
  linetable'.

* 'struct symtab' also stores a bunch of 'struct symbol's.
  (Indirectly: via 'struct blockvector' and then 'struct block'.)

* 'struct minimal_symbol' is kind of like 'struct symbol'.

That, to me, is not a very good reason for both of those structs to
be in the same header file.

> Here are several reasons for not splitting it:

> 1) The list of includes for many .c files will (I suspect) grow
>    quite a bit. If it turns out that you'll be replacing one
>    #include statement with five or six (per source file), I
>    can't really see that making the split was an advantage.

I honestly don't know how many includes, on average, each #include
"symtab.h" would turn into. Five might be right, or it might be just
a tad high. (It also depends on whether or not the #include files for
minimal_symbol, symbol, and partial_symbol are allowed to include the
one for general_symbol_info.)

I've generated various correlations between uses of different
structures (I knew my number theory background would come in useful
somehow!), but they don't give me a clear answer to this. One example
is that minimal_symbol, symtab, and symbol are the most commonly used
structures in symtab.h, but while 111 files refer to at least one of
these structures, only 29 refer to all three. (And it's clear that
the conceptual link from minimal_symbol to symtab passes via symbol:
only 4 files refer to minimal_symbol and symtab but _not_ to symbol,
and only 8 files refer only to symbol but not to either of
minimal_symbol or symtab. But there are tons of files that mention
either minimal_symbol or symtab but not both.)

I'm certainly not looking forward to changing existing files. Having
said that, I don't think it would impose a large future maintenance
burden: if somebody, say, adds a function to an existing file that
causes one of the new headers to have to be pulled in, the compiler
will let that person know, and it's easy enough to use grep to figure
out which file to include.

> 2) One could argue that modifying symtab.h *should* be a heavyweight
>    operation. I.e., you're modifying something that's at the very
>    heart of gdb and you need to take great care.

This is, to me, a really important issue. But I'm honestly not sure
whether this argues for or against breaking up symtab.h.

For example, since symbol stuff is so important, I like to make small
changes and recompile GDB after each change (and even to run a subset
of the test suite after each change) to help reassure me that I
didn't screw anything up. And having long compilation times really
works against me there: if it takes longer to recompile the program
than to make the change, then I'll wait until I've made several
changes before recompiling.

Or, to give another example, sometimes I make a change, and then
after using it for a little while, I realize that the change is a
little subtler than I first thought. So I want to include a comment
that explains the situation a little better: but, just by including a
comment, I've doomed myself to a large recompilation. (I certainly
wouldn't fix a typo in a comment in symtab.h, even if it's one that
I've made myself, unless I'm doing other changes to symtab.h: it's
just not worth the pain.)

Of course, one answer is to do the thinking before making the change
in the first place, and that's the best situation: but, alas, I'm not
a good enough programmer to always be able to foresee the implications
of my changes that way. (Obviously I should give changes time to
settle before submitting them as an RFA, but that's another matter
entirely.)

So, basically, the long compilation times are making it more painful
to follow what seems to me like good software engineering practices.

> 3) Makefile.in maintenance becomes harder due to the larger
>    number of header files.

I guess; I don't have a lot of experience with that. I suppose the
problem there is that, if Makefile.in gets out of sync, then you
might not recompile when you're supposed to, and, unlike your first
point, this could lead to problems that programmers were unaware of.
That would be very bad. Too bad there's no way to generate those
dependencies automatically...

> I should note that I don't find any of the above reasons to be
> overly compelling. I just think that we need a better reason for
> making such a split than the build time consideration.

Yeah, I totally agree. I think that there should be some sort of
philosophical justification for when to split a header file, and I
certainly wouldn't argue with you when you say that changes to core
structures in symtab.h shouldn't be made lightly.

David Carlton
carlton@math.stanford.edu

* Re: [rfc] split up symtab.h
2002-10-08 13:14 [rfc] split up symtab.h David Carlton
2002-10-08 13:54 ` Kevin Buettner
@ 2002-10-08 15:11 ` Michael Snyder
2002-10-08 15:22 ` David Carlton
2002-10-18 14:12 ` Elena Zannoni
2 siblings, 1 reply; 7+ messages in thread
From: Michael Snyder @ 2002-10-08 15:11 UTC (permalink / raw)
To: David Carlton; +Cc: gdb-patches, Jim Blandy, Elena Zannoni

David Carlton wrote:
>
> I'm sick of having to recompile half of GDB every time I touch
> symtab.h. There's lots of different things in that file; it's
> included in 137 different places (counting only the gdb directory, not
> gdb/mi, etc.), but there's no one thing that it defines that is used
> in more than 71 places, and a lot of things that it defines are used
> in a lot fewer places than that.
>
> Here's what is defined in symtab.h, together with the number of .c
> files that each construct is mentioned in. (I've produced this
> information with a naive use of grep/cut/uniq, so it could be off: the
> main problem is that, say, every use of 'struct blockvector' counts as
> a use of 'struct block'. Then again, what file would use 'struct
> blockvector' without using 'struct block'?)
>
> struct general_symbol_info (1)

Careful. Struct general_symbol_info is mentioned in LOTS of
places... indirectly, thru uses of the macros SYMBOL_NAME,
SYMBOL_TYPE, etc.

> struct minimal_symbol (71)
[...]

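To illustrate why a grep for the struct name undercounts, here is a minimal sketch in C. Everything in it is a simplified stand-in invented for illustration: the demo_ types and the DEMO_SYMBOL_NAME macro are assumptions, not GDB's real definitions. The point is that a file which only ever uses the accessor macro depends on the layout of the shared struct without ever spelling its name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the shared "general symbol info" part.  */
struct demo_general_info
{
  const char *name;
};

/* Hypothetical symbol type that embeds the shared part by value.  */
struct demo_symbol
{
  struct demo_general_info ginfo;
  int line;
};

/* Accessor macro in the style of SYMBOL_NAME; the definition here is
   an assumption made for this sketch.  */
#define DEMO_SYMBOL_NAME(sym) ((sym)->ginfo.name)

/* This function never mentions struct demo_general_info by name, yet
   it depends on that struct's layout through the macro.  A grep for
   the struct name would not count this file.  */
static int
demo_symbol_is_main (const struct demo_symbol *sym)
{
  assert (sym != NULL);
  return strcmp (DEMO_SYMBOL_NAME (sym), "main") == 0;
}
```

A more faithful count would therefore also have to search for the accessor macros, not just for the struct tag.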
* Re: [rfc] split up symtab.h
2002-10-08 15:11 ` Michael Snyder
@ 2002-10-08 15:22 ` David Carlton
0 siblings, 0 replies; 7+ messages in thread
From: David Carlton @ 2002-10-08 15:22 UTC (permalink / raw)
To: Michael Snyder; +Cc: gdb-patches, Jim Blandy, Elena Zannoni

On Tue, 08 Oct 2002 15:09:58 -0700, Michael Snyder <msnyder@redhat.com> said:

> David Carlton wrote:

>> struct general_symbol_info (1)

> Careful. Struct general_symbol_info is mentioned in LOTS of
> places... indirectly, thru uses of the macros SYMBOL_NAME,
> SYMBOL_TYPE, etc.

Right, that particular count is totally misleading. Aside from the
macros that you mentioned, the definitions of struct
{minimal_,partial_,}symbol all need to have the definition of struct
general_symbol_info available as well.

So there would be nontrivial dependencies among the header files that
I was proposing. (I _think_ the only nontrivial dependencies arise
from 'struct general_symbol_info' and from enums, but I could be
wrong.) Personally, I'd be quite tempted to have the header files for
minimal_symbol, symbol, and partial_symbol all include the header
file for general_symbol_info; I realize that GDB prefers to avoid
that, but here is a situation where the usual substitute, namely
opaque declarations of structures, doesn't work.

Also, the correct location of namespace_enum isn't clear to me; too
bad C doesn't support opaque declarations of enums. And the exact
placements of partial_ stuff isn't clear to me: it seems plausible to
me that 'struct partial_symbol' should either be in the same include
file as 'struct symbol' or in the same file as 'struct
partial_symtab', but which?

David Carlton
carlton@math.stanford.edu

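The distinction drawn in this message can be sketched in C. The demo_ names below are hypothetical simplifications, not GDB's actual structures, and "gensym.h" is only the file name floated in the original proposal. An opaque (incomplete) struct declaration is enough for a pointer member, but embedding a struct by value requires its full definition to be visible, which is why the symbol headers would have to include the header for the shared info struct.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque declaration: enough to declare pointers to the type.  */
struct demo_block;

/* Full definition.  In a split layout this would live in its own
   header (hypothetically "gensym.h") that the headers defining the
   two structs below would have to include.  */
struct demo_general_info
{
  const char *name;
  int section;
};

struct demo_symbol
{
  struct demo_general_info ginfo;  /* by value: full definition required */
  struct demo_block *block;        /* pointer: opaque declaration suffices */
};

struct demo_minimal_symbol
{
  struct demo_general_info ginfo;  /* same dependency on the shared part */
  unsigned long address;
};

/* Initialize a symbol without ever needing to know what a demo_block
   looks like inside.  */
static void
demo_symbol_init (struct demo_symbol *sym, const char *name,
                  struct demo_block *block)
{
  assert (sym != NULL);
  sym->ginfo.name = name;
  sym->ginfo.section = 0;
  sym->block = block;
}
```

ISO C89 and C99 have no counterpart of the opaque struct declaration for enums: an enum type must be fully defined before it can be used as a member type, which is the difficulty raised above for namespace_enum.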
* Re: [rfc] split up symtab.h
2002-10-08 13:14 [rfc] split up symtab.h David Carlton
2002-10-08 13:54 ` Kevin Buettner
2002-10-08 15:11 ` Michael Snyder
@ 2002-10-18 14:12 ` Elena Zannoni
2002-10-18 14:52 ` David Carlton
2 siblings, 1 reply; 7+ messages in thread
From: Elena Zannoni @ 2002-10-18 14:12 UTC (permalink / raw)
To: David Carlton; +Cc: gdb-patches, Jim Blandy, Elena Zannoni

David Carlton writes:
> I'm sick of having to recompile half of GDB every time I touch
> symtab.h. There's lots of different things in that file; it's
> included in 137 different places (counting only the gdb directory, not
> gdb/mi, etc.), but there's no one thing that it defines that is used
> in more than 71 places, and a lot of things that it defines are used
> in a lot fewer places than that.
>

I see your point. But I have to agree with Kevin about the tangled
web we could be weaving. Maybe we could start with a little cleanup.
Are all the things exported by symtab.h actually used? And vice
versa, are the files including symtab.h actually needing it? I found
a few times that some dependencies were left in place after some code
which needed them had been deleted, etc.

Elena

> Here's what is defined in symtab.h, together with the number of .c
> files that each construct is mentioned in.
[...]

* Re: [rfc] split up symtab.h
2002-10-18 14:12 ` Elena Zannoni
@ 2002-10-18 14:52 ` David Carlton
0 siblings, 0 replies; 7+ messages in thread
From: David Carlton @ 2002-10-18 14:52 UTC (permalink / raw)
To: Elena Zannoni; +Cc: gdb-patches, Jim Blandy

On Fri, 18 Oct 2002 17:10:04 -0400, Elena Zannoni <ezannoni@redhat.com> said:

> I see your point. But I have to agree with Kevin about the tangled
> web we could be weaving. Maybe we could start with a little
> cleanup. Are all the things exported by symtab.h actually used? And
> vice versa, are the files including symtab.h actually needing it? I
> found a few times that some dependencies were left in place after
> some code which needed them had been deleted, etc.

I'll look into that. Actually, I already have one candidate for
cleanup: if you do "grep 'struct source' *.h *.c", it sure looks like
struct source and struct sourcevector aren't needed. (And if you add
ChangeLog* to your list of files to grep, it looks even more
suspicious!) So I think that's the obvious candidate for deletion
from symtab.h.

Also, I think that struct symbol_search could be moved to symtab.c.
(Though I'd want to look into how that's being used, because I could
imagine that some such structure could be useful for other files in
the future.)

And I'll look into the reverse problem, as you suggest, namely files
that might be unnecessarily including symtab.h.

Also, another thing that I thought of is that it's silly to propose
breaking it up all at once. Right now, for example, I'm fiddling with
'struct block' in carlton_dictionary-branch to see how it would be a
good idea to modify it to handle namespace lookup correctly. (And,
unfortunately, each time I think I understand C++ name lookup rules,
I get surprised by the compiler or the standard, and I realize that
my data structures aren't good enough; it's definitely a good thing
that I'm doing this on a branch...)

So maybe, as an experiment, I'll try moving 'struct block' and
'struct blockvector' into a file "block.h", and I'll try to be
careful about getting the #includes and Makefile.in just right; if it
turns out to be useful, then I'll consider sending an RFA to the
list. Getting just that one bit right would be a lot easier than
splitting symtab.h up into 10 files, or whatever I proposed in my
RFC: splitting it up all at once is just asking for trouble.

David Carlton
carlton@math.stanford.edu

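As a rough picture of what the proposed experiment might look like, here is a hedged sketch of a standalone header in C. The file name block.h comes from the message above, but the include guard, the demo_ types, their fields, and the helper function are all assumptions made for illustration; they do not reflect GDB's real 'struct block' or 'struct blockvector'.

```c
/* block.h (sketch): hypothetical contents, not GDB's real definitions.  */
#ifndef DEMO_BLOCK_H
#define DEMO_BLOCK_H

#include <assert.h>
#include <stddef.h>

/* Opaque declaration; only pointers to symbols are stored here, so
   the header that defines symbols does not need to be included.  */
struct demo_symbol;

struct demo_block
{
  unsigned long start;            /* first address covered */
  unsigned long end;              /* one past the last address covered */
  struct demo_block *superblock;  /* enclosing block, or NULL */
  int nsyms;                      /* number of entries in SYMS */
  struct demo_symbol **syms;      /* symbols local to this block */
};

struct demo_blockvector
{
  int nblocks;                    /* number of entries in BLOCKS */
  struct demo_block **blocks;
};

/* Return the first block in BV whose range contains PC, or NULL.  */
static inline struct demo_block *
demo_blockvector_find (const struct demo_blockvector *bv, unsigned long pc)
{
  int i;

  assert (bv != NULL);
  for (i = 0; i < bv->nblocks; i++)
    if (bv->blocks[i]->start <= pc && pc < bv->blocks[i]->end)
      return bv->blocks[i];
  return NULL;
}

#endif /* DEMO_BLOCK_H */
```

Under this layout, a .c file that only walks blocks would include just this header, and editing it would trigger recompilation only in the files that include it rather than in everything that includes symtab.h.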