* [rfc] split up symtab.h
@ 2002-10-08 13:14 David Carlton
2002-10-08 13:54 ` Kevin Buettner
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: David Carlton @ 2002-10-08 13:14 UTC (permalink / raw)
To: gdb-patches; +Cc: Jim Blandy, Elena Zannoni
I'm sick of having to recompile half of GDB every time I touch
symtab.h. There's lots of different things in that file; it's
included in 137 different places (counting only the gdb directory, not
gdb/mi, etc.), but there's no one thing that it defines that is used
in more than 71 places, and a lot of things that it defines are used
in a lot fewer places than that.
Here's what is defined in symtab.h, together with the number of .c
files that each construct is mentioned in. (I've produced this
information with a naive use of grep/cut/uniq, so it could be off: the
main problem is that, say, every use of 'struct blockvector' counts as
a use of 'struct block'. Then again, what file would use 'struct
blockvector' without using 'struct block'?)
struct general_symbol_info (1)
struct minimal_symbol (71)
struct blockvector (13)
struct block (38)
struct range_list (2)
struct alias_list (2)
struct symbol (62)
struct partial_symbol (10)
struct sourcevector (0)
struct linetable_entry (6)
struct linetable (9)
struct source (0)
struct section_offsets (20)
struct symtab (65)
struct partial_symtab (16)
struct symtab_and_line (39)
struct symtabs_and_lines (9)
enum exception_event_kind (4)
struct exception_event_record (4)
struct symbol_search (1)
There are, of course, many function declarations in there as well.
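To make the caveat about the naive count concrete, here is a minimal sketch in C. It is not the grep/cut/uniq pipeline that produced the numbers above, and the function names mentions_naive and mentions_whole are hypothetical. The first behaves like a plain substring search, so text containing only 'struct blockvector' is also counted for 'struct block'; the second rejects a match that is followed by another identifier character. Only the trailing boundary is checked, which is enough for this particular case.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Naive check, comparable to a plain substring grep: true if NAME
   occurs anywhere in TEXT, even as the prefix of a longer identifier.  */
int
mentions_naive (const char *text, const char *name)
{
  return strstr (text, name) != NULL;
}

/* Stricter check: a match only counts if the character right after it
   is not an identifier character, so "struct block" no longer matches
   inside "struct blockvector".  */
int
mentions_whole (const char *text, const char *name)
{
  size_t len = strlen (name);
  const char *p = text;

  if (len == 0)
    return 0;

  while ((p = strstr (p, name)) != NULL)
    {
      unsigned char next = (unsigned char) p[len];

      if (!isalnum (next) && next != '_')
        return 1;
      p++;                      /* Keep searching after this position.  */
    }
  return 0;
}
```

Given the text "struct blockvector *bv;" and the name "struct block", mentions_naive reports a match and mentions_whole does not.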
Of course, these numbers are still pretty large, but having my
compilation numbers cut from 137 files to 38 or 62 files or whatever
would help a lot. (And yes, I realize that 137 isn't accurate: many
of the files are target-specific, so I'm not really recompiling 137
files every time I touch symtab.h.)
I haven't generated complete correlation data; some of what I have
generated is pretty interesting, though. For example, while it's not
surprising that every file that mentions 'struct partial_symbol' also
mentions 'struct symbol', I was a little surprised to see that 41
files mention both 'struct symbol' and 'struct minimal_symbol', 21
files mention only the former, and 30 files mention only the latter.
Here's one possible way to split things up into new files:
gensym.h:
struct general_symbol_info (1)
minsyms.h:
struct minimal_symbol (71)
block.h:
struct blockvector (13)
struct block (38)
symbol.h:
struct range_list (2)
struct alias_list (2)
struct symbol (62)
namespace_enum (5)
enum address_class (4)
psymbol.h:
struct partial_symbol (10)
linetable.h:
struct linetable_entry (6)
struct linetable (9)
symtab.h:
struct symtab (65)
psymtab.h:
struct partial_symtab (16)
sal.h:
struct symtab_and_line (39)
struct symtabs_and_lines (9)
exception.h:
enum exception_event_kind (4)
struct exception_event_record (4)
section.h:
struct section_offsets (20)
Move to symtab.c:
struct symbol_search (1)
Delete entirely (where are these used?):
struct sourcevector (0)
struct source (0)
I haven't looked at where function declarations go; I expect that, in
practice, it'll be pretty obvious which ones go where. (Though there
are a few weirdos; where does {set_}main_name go?)
Anyways, if anybody else is similarly annoyed with symtab.h then I'll
try to split things up and make a more concrete RFA in a bit.
David Carlton
carlton@math.stanford.edu
* Re: [rfc] split up symtab.h
2002-10-08 13:14 [rfc] split up symtab.h David Carlton
@ 2002-10-08 13:54 ` Kevin Buettner
2002-10-08 15:05 ` David Carlton
2002-10-08 15:11 ` Michael Snyder
2002-10-18 14:12 ` Elena Zannoni
2 siblings, 1 reply; 7+ messages in thread
From: Kevin Buettner @ 2002-10-08 13:54 UTC (permalink / raw)
To: David Carlton, gdb-patches; +Cc: Jim Blandy, Elena Zannoni
On Oct 8, 1:14pm, David Carlton wrote:
> I'm sick of having to recompile half of GDB every time I touch
> symtab.h. There's lots of different things in that file; it's
> included in 137 different places (counting only the gdb directory, not
> gdb/mi, etc.), but there's no one thing that it defines that is used
> in more than 71 places, and a lot of things that it defines are used
> in a lot fewer places than that.
[...]
> Anyways, if anybody else is similarly annoyed with symtab.h then I'll
> try to split things up and make a more concrete RFA in a bit.
Aside from the build time issue, are there other reasons why splitting
up symtab.h is desirable?
Here are several reasons for not splitting it:
1) The list of includes for many .c files will (I suspect) grow
quite a bit. If it turns out that you'll be replacing one
#include statement with five or six (per source file), I
can't really see that making the split was an advantage.
2) One could argue that modifying symtab.h *should* be a heavyweight
operation. I.e., you're modifying something that's at the very
heart of gdb and you need to take great care.
3) Makefile.in maintenance becomes harder due to the larger
number of header files.
I should note that I don't find any of the above reasons to be
overly compelling. I just think that we need a better reason
for making such a split than the build time consideration.
Kevin
* Re: [rfc] split up symtab.h
2002-10-08 13:54 ` Kevin Buettner
@ 2002-10-08 15:05 ` David Carlton
0 siblings, 0 replies; 7+ messages in thread
From: David Carlton @ 2002-10-08 15:05 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, Jim Blandy, Elena Zannoni
On Tue, 8 Oct 2002 13:54:46 -0700, Kevin Buettner <kevinb@redhat.com> said:
> Aside from the build time issue, are there other reasons why
> splitting up symtab.h is desirable?
Honestly, I have a hard time answering this question: it's easy enough
to consider practical considerations, but the philosophical side of
things isn't so clear to me. I guess that, if you pinned me down, I'd
say that I start with a default assumption that an include file should
normally correspond to a single construct, which typically means a
single structure. But I'm not sure I really believe that: I haven't
thought enough about its implications.
To look at it another way: why should 'struct minimal_symbol' and
'struct linetable' both be in the same header file? The best answer
that I can come up with is:
* 'struct symtab' has a member that is a pointer to a 'struct
linetable'.
* 'struct symtab' also stores a bunch of 'struct symbol's.
(Indirectly: via 'struct blockvector' and then 'struct block'.)
* 'struct minimal_symbol' is kind of like 'struct symbol'.
That, to me, is not a very good reason for both of those structs to be
in the same header file.
> Here are several reasons for not splitting it:
> 1) The list of includes for many .c files will (I suspect) grow
> quite a bit. If it turns out that you'll be replacing one
> #include statement with five or six (per source file), I
> can't really see that making the split was an advantage.
I honestly don't know how many includes each #include "symtab.h"
would turn into on average. Five might be right, or it might be just
a tad high. (It also depends on whether or not the
#include files for minimal_symbol, symbol, and partial_symbol are
allowed to include the one for general_symbol_info.) I've generated
various correlations between uses of different structures (I knew my
number theory background would come in useful somehow!), but they don't
give me a clear answer to this. One example is that minimal_symbol,
symtab, and symbol are the most commonly used structures in symtab.h,
but while 111 files refer to at least one of these structures, only 29
refer to all three. (And it's clear that the conceptual link from
minimal_symbol to symtab passes via symbol: only 4 files refer to
minimal_symbol and symtab but _not_ to symbol, and only 8 files refer
only to symbol but not to either of minimal_symbol or symtab. But
there are tons of files that mention either minimal_symbol or symtab
but not both.)
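As a consistency check on these figures, the seven regions of the three-set Venn diagram can be recovered by inclusion-exclusion. The sketch below is illustrative only: struct venn3 and solve_venn3 are hypothetical names, and it assumes that the counts from the original message (71 for minimal_symbol, 62 for symbol, 65 for symtab, and 41 for minimal_symbol together with symbol) come from the same grep runs as the 111, 29 and 4 quoted here.

```c
/* Region sizes of a three-set Venn diagram for the files mentioning
   minimal_symbol (M), symbol (S) and symtab (T).  */
struct venn3
{
  int only_m, only_s, only_t;           /* exactly one of the three */
  int ms_not_t, mt_not_s, st_not_m;     /* exactly two of the three */
  int all;                              /* all three */
};

/* M, S, T are the set sizes, MS is |M and S|, MT_NOT_S is the number
   of files in M and T but not S, ALL is the triple intersection and
   ANY is the size of the union.  */
struct venn3
solve_venn3 (int m, int s, int t, int ms, int mt_not_s, int all, int any)
{
  struct venn3 v;
  int mt = mt_not_s + all;
  /* Inclusion-exclusion: any = m + s + t - ms - mt - st + all,
     solved here for st.  */
  int st = m + s + t - ms - mt + all - any;

  v.all = all;
  v.mt_not_s = mt_not_s;
  v.ms_not_t = ms - all;
  v.st_not_m = st - all;
  v.only_m = m - v.ms_not_t - v.mt_not_s - all;
  v.only_s = s - v.ms_not_t - v.st_not_m - all;
  v.only_t = t - v.mt_not_s - v.st_not_m - all;
  return v;
}
```

Called as solve_venn3 (71, 62, 65, 41, 4, 29, 111), this gives 8 files that mention only symbol, which agrees with the figure above, and, under the same assumption, 26 that mention only minimal_symbol and 19 that mention only symtab.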
I'm certainly not looking forward to changing existing files. Having
said that, I don't think it would impose a large future maintenance
burden: if somebody, say, adds a function to an existing file that
requires one of the new headers to be pulled in, the compiler
will let that person know, and it's easy enough to use grep to figure
out which file to include.
> 2) One could argue that modifying symtab.h *should* be a heavyweight
> operation. I.e., you're modifying something that's at the very
> heart of gdb and you need to take great care.
This is, to me, a really important issue. But I'm honestly not sure
whether this argues for or against breaking up symtab.h. For example,
since symbol stuff is so important, I like to make small changes and
recompile GDB after each change (and even to run a subset of the test
suite after each change) to help reassure me that I didn't screw
anything up. And having long compilation times really works against
me there: if it takes longer to recompile the program than to make the
change, then I'll wait until I've made several changes before
recompiling.
Or, to give another example, sometimes I make a change, and then after
using it for a little while, I realize that the change is a little
subtler than I first thought. So I want to include a comment that
explains the situation a little better: but, just by including a
comment, I've doomed myself to a large recompilation. (I certainly
wouldn't fix a typo in a comment in symtab.h, even if it's one that
I've made myself, unless I'm doing other changes to symtab.h: it's
just not worth the pain.) Of course, one answer is to do the thinking
before making the change in the first place, and that's the best
situation: but, alas, I'm not a good enough programmer to always be
able to foresee the implications of my changes that way. (Obviously I
should give changes time to settle before submitting them as an RFA,
but that's another matter entirely.)
So, basically, the long compilation times are making it more painful
to follow what seem to me like good software engineering practices.
> 3) Makefile.in maintenance becomes harder due to the larger
> number of header files.
I guess; I don't have a lot of experience with that. I suppose the
problem there is that, if Makefile.in gets out of sync, then you might
not recompile when you're supposed to, and, unlike your first point,
this could lead to problems that programmers were unaware of. That
would be very bad. Too bad there's no way to generate those
dependencies automatically...
> I should note that I don't find any of the above reasons to be
> overly compelling. I just think that we need a better reason for
> making such a split than the build time consideration.
Yeah, I totally agree. I think that there should be some sort of
philosophical justification for when to split a header file, and I
certainly wouldn't argue with you when you say that changes to core
structures in symtab.h shouldn't be made lightly.
David Carlton
carlton@math.stanford.edu
* Re: [rfc] split up symtab.h
2002-10-08 13:14 [rfc] split up symtab.h David Carlton
2002-10-08 13:54 ` Kevin Buettner
@ 2002-10-08 15:11 ` Michael Snyder
2002-10-08 15:22 ` David Carlton
2002-10-18 14:12 ` Elena Zannoni
2 siblings, 1 reply; 7+ messages in thread
From: Michael Snyder @ 2002-10-08 15:11 UTC (permalink / raw)
To: David Carlton; +Cc: gdb-patches, Jim Blandy, Elena Zannoni
David Carlton wrote:
>
> I'm sick of having to recompile half of GDB every time I touch
> symtab.h. There's lots of different things in that file; it's
> included in 137 different places (counting only the gdb directory, not
> gdb/mi, etc.), but there's no one thing that it defines that is used
> in more than 71 places, and a lot of things that it defines are used
> in a lot fewer places than that.
>
> Here's what is defined in symtab.h, together with the number of .c
> files that each construct is mentioned in. (I've produced this
> information with a naive use of grep/cut/uniq, so it could be off: the
> main problem is that, say, every use of 'struct blockvector' counts as
> a use of 'struct block'. Then again, what file would use 'struct
> blockvector' without using 'struct block'?)
>
> struct general_symbol_info (1)
Careful. Struct general_symbol_info is mentioned in LOTS of places...
indirectly, thru uses of the macros SYMBOL_NAME, SYMBOL_TYPE, etc.
[...]
* Re: [rfc] split up symtab.h
2002-10-08 15:11 ` Michael Snyder
@ 2002-10-08 15:22 ` David Carlton
0 siblings, 0 replies; 7+ messages in thread
From: David Carlton @ 2002-10-08 15:22 UTC (permalink / raw)
To: Michael Snyder; +Cc: gdb-patches, Jim Blandy, Elena Zannoni
On Tue, 08 Oct 2002 15:09:58 -0700, Michael Snyder <msnyder@redhat.com> said:
> David Carlton wrote:
>> struct general_symbol_info (1)
> Careful. Struct general_symbol_info is mentioned in LOTS of
> places... indirectly, thru uses of the macros SYMBOL_NAME,
> SYMBOL_TYPE, etc.
Right, that particular count is totally misleading. Aside from the
macros that you mentioned, the definitions of struct
{minimal_,partial_,}symbol all need to have the definition of struct
general_symbol_info available as well. So there would be nontrivial
dependencies among the header files that I was proposing. (I _think_
the only nontrivial dependencies arise from 'struct
general_symbol_info' and from enums, but I could be wrong.)
Personally, I'd be quite tempted to have the header files for
minimal_symbol, symbol, and partial_symbol all include the header file
for general_symbol_info; I realize that GDB prefers to avoid that, but
here is a situation where the usual substitute, namely opaque
declarations of structures, doesn't work.
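The difference between the two cases can be shown in a few lines of C. This is a sketch under stated assumptions, not the real GDB definitions: the fields are placeholders, gensym.h and minsyms.h are the file names proposed in the RFC, and SKETCH_SYMBOL_NAME, struct symtab_sketch and minimal_symbol_name are hypothetical names chosen for illustration. A member embedded by value needs the complete type, so an opaque declaration is not enough; a pointer member is fine with one.

```c
#include <stddef.h>
#include <string.h>

/* ---- gensym.h (sketch; fields are placeholders) ---- */
#ifndef GENSYM_H
#define GENSYM_H
struct general_symbol_info
{
  const char *name;
  int section;
};
#endif /* GENSYM_H */

/* ---- minsyms.h (sketch) ---- */
#ifndef MINSYMS_H
#define MINSYMS_H
/* A bare "struct general_symbol_info;" would not do here: GINFO is
   embedded by value, so the compiler needs the complete type to lay
   out struct minimal_symbol.  In a real split this header would
   therefore have to #include "gensym.h".  */
struct minimal_symbol
{
  struct general_symbol_info ginfo;
  unsigned long address;        /* placeholder */
};
#endif /* MINSYMS_H */

/* By contrast, a pointer member only needs an opaque declaration.  */
struct linetable;
struct symtab_sketch
{
  struct linetable *linetable;
};

/* Hypothetical accessor macro in the spirit of SYMBOL_NAME: it reaches
   into GINFO, so every user of the macro also depends on the complete
   definition of struct general_symbol_info.  */
#define SKETCH_SYMBOL_NAME(sym) ((sym)->ginfo.name)

const char *
minimal_symbol_name (const struct minimal_symbol *msym)
{
  return SKETCH_SYMBOL_NAME (msym);
}
```

This is also why the count of 1 for 'struct general_symbol_info' understates things: a file can depend on its layout through the accessor without ever spelling out the struct name.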
Also, the correct location of namespace_enum isn't clear to me; too
bad C doesn't support opaque declarations of enums. And the exact
placement of the partial_ stuff isn't clear to me: it seems plausible to
me that 'struct partial_symbol' should either be in the same include
file as 'struct symbol' or in the same file as 'struct
partial_symtab', but which?
David Carlton
carlton@math.stanford.edu
* Re: [rfc] split up symtab.h
2002-10-08 13:14 [rfc] split up symtab.h David Carlton
2002-10-08 13:54 ` Kevin Buettner
2002-10-08 15:11 ` Michael Snyder
@ 2002-10-18 14:12 ` Elena Zannoni
2002-10-18 14:52 ` David Carlton
2 siblings, 1 reply; 7+ messages in thread
From: Elena Zannoni @ 2002-10-18 14:12 UTC (permalink / raw)
To: David Carlton; +Cc: gdb-patches, Jim Blandy, Elena Zannoni
David Carlton writes:
> I'm sick of having to recompile half of GDB every time I touch
> symtab.h. There's lots of different things in that file; it's
> included in 137 different places (counting only the gdb directory, not
> gdb/mi, etc.), but there's no one thing that it defines that is used
> in more than 71 places, and a lot of things that it defines are used
> in a lot fewer places than that.
>
I see your point. But I have to agree with Kevin about the tangled web
we could be weaving. Maybe we could start with a little cleanup. Are
all the things exported by symtab.h actually used? And vice versa, are
the files including symtab.h actually needing it? I found a few times
that some dependencies were left in place after some code which needed
them had been deleted, etc.
Elena
* Re: [rfc] split up symtab.h
2002-10-18 14:12 ` Elena Zannoni
@ 2002-10-18 14:52 ` David Carlton
0 siblings, 0 replies; 7+ messages in thread
From: David Carlton @ 2002-10-18 14:52 UTC (permalink / raw)
To: Elena Zannoni; +Cc: gdb-patches, Jim Blandy
On Fri, 18 Oct 2002 17:10:04 -0400, Elena Zannoni <ezannoni@redhat.com> said:
> I see your point. But I have to agree with Kevin about the tangled
> web we could be weaving. Maybe we could start with a little
> cleanup. Are all the things exported by symtab.h actually used? And
> vice versa, are the files including symtab.h actually needing it? I
> found a few times that some dependencies were left in place after
> some code which needed them had been deleted, etc.
I'll look into that. Actually, I already have one candidate for
cleanup: if you do "grep 'struct source' *.h *.c", it sure looks like
struct source and struct sourcevector aren't needed. (And if you add
ChangeLog* to your list of files to grep, it looks even more
suspicious!) So I think that's the obvious candidate for deletion
from symtab.h. Also, I think that struct symbol_search could be moved
to symtab.c. (Though I'd want to look into how that's being used,
because I could imagine that some such structure could be useful for
other files in the future.) And I'll look into the reverse problem,
as you suggest, namely files that might be unnecessarily including
symtab.h.
Also, another thing that I thought of is that it's silly to propose
breaking it up all at once. Right now, for example, I'm fiddling with
'struct block' in carlton_dictionary-branch to see how it would be a
good idea to modify it to handle namespace lookup correctly. (And,
unfortunately, each time I think I understand C++ name lookup rules, I
get surprised by the compiler or the standard, and I realize that my
data structures aren't good enough; it's definitely a good thing that
I'm doing this on a branch...) So maybe, as an experiment, I'll try
moving 'struct block' and 'struct blockvector' into a file "block.h",
and I'll try to be careful about getting the #includes and Makefile.in
just right; if it turns out to be useful, then I'll consider sending
an RFA to the list. Getting just that one bit right would be a lot
easier than splitting symtab.h up into 10 files, or whatever I
proposed in my RFC: splitting it up all at once is just asking for
trouble.
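A rough idea of what such a block.h might look like, as a sketch only: the fields are simplified placeholders rather than the real definitions, and block_depth is a hypothetical helper added just to exercise the structures. Because only pointers to 'struct symbol' appear, an opaque declaration is enough, and the header would not need to include whatever file ends up defining 'struct symbol'.

```c
#include <stddef.h>

/* ---- block.h (sketch) ---- */
#ifndef BLOCK_H
#define BLOCK_H

/* Opaque declaration: only pointers to struct symbol are used below.  */
struct symbol;

struct block
{
  unsigned long startaddr;      /* placeholder for the start address */
  unsigned long endaddr;        /* placeholder for the end address */
  struct symbol *function;      /* function owning this block, or NULL */
  struct block *superblock;     /* enclosing block, or NULL */
};

struct blockvector
{
  int nblocks;                  /* number of entries in BLOCK */
  struct block **block;         /* placeholder for the block array */
};

#endif /* BLOCK_H */

/* ---- a user of block.h, e.g. some .c file ---- */

/* Hypothetical helper: number of enclosing blocks above B.  */
int
block_depth (const struct block *b)
{
  int depth = 0;

  while (b->superblock != NULL)
    {
      b = b->superblock;
      depth++;
    }
  return depth;
}
```

A file like the second half of this sketch would then include only "block.h", and a change to 'struct symtab' would no longer force it to be recompiled.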
David Carlton
carlton@math.stanford.edu