* Slow handling of C++ symbol names
@ 2003-11-19 21:10 Will Cohen
2003-11-19 22:10 ` Daniel Berlin
2003-11-20 2:05 ` Michael Snyder
0 siblings, 2 replies; 5+ messages in thread
From: Will Cohen @ 2003-11-19 21:10 UTC (permalink / raw)
To: gdb-patches, wcohen
When debugging some C++ program with gdb (both versions 5.3 and 6.0) it
takes a long time to load the debugging information and get a
command-line prompt. The particular example that demonstrates this
problem is monotone (http://www.venge.net/monotone/) which makes use of
the boost libraries (http://www.boost.org/). gdb loading monotone with
the associated debugging information takes over 30 seconds to get the
initial command-line prompt on a 1.7GHz Athlon with 512MB or DRAM.
monotone is not a small program, over 3MB of debugging information. I
tooks some measurements with OProfile to find out where gdb 5.3 was
spending its time. 3/4 of the time is spent in dyn_string_substring and
__GI_strncpy, the column below the '%'. I recompiled gdb with '-pg' to
find that this is due to the C++ name demangling. The highlights of
gprof output are shown below. I looked at the symbols for monotone/boost
in the executable, some of them are VERY long and for monotone/boost
they average 147 characters in length. An example of ONE symbol over 2KB
is at the end of this mail (there are others like that). I suspect that
some of the algorithms used in gdb may not expect such long symbol names
and be part of the reason for the slow startup, e.g. work
O(length_of_string ^2). Has anyone else noticed this type of problem before?
[wcohen@slingshot monotone-0.7]$ oprofpp -l -r -k /usr/bin/gdb|more
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1687.55
Counter 0 counted CPU_CLK_UNHALTED events (Cycles outside of halt state)
with a
unit mask of 0x00 (No unit mask) count 843500
vma samples % symbol name image name
081e20c0 36781 55.9424 dyn_string_substring
/usr/lib/debug/usr/bin/gdb.debug
00077290 14743 22.4235 __GI_strncpy /lib/tls/libc-2.3.2.so
080f5e40 3434 5.22297 msymbol_hash_iw
/usr/lib/debug/usr/bin/gdb.debug
00021dc0 1748 2.65864 __ctype_b_loc /lib/tls/libc-2.3.2.so
00076990 1603 2.4381 __GI_strcpy /lib/tls/libc-2.3.2.so
081e1e10 1417 2.1552 dyn_string_insert
/usr/lib/debug/usr/bin/gdb.debug
The profile information from gprof.
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
88.41 18.46 18.46 354132 0.00 0.00 dyn_string_substring
5.84 19.68 1.22 19299 0.00 0.00 msymbol_hash_iw
...
0.00 0.00 220617 demangle_class_enum_type <cycle 2>
[423]
0.00 0.00 95 demangle_function_type
<cycle 2> [482]
-----------------------------------------------
0.00 0.74 14108/354132 demangle_name <cycle
2> [29]
0.00 1.04 19908/354132 demangle_type_ptr
<cycle 2> [28]
0.00 4.55 87228/354132 demangle_prefix <cycle
2> [24]
0.01 12.16 232888/354132 demangle_type <cycle
2> [23]
[21] 88.6 0.01 18.49 354132 substitution_add [21]
18.46 0.01 354132/354132 dyn_string_substring [22]
0.00 0.01 354132/979183 dyn_string_new [40]
-----------------------------------------------
18.46 0.01 354132/354132 substitution_add [21]
[22] 88.5 18.46 0.01 354132 dyn_string_substring [22]
0.01 0.01 354132/4003799 dyn_string_resize [35]
-----------------------------------------------
13904 demangle_type <cycle
2> [23]
55 demangle_operator_name
<cycle 2>
Find out how many symbols monotone has
$ nm monotone|wc
23623 70853 3657239
Some of symbol names from the boost library are really, really, really
long. Here is an example that is over 2K in length:
08145572 W
_ZN5boost6spirit12subrule_listINS0_14subrule_parserILi0ENS0_8sequenceINS3_INS3_INS3_INS0_6actionINS0_7subruleILi1ENS0_15closure_contextI17rcs_admin_closureEEEEN7phoenix5actorINSA_9compositeINSA_9assign_opENSB_INSC_INSA_21member_var_ptr_actionI9rcs_admin8rcs_fileEENSB_INSA_14closure_memberILi0ENSA_7closureISG_NSA_5nil_tESK_EEEEEESK_SK_SK_EEEENSB_INSA_8argumentILi0EEEEESK_SK_EEEEEENS0_11alternativeINS0_11kleene_starINS4_INS5_ILi3ENS6_I17rcs_delta_closureEEEENSB_INSC_INSA_26member_function_ptr_actionIvSG_RK9rcs_deltaSK_SK_SK_EESN_SS_SK_SK_EEEEEEEENS3_INSX_INS5_ILi12ENS0_14parser_contextEEEEES19_EEEEEENSW_INS5_ILi2ENS6_I16rcs_desc_closureEEEENS3_IS1C_S1I_EEEEEENSX_INS4_INS5_ILi4ENS6_I21rcs_deltatext_closureEEEENSB_INSC_INS11_IvSG_RK13rcs_deltatextSK_SK_SK_EESN_SS_SK_SK_EEEEEEEEEENS0_10contiguousINSX_INS0_5chlitIcEEEEEEEENS6_I16rcs_file_closureEEEENS1_INS2_ILi10ES20_S1A_EENS1_INS2_ILi14ENS0_8positiveINSW_INS0_12digit_parserES20_EEEES1A_EENS1_INS2_ILi13ENS0_10differenceINS0_12graph
_parserENS0_5chsetIcEEEES1A_EENS1_INS2_ILi12ENS3_INS3_INS5_ILi8ENS6_I14string_closureEEEENSX_INS5_ILi9ES2K_EEEEEENS5_ILi10ES1A_EEEES1A_EENS1_INS2_ILi9ENSW_INSW_INSW_IS2L_NS5_ILi7ES2K_EEEENS5_ILi5ES2K_EEEES2P_EES2K_EENS1_INS2_ILi4ENS3_INS3_INS3_INS3_INS4_IS2S_NSB_INSC_ISD_NSB_INSC_INSE_ISsS1P_EENSB_INSI_ILi0ENSJ_IS1P_SK_SK_EEEEEESK_SK_SK_EEEESS_SK_SK_EEEEEENS0_6strlitIPKcEEEENS4_IS2U_S35_EEEENSW_IS3A_NS3_IS1C_S3A_EEEEEES3C_EES1N_EENS1_INS2_ILi2ENS3_IS3A_NS4_IS2U_NSB_INSC_ISD_NSB_INSI_ILi0ENSJ_ISsSK_SK_EEEEEESS_SK_SK_EEEEEEEES1H_EENS1_INS2_ILi3ENS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS4_IS2S_NSB_INSC_ISD_NSB_INSC_INSE_ISsS12_EENSB_INSI_ILi0ENSJ_IS12_SK_SK_EEEEEESK_SK_SK_EEEESS_SK_SK_EEEEEES3A_EES3Z_EES2P_EES3A_EENS4_IS2L_S3Y_EEEES2P_EES3A_EENS0_8optionalIS44_EEEES2P_EES3A_EENSX_INS4_IS2S_NSB_INSC_INS11_IvS12_RKSsSK_SK_SK_EES3U_SS_SK_SK_EEEEEEEEEES2P_EES3A_EENS48_IS3Z_EEEES2P_EESZ_EENS1_INS2_ILi1ENS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS
3_INS3_INS3_INS3_INS3_INS3_IS3A_NS4_IS2S_NSB_INSC_ISD_NSB_INSC_INSE_ISsSF_EENSB_INSI_ILi0ENSJ_ISF_SK_SK_EEEEEESK_SK_SK_EEEESS_SK_SK_EEEEEEEES2P_EENS48_IS51_EEEES3A_EENSX_INS4_IS2L_S4Y_EEEEEES2P_EES3A_EENSX_INS3_INS3_INS5_ILi6ES2K_EENS5_ILi11ES1A_EEEES2S_EEEEEES2P_EES3A_EENSX_INS3_INS3_IS2L_S5B_EES2S_EEEEEES2P_EENS48_INS3_IS3A_S2P_EEEEEENS48_INS3_INS3_IS3A_NS4_IS2U_S4Y_EEEES2P_EEEEEES5T_EES8_EENS0_5nil_tEEEEEEEEEEEEEEEEEEEEC1ERKS26_RKS66_
Doing a little computation on the wc of the nm output to figure out the
average length of each symbol. Account for the 8 character address and 1
character symbol type information:
(3657239 char - (8 char address + 1 char type) * 23623)/23623= 146 char
-Will
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Slow handling of C++ symbol names
2003-11-19 21:10 Slow handling of C++ symbol names Will Cohen
@ 2003-11-19 22:10 ` Daniel Berlin
2003-11-20 2:05 ` Michael Snyder
1 sibling, 0 replies; 5+ messages in thread
From: Daniel Berlin @ 2003-11-19 22:10 UTC (permalink / raw)
To: Will Cohen; +Cc: gdb-patches
>
> Find out how many symbols monotone has
> $ nm monotone|wc
> 23623 70853 3657239
>
>
>
> Some of symbol names from the boost library are really, really, really
> long. Here is an example that is over 2K in length:
>
That's not too bad, actually.
Consider that your 2k symbol expands to over 80k when demangled!
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Slow handling of C++ symbol names
2003-11-19 21:10 Slow handling of C++ symbol names Will Cohen
2003-11-19 22:10 ` Daniel Berlin
@ 2003-11-20 2:05 ` Michael Snyder
2003-11-20 14:27 ` Will Cohen
1 sibling, 1 reply; 5+ messages in thread
From: Michael Snyder @ 2003-11-20 2:05 UTC (permalink / raw)
To: Will Cohen; +Cc: gdb-patches
Will Cohen wrote:
> When debugging some C++ program with gdb (both versions 5.3 and 6.0) it
> takes a long time to load the debugging information and get a
> command-line prompt. The particular example that demonstrates this
> problem is monotone (http://www.venge.net/monotone/) which makes use of
> the boost libraries (http://www.boost.org/). gdb loading monotone with
> the associated debugging information takes over 30 seconds to get the
> initial command-line prompt on a 1.7GHz Athlon with 512MB or DRAM.
30 seconds! Why, you young whipper-snapper... I remember when you
used to have to go get a cup of coffee while gdb loaded symbols! ;-)
Hell, sometimes you could go out and *buy* the coffee...
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Slow handling of C++ symbol names
2003-11-20 2:05 ` Michael Snyder
@ 2003-11-20 14:27 ` Will Cohen
2003-11-20 20:22 ` Michael Snyder
0 siblings, 1 reply; 5+ messages in thread
From: Will Cohen @ 2003-11-20 14:27 UTC (permalink / raw)
To: Michael Snyder; +Cc: gdb-patches
Michael Snyder wrote:
> Will Cohen wrote:
>
>> When debugging some C++ program with gdb (both versions 5.3 and 6.0)
>> it takes a long time to load the debugging information and get a
>> command-line prompt. The particular example that demonstrates this
>> problem is monotone (http://www.venge.net/monotone/) which makes use
>> of the boost libraries (http://www.boost.org/). gdb loading monotone
>> with the associated debugging information takes over 30 seconds to get
>> the initial command-line prompt on a 1.7GHz Athlon with 512MB or DRAM.
>
>
> 30 seconds! Why, you young whipper-snapper... I remember when you
> used to have to go get a cup of coffee while gdb loaded symbols! ;-)
> Hell, sometimes you could go out and *buy* the coffee...
In the early 90's I did a port of GCC to the Gould NP1, a dual processor
machine (18MHz ECL multiboard processors) with a staggering 256MB of
RAM at Purdue University. Unfortunately, the machine was the main
machine for ALL the undergraduates. During the afternoons it was typical
to see gcc 1.40 take 3 hours to build when there were 70 or more people
on the machine.
The process for gdb debugging monotone wouldn't fit in the NP1's 256MB
of memory. According to top SIZE is 265MB and RSS is 262MB. So maybe 30
seconds for filling up a quarter gigabyte of memory isn't so bad.
-Will
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Slow handling of C++ symbol names
2003-11-20 14:27 ` Will Cohen
@ 2003-11-20 20:22 ` Michael Snyder
0 siblings, 0 replies; 5+ messages in thread
From: Michael Snyder @ 2003-11-20 20:22 UTC (permalink / raw)
To: Will Cohen; +Cc: gdb-patches
Will Cohen wrote:
> Michael Snyder wrote:
>
>> Will Cohen wrote:
>>
>>> When debugging some C++ program with gdb (both versions 5.3 and 6.0)
>>> it takes a long time to load the debugging information and get a
>>> command-line prompt. The particular example that demonstrates this
>>> problem is monotone (http://www.venge.net/monotone/) which makes use
>>> of the boost libraries (http://www.boost.org/). gdb loading monotone
>>> with the associated debugging information takes over 30 seconds to
>>> get the initial command-line prompt on a 1.7GHz Athlon with 512MB or
>>> DRAM.
>>
>>
>>
>> 30 seconds! Why, you young whipper-snapper... I remember when you
>> used to have to go get a cup of coffee while gdb loaded symbols! ;-)
>> Hell, sometimes you could go out and *buy* the coffee...
>
>
> In the early 90's I did a port of GCC to the Gould NP1, a dual processor
> machine (18MHz ECL multiboard processors) with a staggering 256MB of
> RAM at Purdue University. Unfortunately, the machine was the main
> machine for ALL the undergraduates. During the afternoons it was typical
> to see gcc 1.40 take 3 hours to build when there were 70 or more people
> on the machine.
>
> The process for gdb debugging monotone wouldn't fit in the NP1's 256MB
> of memory. According to top SIZE is 265MB and RSS is 262MB. So maybe 30
> seconds for filling up a quarter gigabyte of memory isn't so bad.
It would be bad if it were just copying from disk to memory,
but it's not bad for parsing one database of that size into another
with a completely different format.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2003-11-20 20:22 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-11-19 21:10 Slow handling of C++ symbol names Will Cohen
2003-11-19 22:10 ` Daniel Berlin
2003-11-20 2:05 ` Michael Snyder
2003-11-20 14:27 ` Will Cohen
2003-11-20 20:22 ` Michael Snyder
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox