From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 4557 invoked by alias); 19 Nov 2003 21:10:08 -0000 Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sources.redhat.com Received: (qmail 4543 invoked from network); 19 Nov 2003 21:10:04 -0000 Received: from unknown (HELO mx1.redhat.com) (66.187.233.31) by sources.redhat.com with SMTP; 19 Nov 2003 21:10:04 -0000 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.11.6/8.11.6) with ESMTP id hAJLA1H19895 for ; Wed, 19 Nov 2003 16:10:01 -0500 Received: from redhat.com (wcohen.devel.redhat.com [172.16.56.117]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id hAJLA0w06829; Wed, 19 Nov 2003 16:10:00 -0500 Message-ID: <3FBBDC27.50204@redhat.com> Date: Wed, 19 Nov 2003 21:10:00 -0000 From: Will Cohen User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030922 X-Accept-Language: en-us, en MIME-Version: 1.0 To: gdb-patches@sources.redhat.com, wcohen@redhat.com Subject: Slow handling of C++ symbol names Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2003-11/txt/msg00407.txt.bz2 When debugging some C++ program with gdb (both versions 5.3 and 6.0) it takes a long time to load the debugging information and get a command-line prompt. The particular example that demonstrates this problem is monotone (http://www.venge.net/monotone/) which makes use of the boost libraries (http://www.boost.org/). gdb loading monotone with the associated debugging information takes over 30 seconds to get the initial command-line prompt on a 1.7GHz Athlon with 512MB or DRAM. monotone is not a small program, over 3MB of debugging information. I tooks some measurements with OProfile to find out where gdb 5.3 was spending its time. 3/4 of the time is spent in dyn_string_substring and __GI_strncpy, the column below the '%'. I recompiled gdb with '-pg' to find that this is due to the C++ name demangling. The highlights of gprof output are shown below. I looked at the symbols for monotone/boost in the executable, some of them are VERY long and for monotone/boost they average 147 characters in length. An example of ONE symbol over 2KB is at the end of this mail (there are others like that). I suspect that some of the algorithms used in gdb may not expect such long symbol names and be part of the reason for the slow startup, e.g. work O(length_of_string ^2). Has anyone else noticed this type of problem before? [wcohen@slingshot monotone-0.7]$ oprofpp -l -r -k /usr/bin/gdb|more Cpu type: Athlon Cpu speed was (MHz estimation) : 1687.55 Counter 0 counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 843500 vma samples % symbol name image name 081e20c0 36781 55.9424 dyn_string_substring /usr/lib/debug/usr/bin/gdb.debug 00077290 14743 22.4235 __GI_strncpy /lib/tls/libc-2.3.2.so 080f5e40 3434 5.22297 msymbol_hash_iw /usr/lib/debug/usr/bin/gdb.debug 00021dc0 1748 2.65864 __ctype_b_loc /lib/tls/libc-2.3.2.so 00076990 1603 2.4381 __GI_strcpy /lib/tls/libc-2.3.2.so 081e1e10 1417 2.1552 dyn_string_insert /usr/lib/debug/usr/bin/gdb.debug The profile information from gprof. Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 88.41 18.46 18.46 354132 0.00 0.00 dyn_string_substring 5.84 19.68 1.22 19299 0.00 0.00 msymbol_hash_iw ... 0.00 0.00 220617 demangle_class_enum_type [423] 0.00 0.00 95 demangle_function_type [482] ----------------------------------------------- 0.00 0.74 14108/354132 demangle_name [29] 0.00 1.04 19908/354132 demangle_type_ptr [28] 0.00 4.55 87228/354132 demangle_prefix [24] 0.01 12.16 232888/354132 demangle_type [23] [21] 88.6 0.01 18.49 354132 substitution_add [21] 18.46 0.01 354132/354132 dyn_string_substring [22] 0.00 0.01 354132/979183 dyn_string_new [40] ----------------------------------------------- 18.46 0.01 354132/354132 substitution_add [21] [22] 88.5 18.46 0.01 354132 dyn_string_substring [22] 0.01 0.01 354132/4003799 dyn_string_resize [35] ----------------------------------------------- 13904 demangle_type [23] 55 demangle_operator_name Find out how many symbols monotone has $ nm monotone|wc 23623 70853 3657239 Some of symbol names from the boost library are really, really, really long. Here is an example that is over 2K in length: 08145572 W _ZN5boost6spirit12subrule_listINS0_14subrule_parserILi0ENS0_8sequenceINS3_INS3_INS3_INS0_6actionINS0_7subruleILi1ENS0_15closure_contextI17rcs_admin_closureEEEEN7phoenix5actorINSA_9compositeINSA_9assign_opENSB_INSC_INSA_21member_var_ptr_actionI9rcs_admin8rcs_fileEENSB_INSA_14closure_memberILi0ENSA_7closureISG_NSA_5nil_tESK_EEEEEESK_SK_SK_EEEENSB_INSA_8argumentILi0EEEEESK_SK_EEEEEENS0_11alternativeINS0_11kleene_starINS4_INS5_ILi3ENS6_I17rcs_delta_closureEEEENSB_INSC_INSA_26member_function_ptr_actionIvSG_RK9rcs_deltaSK_SK_SK_EESN_SS_SK_SK_EEEEEEEENS3_INSX_INS5_ILi12ENS0_14parser_contextEEEEES19_EEEEEENSW_INS5_ILi2ENS6_I16rcs_desc_closureEEEENS3_IS1C_S1I_EEEEEENSX_INS4_INS5_ILi4ENS6_I21rcs_deltatext_closureEEEENSB_INSC_INS11_IvSG_RK13rcs_deltatextSK_SK_SK_EESN_SS_SK_SK_EEEEEEEEEENS0_10contiguousINSX_INS0_5chlitIcEEEEEEEENS6_I16rcs_file_closureEEEENS1_INS2_ILi10ES20_S1A_EENS1_INS2_ILi14ENS0_8positiveINSW_INS0_12digit_parserES20_EEEES1A_EENS1_INS2_ILi13ENS0_10differenceINS0_12graph _parserENS0_5chsetIcEEEES1A_EENS1_INS2_ILi12ENS3_INS3_INS5_ILi8ENS6_I14string_closureEEEENSX_INS5_ILi9ES2K_EEEEEENS5_ILi10ES1A_EEEES1A_EENS1_INS2_ILi9ENSW_INSW_INSW_IS2L_NS5_ILi7ES2K_EEEENS5_ILi5ES2K_EEEES2P_EES2K_EENS1_INS2_ILi4ENS3_INS3_INS3_INS3_INS4_IS2S_NSB_INSC_ISD_NSB_INSC_INSE_ISsS1P_EENSB_INSI_ILi0ENSJ_IS1P_SK_SK_EEEEEESK_SK_SK_EEEESS_SK_SK_EEEEEENS0_6strlitIPKcEEEENS4_IS2U_S35_EEEENSW_IS3A_NS3_IS1C_S3A_EEEEEES3C_EES1N_EENS1_INS2_ILi2ENS3_IS3A_NS4_IS2U_NSB_INSC_ISD_NSB_INSI_ILi0ENSJ_ISsSK_SK_EEEEEESS_SK_SK_EEEEEEEES1H_EENS1_INS2_ILi3ENS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS4_IS2S_NSB_INSC_ISD_NSB_INSC_INSE_ISsS12_EENSB_INSI_ILi0ENSJ_IS12_SK_SK_EEEEEESK_SK_SK_EEEESS_SK_SK_EEEEEES3A_EES3Z_EES2P_EES3A_EENS4_IS2L_S3Y_EEEES2P_EES3A_EENS0_8optionalIS44_EEEES2P_EES3A_EENSX_INS4_IS2S_NSB_INSC_INS11_IvS12_RKSsSK_SK_SK_EES3U_SS_SK_SK_EEEEEEEEEES2P_EES3A_EENS48_IS3Z_EEEES2P_EESZ_EENS1_INS2_ILi1ENS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS 3_INS3_INS3_INS3_INS3_INS3_IS3A_NS4_IS2S_NSB_INSC_ISD_NSB_INSC_INSE_ISsSF_EENSB_INSI_ILi0ENSJ_ISF_SK_SK_EEEEEESK_SK_SK_EEEESS_SK_SK_EEEEEEEES2P_EENS48_IS51_EEEES3A_EENSX_INS4_IS2L_S4Y_EEEEEES2P_EES3A_EENSX_INS3_INS3_INS5_ILi6ES2K_EENS5_ILi11ES1A_EEEES2S_EEEEEES2P_EES3A_EENSX_INS3_INS3_IS2L_S5B_EES2S_EEEEEES2P_EENS48_INS3_IS3A_S2P_EEEEEENS48_INS3_INS3_IS3A_NS4_IS2U_S4Y_EEEES2P_EEEEEES5T_EES8_EENS0_5nil_tEEEEEEEEEEEEEEEEEEEEC1ERKS26_RKS66_ Doing a little computation on the wc of the nm output to figure out the average length of each symbol. Account for the 8 character address and 1 character symbol type information: (3657239 char - (8 char address + 1 char type) * 23623)/23623= 146 char -Will