From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 8188 invoked by alias); 14 Jan 2009 17:46:23 -0000
Received: (qmail 8180 invoked by uid 22791); 14 Jan 2009 17:46:22 -0000
X-SWARE-Spam-Status: No, hits=-2.5 required=5.0 tests=AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from mel.act-europe.fr (HELO mel.act-europe.fr) (212.99.106.210)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP;
    Wed, 14 Jan 2009 17:45:46 +0000
Received: from localhost (localhost [127.0.0.1])
    by filtered-smtp.eu.adacore.com (Postfix) with ESMTP id D7E2D290028;
    Wed, 14 Jan 2009 18:45:43 +0100 (CET)
Received: from mel.act-europe.fr ([127.0.0.1])
    by localhost (smtp.eu.adacore.com [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id YphynegDFcRo;
    Wed, 14 Jan 2009 18:45:43 +0100 (CET)
Received: from province.act-europe.fr (province.act-europe.fr [10.10.0.214])
    by mel.act-europe.fr (Postfix) with ESMTP id F0CE9290003;
    Wed, 14 Jan 2009 18:45:42 +0100 (CET)
Received: by province.act-europe.fr (Postfix, from userid 560)
    id DC803165C17; Wed, 14 Jan 2009 18:45:42 +0100 (CET)
Date: Wed, 14 Jan 2009 17:46:00 -0000
From: Jerome Guitton
To: gdb-patches@sources.redhat.com
Subject: [Bug symtab/8367] [RFA] performance improvement of lookup_partial_symtab
Message-ID: <20090114174542.GM84382@adacore.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="6zdv2QT/q3FMhpsV"
Content-Disposition: inline
User-Agent: Mutt/1.5.17 (2007-11-01)
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2009-01/txt/msg00325.txt.bz2

--6zdv2QT/q3FMhpsV
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

The move to bugzilla has "resurrected" this patch; I suggested it
several years ago, but it's still relevant.
The problem is explained in detail in the comment (which proved very
useful when I had to double-check that the patch was not obsolete).
To summarize: lookup_partial_symtab is particularly slow when it is
given an absolute path. The reason is that it needs to build the full
filename for every psymtab; when building this full filename, the
corresponding file is opened and closed by find_and_open_source. As a
consequence, in the worst case, the loop ends up opening and closing
every source file of the application. That is pretty bad for a big
application, in particular if the sources are located on a slow file
system.

The idea of the patch is to avoid building the psymtab full filename
if the basenames are different.

Tested on Linux, no regression. OK to apply?

2009-01-14  Jerome Guitton

	* symtab.c (lookup_partial_symtab): When looking up an absolute
	path in the partial symtabs, compare the base names before
	checking the full names.

--6zdv2QT/q3FMhpsV
Content-Type: video/dv
Content-Disposition: attachment; filename="lookup_partial_symtab.dif"

Index: symtab.c
===================================================================
RCS file: /cvs/src/src/gdb/symtab.c,v
retrieving revision 1.200
diff -u -p -r1.200 symtab.c
--- symtab.c	3 Jan 2009 05:57:53 -0000	1.200
+++ symtab.c	14 Jan 2009 17:10:27 -0000
@@ -281,8 +281,48 @@ lookup_partial_symtab (const char *name)
       }
 
     /* If the user gave us an absolute path, try to find the file in
-       this symtab and use its absolute path.  */
-    if (full_path != NULL)
+       this symtab and use its absolute path.
+
+       psymtab_to_fullname has a significant cost as it calls
+       find_and_open_source, which itself does some I/O operations
+       (e.g. open).  Cumulatively, it can take several seconds with
+       large systems (around 4000 files) if the files are accessed
+       through a slow file system (e.g. NFS).  Here is a shell script
+       that you can use to generate such a large system:
+
+         echo "void main () {}" > t.c
+         gcc -c -g t.c
+         previous=""
+         for i in 0 1 2 3 ; do
+           for j in 0 1 2 3 4 5 6 7 8 9 ; do
+             for k in 0 1 2 3 4 5 6 7 8 9 ; do
+               for l in 0 1 2 3 4 5 6 7 8 9 ; do
+                 name="${i}${j}${k}${l}"
+                 echo "void f_$name () {}" >> f_$name.c
+                 gcc -c -g f_$name.c
+                 ld -r f_$name.o $previous -o main_$name
+                 rm f_*.o
+                 rm $previous
+                 previous=main_$name
+               done
+             done
+           done
+         done
+         gcc $previous t.o -o main
+
+       Using the following GDB commands should demonstrate the problem:
+         list /f_0000.c:1
+         list /f_3999.c:1
+
+       To reduce the cost, the full comparison is done only if the
+       base names match.  This check is cheap, as it only does
+       string manipulations.  This optimisation has no impact on
+       relative paths (e.g. the more common 'list f_0000.c:1'), as
+       in this case full_path == NULL.  */
+
+    if (full_path != NULL
+        && FILENAME_CMP (lbasename (full_path),
+                         lbasename (pst->filename)) == 0)
       {
         psymtab_to_fullname (pst);
         if (pst->fullname != NULL
@@ -292,7 +332,9 @@ lookup_partial_symtab (const char *name)
           }
       }
 
-    if (real_path != NULL)
+    if (real_path != NULL
+        && FILENAME_CMP (lbasename (full_path),
+                         lbasename (pst->filename)) == 0)
       {
         char *rp = NULL;
         psymtab_to_fullname (pst);

--6zdv2QT/q3FMhpsV--
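
For readers who want to see the idea outside of GDB, here is a minimal,
standalone sketch of the same base-name pre-filter. It is not GDB code:
struct entry, base_name, resolve_fullname and lookup_by_full_path are
hypothetical names made up for this illustration, plain strcmp stands in
for FILENAME_CMP, and only '/'-separated paths are handled. The counter
resolve_calls stands in for the number of open/close operations that
psymtab_to_fullname would trigger.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a partial symtab entry; not GDB's struct.  */
struct entry
{
  const char *filename;   /* Name as recorded, possibly relative.  */
  const char *fullname;   /* Absolute name; costly to compute in real life.  */
};

/* Counts how many times the "expensive" resolution step ran.  */
static int resolve_calls = 0;

/* Return the part of PATH after the last '/' (assumes POSIX-style paths).  */
static const char *
base_name (const char *path)
{
  const char *slash;

  assert (path != NULL);
  slash = strrchr (path, '/');
  return slash != NULL ? slash + 1 : path;
}

/* Stand-in for the costly full-name computation (file system access in
   the real code); here it only bumps the counter.  */
static const char *
resolve_fullname (const struct entry *e)
{
  resolve_calls++;
  return e->fullname;
}

/* Return the index of the entry whose full name equals FULL_PATH, or -1.
   Full names are resolved only for entries with a matching base name.  */
static int
lookup_by_full_path (const struct entry *table, size_t n,
                     const char *full_path)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* Cheap rejection: string comparison only, no I/O.  */
      if (strcmp (base_name (full_path), base_name (table[i].filename)) != 0)
        continue;

      if (strcmp (full_path, resolve_fullname (&table[i])) == 0)
        return (int) i;
    }

  return -1;
}
```

With a table of several thousand entries, a lookup then pays the
resolution cost only for the few entries sharing the requested base
name, instead of once per entry.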