[PATCH] Make completions almost instantaneous

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* [PATCH] Make completions almost instantaneous
@ 2001-03-30  1:09 Daniel Berlin
  2001-03-30  7:21 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Daniel Berlin @ 2001-03-30  1:09 UTC (permalink / raw)
  To: gdb-patches

Okay, this has pissed me off for years.
You accidently hit tab at the wrong time, and boom, you wait forever for
completion to tell you it found 7000 symbols.
Literally forever (~29 seconds, 16k symbols takes ~1:30. the program i
compliled was one line. Real programs end up with *way* too many symbols.
I let one sit for 10 minutes before giving up once, because i didn't want
to recreate the situation i had gotten to)

This is all due to a "time optimization" performed in completion_list_add_symbol.

Where time optimization is defined as "To save time, increases the order
of magnitude of the algorithm by n, even when unncessary"

What the heck am i talking about?

Well, it clips symbols by running through the entire current completion
list, maybe twice, comparing the string we are going to insert against
them.

That way, we don't insert duplicates.
Of course, this is the wrong way to do it (since you get murdered when
your list starts getting large, seeing as how you'll do n-1->2n-2 string
compares, just to do *one* insertion)  but seeing as how the code was
there, I wasted about 2 hours trying to find a faster way to not insert
duplicates (binary search the list of completions, for starters, and then
do  a sorted insert using binary search again to find the place, and then
memmove to  shift the array up, etc) , and then decided to ignore them
and do it at the end, since it was obvious it was all the strcmps and
moving was almost as expensive in practice, and thus, it would be better
to just do it once in a sort, after we had the entire list, and remove the
duplicates

But here's the kicker. When i went to go just make it do one sort at the
end, and then just not output/remove  the duplicates, i noticed something:
readline already does that (remove_duplicate_matches in complete.c, called
from postprocess_matches if rl_ignore_completion_duplicates is 1, which it
always is for gdb, since we never turn it off)

In other words, all this god damn time is wasted to make sure we don't
insert duplicates, and it won't output duplicates *anyway*, because it
removes duplicate matches (strangely, using almost exactly the same code i
had been ready to add, minus the binary search).

Result:
pushing tab twice at the wrong time used to take 1:30 if it
geenrated
16k completions (I compiled a very minimal c++, broke on main, ran, and
hit "b " tab. I also did the same for "b _" <tab>, which gives me 7k
completions).
It now takes ~3 seconds.

I'm not joking.

There's obviously no difference in the output, since it was removing
duplicates anyway.

No longer do i kill off gdbs because it's faster to re set 15 breakpoints
than wait for completion to tell me i screwed up.

All this patch does is remove code.

Remember kids, premature optimization is the root of all evil.

Since it's unclear to me whether this is an obvious fix, i'll wait for
approval from Elena or Jim.

--Dan

2001-03-30  Daniel Berlin  <dberlin@redhat.com>

	* symtab.c (completion_list_add_name): Remove duplicate string checks,
	readline already does this, and it's much faster at it, too.

Index: symtab.c
===================================================================
RCS file: /cvs/src/src/gdb/symtab.c,v
retrieving revision 1.28
diff -c -3 -p -r1.28 symtab.c
*** symtab.c	2001/01/30 02:49:36	1.28
--- symtab.c	2001/03/30 08:48:30
*************** completion_list_add_name (char *symname,
*** 2836,2852 ****
        return;
      }

-   /* Clip any symbol names that we've already considered.  (This is a
-      time optimization)  */
-
-   for (i = 0; i < return_val_index; ++i)
-     {
-       if (STREQ (symname, return_val[i]))
- 	{
- 	  return;
- 	}
-     }
-
    /* We have a match for a completion, so add SYMNAME to the current list
       of matches. Note that the name is moved to freshly malloc'd space. */

--- 2869,2874 ----
*************** completion_list_add_name (char *symname,
*** 2870,2888 ****
  	strncpy (new, word, sym_text - word);
  	new[sym_text - word] = '\0';
  	strcat (new, symname);
-       }
-
-     /* Recheck for duplicates if we intend to add a modified symbol.  */
-     if (word != sym_text)
-       {
- 	for (i = 0; i < return_val_index; ++i)
- 	  {
- 	    if (STREQ (new, return_val[i]))
- 	      {
- 		xfree (new);
- 		return;
- 	      }
- 	  }
        }

      if (return_val_index + 3 > return_val_size)
--- 2892,2897 ----

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Make completions almost instantaneous
  2001-03-30  1:09 [PATCH] Make completions almost instantaneous Daniel Berlin
@ 2001-03-30  7:21 ` Eli Zaretskii
  2001-03-30  8:42   ` Daniel Berlin
  2001-03-30 13:46 ` Elena Zannoni
  2001-03-30 15:50 ` Andrew Cagney
  2 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2001-03-30  7:21 UTC (permalink / raw)
  To: dberlin; +Cc: gdb-patches

> Date: Fri, 30 Mar 2001 04:09:08 -0500 (EST)
> From: Daniel Berlin <dberlin@redhat.com>
> 
> All this patch does is remove code.
> 
> Remember kids, premature optimization is the root of all evil.

I tried this patch, and all I can say is WOW!  "b TAB" while debugging
Emacs (6.6K symbols) used to take 15 seconds, now it takes 2; the same
while debugging GDB (10K symbols) used to take 30 seconds, now it
takes 3.

Thanks!  I was always annoyed by this, but never had time to dig into
it.

> Since it's unclear to me whether this is an obvious fix, i'll wait for
> approval from Elena or Jim.

Yes, please, please, pretty please, approve this!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Make completions almost instantaneous
  2001-03-30  7:21 ` Eli Zaretskii
@ 2001-03-30  8:42   ` Daniel Berlin
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Berlin @ 2001-03-30  8:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dberlin, gdb-patches

On Fri, 30 Mar 2001, Eli Zaretskii wrote:

> > Date: Fri, 30 Mar 2001 04:09:08 -0500 (EST)
> > From: Daniel Berlin <dberlin@redhat.com>
> >
> > All this patch does is remove code.
> >
> > Remember kids, premature optimization is the root of all evil.
>
> I tried this patch, and all I can say is WOW!  "b TAB" while debugging
> Emacs (6.6K symbols) used to take 15 seconds, now it takes 2; the same
> while debugging GDB (10K symbols) used to take 30 seconds, now it
> takes 3.

Sounds about right.  It turned completion_list_add_symbol from O(n * m)
(m is the max of the length string we are trying to add, and the string we
are  comparing it to. n is the number of things on the completion list) to
O(1). Completion list addition is called n times.

That really was murder before.
 >
> Thanks!  I was always annoyed by this, but never had time to dig into
> it.
>
I can't even count the number of gdb's i've killed off because i got tired
of waiting for completion to come up.

The best part is that it was commented as a time optimization. It's
obvious the person who added it only completely small lists. It's
absolute murder otherwise.

> > Since it's unclear to me whether this is an obvious fix, i'll wait for
> > approval from Elena or Jim.
>
> Yes, please, please, pretty please, approve this!
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Make completions almost instantaneous
  2001-03-30  1:09 [PATCH] Make completions almost instantaneous Daniel Berlin
  2001-03-30  7:21 ` Eli Zaretskii
@ 2001-03-30 13:46 ` Elena Zannoni
  2001-03-31 11:03   ` Daniel Berlin
  2001-03-30 15:50 ` Andrew Cagney
  2 siblings, 1 reply; 8+ messages in thread
From: Elena Zannoni @ 2001-03-30 13:46 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gdb-patches

Ok, Approved. Please check it in.  Just to be anal, I ran the
testsuite on solaris w/o seeing any regressions.

Nice catch.

Thanks
Elena


Daniel Berlin writes:
 > Okay, this has pissed me off for years.
 > You accidently hit tab at the wrong time, and boom, you wait forever for
 > completion to tell you it found 7000 symbols.
 > Literally forever (~29 seconds, 16k symbols takes ~1:30. the program i
 > compliled was one line. Real programs end up with *way* too many symbols.
 > I let one sit for 10 minutes before giving up once, because i didn't want
 > to recreate the situation i had gotten to)
 > 
 > This is all due to a "time optimization" performed in completion_list_add_symbol.
 > 
 > Where time optimization is defined as "To save time, increases the order
 > of magnitude of the algorithm by n, even when unncessary"
 > 
 > What the heck am i talking about?
 > 
 > Well, it clips symbols by running through the entire current completion
 > list, maybe twice, comparing the string we are going to insert against
 > them.
 > 
 > That way, we don't insert duplicates.
 > Of course, this is the wrong way to do it (since you get murdered when
 > your list starts getting large, seeing as how you'll do n-1->2n-2 string
 > compares, just to do *one* insertion)  but seeing as how the code was
 > there, I wasted about 2 hours trying to find a faster way to not insert
 > duplicates (binary search the list of completions, for starters, and then
 > do  a sorted insert using binary search again to find the place, and then
 > memmove to  shift the array up, etc) , and then decided to ignore them
 > and do it at the end, since it was obvious it was all the strcmps and
 > moving was almost as expensive in practice, and thus, it would be better
 > to just do it once in a sort, after we had the entire list, and remove the
 > duplicates
 > 
 > But here's the kicker. When i went to go just make it do one sort at the
 > end, and then just not output/remove  the duplicates, i noticed something:
 > readline already does that (remove_duplicate_matches in complete.c, called
 > from postprocess_matches if rl_ignore_completion_duplicates is 1, which it
 > always is for gdb, since we never turn it off)
 > 
 > In other words, all this god damn time is wasted to make sure we don't
 > insert duplicates, and it won't output duplicates *anyway*, because it
 > removes duplicate matches (strangely, using almost exactly the same code i
 > had been ready to add, minus the binary search).
 > 
 > Result:
 > pushing tab twice at the wrong time used to take 1:30 if it
 > geenrated
 > 16k completions (I compiled a very minimal c++, broke on main, ran, and
 > hit "b " tab. I also did the same for "b _" <tab>, which gives me 7k
 > completions).
 > It now takes ~3 seconds.
 > 
 > I'm not joking.
 > 
 > There's obviously no difference in the output, since it was removing
 > duplicates anyway.
 > 
 > No longer do i kill off gdbs because it's faster to re set 15 breakpoints
 > than wait for completion to tell me i screwed up.
 > 
 > 
 > All this patch does is remove code.
 > 
 > Remember kids, premature optimization is the root of all evil.
 > 
 > Since it's unclear to me whether this is an obvious fix, i'll wait for
 > approval from Elena or Jim.
 > 
 > --Dan
 > 
 > 2001-03-30  Daniel Berlin  <dberlin@redhat.com>
 > 
 > 	* symtab.c (completion_list_add_name): Remove duplicate string checks,
 > 	readline already does this, and it's much faster at it, too.
 > 
 > Index: symtab.c
 > ===================================================================
 > RCS file: /cvs/src/src/gdb/symtab.c,v
 > retrieving revision 1.28
 > diff -c -3 -p -r1.28 symtab.c
 > *** symtab.c	2001/01/30 02:49:36	1.28
 > --- symtab.c	2001/03/30 08:48:30
 > *************** completion_list_add_name (char *symname,
 > *** 2836,2852 ****
 >         return;
 >       }
 > 
 > -   /* Clip any symbol names that we've already considered.  (This is a
 > -      time optimization)  */
 > -
 > -   for (i = 0; i < return_val_index; ++i)
 > -     {
 > -       if (STREQ (symname, return_val[i]))
 > - 	{
 > - 	  return;
 > - 	}
 > -     }
 > -
 >     /* We have a match for a completion, so add SYMNAME to the current list
 >        of matches. Note that the name is moved to freshly malloc'd space. */
 > 
 > --- 2869,2874 ----
 > *************** completion_list_add_name (char *symname,
 > *** 2870,2888 ****
 >   	strncpy (new, word, sym_text - word);
 >   	new[sym_text - word] = '\0';
 >   	strcat (new, symname);
 > -       }
 > -
 > -     /* Recheck for duplicates if we intend to add a modified symbol.  */
 > -     if (word != sym_text)
 > -       {
 > - 	for (i = 0; i < return_val_index; ++i)
 > - 	  {
 > - 	    if (STREQ (new, return_val[i]))
 > - 	      {
 > - 		xfree (new);
 > - 		return;
 > - 	      }
 > - 	  }
 >         }
 > 
 >       if (return_val_index + 3 > return_val_size)
 > --- 2892,2897 ----
 > 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Make completions almost instantaneous
  2001-03-30 13:46 ` Elena Zannoni
@ 2001-03-31 11:03   ` Daniel Berlin
       [not found]     ` <15047.34162.692296.724782@kwikemart.cygnus.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Berlin @ 2001-03-31 11:03 UTC (permalink / raw)
  To: Elena Zannoni; +Cc: gdb-patches

Elena Zannoni <ezannoni@cygnus.com> writes:

> Ok, Approved. Please check it in.  Just to be anal, I ran the
> testsuite on solaris w/o seeing any regressions.
> 
Elena, would you be so kind as to check it in?
My symtab.c on this branch has gratuitous whitespace changes right
now (from the previous experimentation), in addition to some unrelated
changes i'm about to submit, so I can't just check out a new copy.

> Nice catch.
> 
> Thanks
> Elena
> 
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

[parent not found: <15047.34162.692296.724782@kwikemart.cygnus.com>]

* Re: [PATCH] Make completions almost instantaneous
       [not found]     ` <15047.34162.692296.724782@kwikemart.cygnus.com>
@ 2001-04-01 13:57       ` Daniel Berlin
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Berlin @ 2001-04-01 13:57 UTC (permalink / raw)
  To: Elena Zannoni; +Cc: gdb-patches

Elena Zannoni <ezannoni@cygnus.com> writes:

> Daniel Berlin writes:
>  > Elena Zannoni <ezannoni@cygnus.com> writes:
>  > 
>  > > Ok, Approved. Please check it in.  Just to be anal, I ran the
>  > > testsuite on solaris w/o seeing any regressions.
>  > > 
>  > Elena, would you be so kind as to check it in?
>  > My symtab.c on this branch has gratuitous whitespace changes right
>  > now (from the previous experimentation), in addition to some unrelated
>  > changes i'm about to submit, so I can't just check out a new copy.
>  > 
>  > > Nice catch.
>  > > 
>  > > Thanks
>  > > Elena
>  > > 
>  > > 
>  > 
> 
> What's the status Dan, do you still want me to check this in?

Nope, I did what Kevin suggested, and it worked fine. It's now checked
in.
Thanks though.
--Dan

> 
> Elena

-- 
Winny and I lived in a house that ran on static electricity...
If you wanted to run the blender, you had to rub balloons on
your head.  If you wanted to cook, you had to pull off a sweater
real quick.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Make completions almost instantaneous
  2001-03-30  1:09 [PATCH] Make completions almost instantaneous Daniel Berlin
  2001-03-30  7:21 ` Eli Zaretskii
  2001-03-30 13:46 ` Elena Zannoni
@ 2001-03-30 15:50 ` Andrew Cagney
  2001-03-30 20:55   ` Daniel Berlin
  2 siblings, 1 reply; 8+ messages in thread
From: Andrew Cagney @ 2001-03-30 15:50 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gdb-patches

Daniel Berlin wrote:

> -   /* Clip any symbol names that we've already considered.  (This is a
> -      time optimization)  */
> -
> -   for (i = 0; i < return_val_index; ++i)
> -     {
> -       if (STREQ (symname, return_val[i]))
> -       {
> -         return;
> -       }
> -     }
> -

It probably should have said ``space optimization''.  I'd take a guess
that this dates back to a time when readline didn't remove duplicates
and when people were more worried about memory.

I suspect that someone might eventually re-tweek the code so that it
does an O(log(n)) sorted insertion and that way eliminates duplicates. 
However, given that Eli (i.e. dos) is happy with change, I don't see
that happening soon.

As Elena said, nice catch.

	Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Make completions almost instantaneous
  2001-03-30 15:50 ` Andrew Cagney
@ 2001-03-30 20:55   ` Daniel Berlin
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Berlin @ 2001-03-30 20:55 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: gdb-patches

Andrew Cagney <ac131313@cygnus.com> writes:

> Daniel Berlin wrote:
> 
> > -   /* Clip any symbol names that we've already considered.  (This is a
> > -      time optimization)  */
> > -
> > -   for (i = 0; i < return_val_index; ++i)
> > -     {
> > -       if (STREQ (symname, return_val[i]))
> > -       {
> > -         return;
> > -       }
> > -     }
> > -
> 
> It probably should have said ``space optimization''.  I'd take a guess
> that this dates back to a time when readline didn't remove duplicates
> and when people were more worried about memory.
> 
> I suspect that someone might eventually re-tweek the code so that it
> does an O(log(n)) sorted insertion and that way eliminates
> duplicates. 
I tried this last night, it's not as fast as you would think.
Remember, strcmp is O(n) on the max of the lengths of the strings
being compared, so O (log n) is really O (log mn). It also involves an
O(n) worst case memmove, depending on where in the array the string
goes (if it's the new second string, we have to move the entire rest
of the array by one pointer position), which makes the O(log n)
insertion not buy you anything at all.
Compared to O(1), it's still very slow. You only started getting any
kind of significant difference with >10k completions, and it was still
too slow (1-2 minutes vs 15 minutes vs 3 seconds for the way my patch
makes it).

I was really just trying to tweak this without changing the data structure.
The real thing to do is to change to a binary tree, or a ternary
search tree, do all the insertions (which will be O(log mn), but no
O(n) memmove is necesssary), and delete the tree at the end, filling
in the array pointers in the right order as you delete the tree.

This is much more work, and it just wasn't worth it.  You'd get a lot
more memory savings by bcaching the symbol names it's putting in the
list (which would make the duplicate strings just duplicate pointers
anyway, and only wasting one pointer a piece, rather than the length
of the string), then you would trying to fix the code here.

IMHO, of course.

> However, given that Eli (i.e. dos) is happy with change, I don't see
> that happening soon.

As I said, the best way to remove duplicates here without touching the
existing char ** structure is to bcache the symbol names in all the
readers, thus making almost all of the duplicates in the completion
list (there is no way for a duplicate to be anything but of a symbol,
since those are the only things we can find in symtabs, psymtabs, and
minsyms), waste a lot less memory.

You'd also get a general memory usage decrease too from the change.

> 
> As Elena said, nice catch.
> 
> 	Andrew

-- 
I looked out my apartment window, and I saw a bird wearing
sneakers and a button saying, "I ain't flying no where."  I
said, "What's your problem buddy?"  He said, "I'm sick of this
stuff -- winter here, summer there, winter here, summer there.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2001-04-01 13:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-03-30  1:09 [PATCH] Make completions almost instantaneous Daniel Berlin
2001-03-30  7:21 ` Eli Zaretskii
2001-03-30  8:42   ` Daniel Berlin
2001-03-30 13:46 ` Elena Zannoni
2001-03-31 11:03   ` Daniel Berlin
     [not found]     ` <15047.34162.692296.724782@kwikemart.cygnus.com>
2001-04-01 13:57       ` Daniel Berlin
2001-03-30 15:50 ` Andrew Cagney
2001-03-30 20:55   ` Daniel Berlin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox