text file formats

Mirror of the gdb mailing list

* text file formats
@ 2006-04-05 22:31 Bob Rossi
  2006-04-05 23:39 ` Daniel Jacobowitz
  2006-04-06  3:43 ` Eli Zaretskii
  0 siblings, 2 replies; 23+ messages in thread
From: Bob Rossi @ 2006-04-05 22:31 UTC (permalink / raw)
  To: gdb

Hi,

While trying to display to the user a source file, it has become
increasingly obvious to me how complicated such a simple task can be.

Unix-formatted text files use "\n" for a newline, DOS-formatted text files
use "\r\n", and Mac-formatted text files use "\r". In the three cases
above it is obvious how to determine exactly which line is which.

However, it is easy to mix these file formats. In this case, any particular
file can use any combination of "\r", "\r\n" and "\n" for newlines. I'm not
even sure how to display such a file. I'm guessing that it's ambiguous,
and I can only make a best guess as to what the newline sequence should
be. Is this correct?

One thing I have determined, is that in order to know what the file
format is, the entire text file needs to be parsed. After that, either
the file format is defined (unix/dos/mac) or it is undefined (mix of
them).

I would like to make sure that the algorithm CGDB uses to determine
the line number from a file is the same algorithm that GDB uses. Can
anyone point me in the correct direction?

Thanks,
Bob Rossi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-05 22:31 text file formats Bob Rossi
@ 2006-04-05 23:39 ` Daniel Jacobowitz
  2006-04-06  0:14   ` Bob Rossi
  2006-04-06  3:47   ` Eli Zaretskii
  2006-04-06  3:43 ` Eli Zaretskii
  1 sibling, 2 replies; 23+ messages in thread
From: Daniel Jacobowitz @ 2006-04-05 23:39 UTC (permalink / raw)
  To: gdb

On Wed, Apr 05, 2006 at 06:31:22PM -0400, Bob Rossi wrote:
> One thing I have determined, is that in order to know what the file
> format is, the entire text file needs to be parsed. After that, either
> the file format is defined (unix/dos/mac) or it is undefined (mix of
> them).
> 
> I would like to make sure that the algorithm CGDB uses to determine
> the line number from a file is the same algorithm that GDB uses. Can
> anyone point me in the correct direction?

GDB does something much simpler.  It opens the file in text mode and
lets the C library sort it out.

Well, usually.  In search and reverse search it sometimes uses a
similar but slightly simpler algorithm: ignore '\r' if followed by
'\n'.  I'm not sure why those are done in binary mode.
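
The search-time rule described above (drop a '\r' only when it is
immediately followed by '\n') can be sketched as follows. This is an
illustration in Python, not GDB's source; the function name
split_search_lines is made up for the example.

```python
def split_search_lines(data: bytes) -> list:
    """Split raw bytes into lines, ignoring '\\r' only when '\\n' follows it."""
    # Collapse each "\r\n" pair to "\n"; a bare "\r" is left untouched.
    normalized = data.replace(b"\r\n", b"\n")
    lines = normalized.split(b"\n")
    # A trailing "\n" terminates the last line; it does not start a new one.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


dos_lines = split_search_lines(b"int\r\ndosf (int i)\r\n")
mac_lines = split_search_lines(b"int\rmacf (int i)\r")
```

With this rule a DOS file splits into the expected lines, while a file
that uses only "\r" stays a single line with the "\r" bytes still in it.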

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-05 23:39 ` Daniel Jacobowitz
@ 2006-04-06  0:14   ` Bob Rossi
  2006-04-06  1:17     ` Daniel Jacobowitz
  2006-04-06  3:47   ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Bob Rossi @ 2006-04-06  0:14 UTC (permalink / raw)
  To: gdb

On Wed, Apr 05, 2006 at 07:39:38PM -0400, Daniel Jacobowitz wrote:
> On Wed, Apr 05, 2006 at 06:31:22PM -0400, Bob Rossi wrote:
> > One thing I have determined, is that in order to know what the file
> > format is, the entire text file needs to be parsed. After that, either
> > the file format is defined (unix/dos/mac) or it is undefined (mix of
> > them).
> > 
> > I would like to make sure that the algorithm CGDB uses to determine
> > the line number from a file is the same algorithm that GDB uses. Can
> > anyone point me in the correct direction?
> 
> GDB does something much simpler.  It opens the file in text mode and
> lets the C library sort it out.
> 
> Well, usually.  In search and reverse search it sometimes uses a
> similar but slightly simpler algorithm: ignore '\r' if followed by
> '\n'.  I'm not sure why those are done in binary mode.

OK, so now I'm confused. If the user looks at the text file through my
viewer and sets a breakpoint at line 100, how can I be sure it's the
same line 100 that GDB will actually set a breakpoint at? Obviously this
works for unix and dos file formats. But from the algorithm you stated
above, it doesn't look like GDB will work with mac file formats.

I mean, the C library on unix won't be able to read a file that was
created on a mac (at least with the mac file format).

Is GDB responsible for mapping the file line numbers to the actual lines,
or is this the responsibility of GCC via the debug info? For instance,
if foo () is defined at line 100 according to gcc and 101 according to
GDB, does CGDB have to think foo () is at line 100 or 101?

Thanks,
Bob Rossi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06  0:14   ` Bob Rossi
@ 2006-04-06  1:17     ` Daniel Jacobowitz
  2006-04-06  3:27       ` Bob Rossi
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Jacobowitz @ 2006-04-06  1:17 UTC (permalink / raw)
  To: gdb

On Wed, Apr 05, 2006 at 08:14:55PM -0400, Bob Rossi wrote:
> OK, so now I'm confused. If the user looks at the text file through my
> viewer, and sets a breakpoint at line 100, how can I be sure it's the
> same 100 that GDB will actually set a breakpoint at? Obviously this
> works for unix and dos file formats. But from the algorithm you stated
> above, it doesn't look like GDB will work with mac file formats.
> 
> I mean, the C library on unix won't be able to read a file that was
> created on a mac (at least with the mac file format).

The manual algorithm is only used while searching.  It will work on any
file format recognized by the C library.

The C library is generally used.  If you want to handle a text file,
including a source file, it had better be in the native format.  Full
stop.

I don't think any recent version of MacOS uses the old \r format,
anyway?  I thought OSX had switched to the Unix convention.

> Is GDB responsible for mapping the file line numbers to the actual lines?
> or is this the responsibility of GCC via the debug info? For instance,
> if foo () is defined at line 100 according to gcc and 101 according to
> GDB, does CGDB have to think foo () is at line 100 or 101?

I have no idea what you mean.  GDB gets line numbers from debug info,
of course; where else would it get them?

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06  1:17     ` Daniel Jacobowitz
@ 2006-04-06  3:27       ` Bob Rossi
  2006-04-06  3:35         ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Bob Rossi @ 2006-04-06  3:27 UTC (permalink / raw)
  To: gdb

On Wed, Apr 05, 2006 at 09:17:32PM -0400, Daniel Jacobowitz wrote:
> On Wed, Apr 05, 2006 at 08:14:55PM -0400, Bob Rossi wrote:
> > OK, so now I'm confused. If the user looks at the text file through my
> > viewer, and sets a breakpoint at line 100, how can I be sure it's the
> > same 100 that GDB will actually set a breakpoint at? Obviously this
> > works for unix and dos file formats. But from the algorithm you stated
> > above, it doesn't look like GDB will work with mac file formats.
> > 
> > I mean, the C library on unix won't be able to read a file that was
> > created on a mac (at least with the mac file format).
> 
> The manual algorithm is only used while searching.  It will work on any
> file format recognized by the C library.
> 
> The C library is generally used.  If you want to handle a text file,
> including a source file, it had better be in the native format.  Full
> stop.
> 
> I don't think any recent version of MacOS uses the old \r format,
> anyway?  I thought OSX had switched to the Unix convention.

Sure, but for some reason the file '/usr/include/g++-3/sstream' on Red
Hat Enterprise Linux 3 has mixed line endings. So this becomes a practical
issue, not just a theoretical one. Here is an excerpt from
'od -a /usr/include/g++-3/sstream':
    0007400   h   )  cr  nl  sp  sp  sp  sp   {  sp   }  cr  nl  nl  sp  sp
The 'cr nl' is "\r\n" and the extra 'nl' is a bare "\n". Opening this file in
vim shows the ^M at the end of each line.
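
As a rough illustration of how a viewer might notice this kind of mixing,
the sketch below lists every line terminator in a byte string, in file
order. The sample bytes are transcribed from the od excerpt above; the
function name eol_sequence is hypothetical, and nothing here is taken
from GDB or CGDB.

```python
import re

# Try "\r\n" first so a DOS terminator is not counted as "\r" plus "\n".
_EOL = re.compile(rb"\r\n|\r|\n")


def eol_sequence(data: bytes) -> list:
    """Return every line terminator found in data, in file order."""
    return _EOL.findall(data)


# h ) cr nl sp sp sp sp { sp } cr nl nl sp sp
sample = b"h)\r\n    { }\r\n\n  "
terminators = eol_sequence(sample)
# More than one distinct terminator means the excerpt is mixed.
is_mixed = len(set(terminators)) > 1
```

For the excerpt, terminators is [b"\r\n", b"\r\n", b"\n"], so is_mixed is
True even though only one bare "\n" is present.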

> > Is GDB responsible for mapping the file line numbers to the actual lines?
> > or is this the responsibility of GCC via the debug info? For instance,
> > if foo () is defined at line 100 according to gcc and 101 according to
> > GDB, does CGDB have to think foo () is at line 100 or 101?
> 
> I have no idea what you mean.  GDB gets line numbers from debug info,
> of course; where else would it get them?

Does the debug info actually say, "at line 100 symbol foo() exists?"
Meaning, does gcc calculate that information, or does GDB derive the
line number from the debug info?

OK, I'm sorry, as usual I'm describing this badly. Let's start by
trying to agree on an assumption that I have. Since the file is in a
mixed format, is it true that the line number is ambiguous and depends
on the program reading the file?

Thanks,
Bob Rossi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06  3:27       ` Bob Rossi
@ 2006-04-06  3:35         ` Eli Zaretskii
  2006-04-06  5:06           ` Daniel Jacobowitz
  2006-04-06 14:01           ` Bob Rossi
  0 siblings, 2 replies; 23+ messages in thread
From: Eli Zaretskii @ 2006-04-06  3:35 UTC (permalink / raw)
  To: gdb

> Date: Wed, 5 Apr 2006 23:27:02 -0400
> From: Bob Rossi <bob_rossi@cox.net>
> 
> > I have no idea what you mean.  GDB gets line numbers from debug info,
> > of course; where else would it get them?
> 
> Does the debug info actually say, "at line 100 symbol foo() exists?"

No, it says, for every source line, which PC addresses correspond to
that source line.  That is all GDB needs to know, because it
manipulates PC addresses (i.e. addresses in the .text section).

For symbols, the debug info says that symbol `foo' is stored in the
.text or .data section (or .bss or something else) at address NNN.

> OK, I'm sorry, as usual I'm describing this badly. Let's start out by
> trying to agree with an assumption that I have. Since the file is in a
> mixed format, is it true that the line number is determined ambiguously 
> by the program reading the file?

Not necessarily, see my other message.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06  3:35         ` Eli Zaretskii
@ 2006-04-06  5:06           ` Daniel Jacobowitz
  2006-04-06 13:03             ` Daniel Jacobowitz
  2006-04-06 14:01           ` Bob Rossi
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Jacobowitz @ 2006-04-06  5:06 UTC (permalink / raw)
  To: gdb, gdb

On Thu, Apr 06, 2006 at 06:35:41AM +0300, Eli Zaretskii wrote:
> > Does the debug info actually say, "at line 100 symbol foo() exists?"
> 
> No, it says, for every source line, which PC addresses correspond to
> that source line.  That is all GDB needs to know, because it
> manipulates PC addresses (i.e. addresses in the .text section).
> 
> For symbols, the debug info says that symbol `foo' is stored in the
> .text or .data section (or .bss or something else) at address NNN.

Well, this is true, but what Bob wrote is often true also.  One of them
is the line corresponding to the first PC instruction of the function;
the other is the line of declaration of the function, which may be
different (e.g. before the leading brace).  I don't remember offhand if
GDB takes advantage of the latter.

(One comes from DWARF .debug_line, the other from .debug_info
DW_AT_decl_line).

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06  3:35         ` Eli Zaretskii
  2006-04-06  5:06           ` Daniel Jacobowitz
@ 2006-04-06 14:01           ` Bob Rossi
  2006-04-06 14:41             ` Daniel Jacobowitz
  2006-04-06 19:07             ` Eli Zaretskii
  1 sibling, 2 replies; 23+ messages in thread
From: Bob Rossi @ 2006-04-06 14:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb

On Thu, Apr 06, 2006 at 06:35:41AM +0300, Eli Zaretskii wrote:
> > Date: Wed, 5 Apr 2006 23:27:02 -0400
> > From: Bob Rossi <bob_rossi@cox.net>
> > 
> > > I have no idea what you mean.  GDB gets line numbers from debug info,
> > > of course; where else would it get them?
> > 
> > Does the debug info actually say, "at line 100 symbol foo() exists?"
> 
> No, it says, for every source line, which PC addresses correspond to
> that source line.  That is all GDB needs to know, because it
> manipulates PC addresses (i.e. addresses in the .text section).
> 
> For symbols, the debug info says that symbol `foo' is stored in the
> .text or .data section (or .bss or something else) at address NNN.

OK, this is interesting; it brings up two cases. (They may be the same,
though.)

The first is when I have a source file displayed: I need to make sure
that what the user sees as line N is what GDB/GCC think is line N. For
instance, 'b foo.c:N' must be the same line N that GDB/GCC think is line N.

The second case is when the user types 'b main'.
GDB will find the symbol and determine the line number.
    (gdb) b main
    Breakpoint 1 at 0x8048320: file main.c, line 4.
I need to make sure that line 4, is the same in GDB, as it is in CGDB.

I've created a small example, using unix/dos/mac text formats.
The files are attached for your viewing purposes.

The output of cat
    $ cat unix.c
    int
    unixf (int i)
    {
      return i;
    }
    bar@adam ~/tmp/foo
    $ cat dos.c
    int
    dosf (int i)
    {
      return i;
    }
    bar@adam ~/tmp/foo
    $ cat mac.c
    bar@adam ~/tmp/foo

The output of GDB's 'list' and 'info source' commands:

    (gdb) list unix.c:1
    1       int
    2       unixf (int i)
    3       {
    4         return i;
    5       }
    (gdb) info source
    Current source file is unix.c
    Compilation directory is /home/bar/tmp/foo
    Located in /home/ADAM/bar/tmp/foo/unix.c
    Contains 5 lines.
    Source language is c.
    Compiled with unknown debugging format.
    Does not include preprocessor macro info.

    (gdb) list dos.c:1
    1       int
    2       dosf (int i)
    3       {
    4         return i;
    5       }
    (gdb) info source
    Current source file is dos.c
    Compilation directory is /home/bar/tmp/foo
    Located in /home/ADAM/bar/tmp/foo/dos.c
    Contains 5 lines.
    Source language is c.
    Compiled with unknown debugging format.
    Does not include preprocessor macro info.
    (gdb) list mac.c:1
    (gdb) rn i;)
    (gdb) info source
    Current source file is mac.c
    Compilation directory is /home/bar/tmp/foo
    Located in /home/ADAM/bar/tmp/foo/mac.c
    Contains 1 line.
    Source language is c.
    Compiled with unknown debugging format.
    Does not include preprocessor macro info.
    (gdb)

When listing the mac file, GDB writes every line onto the same terminal
line: each "\r" returns the cursor to column 0, so each line overwrites the
previous one. Notice that the 'info source' command thinks the file is
1 line long. This isn't correct IMO. Is it to anyone
else?
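
The odd "(gdb) rn i;)" line in the listing can be reproduced with a very
small terminal model. This is only a sketch: the contents of mac.c are
assumed to mirror unix.c with "\r" terminators (the attachment itself is
not shown), the line-number prefix that 'list' prints is left out, and
render_on_terminal is a made-up helper.

```python
def render_on_terminal(text: str) -> str:
    """Model one terminal row: '\\r' moves the cursor to column 0."""
    cells = []
    col = 0
    for ch in text:
        if ch == "\r":
            col = 0  # carriage return: back to the start of the row
            continue
        if col < len(cells):
            cells[col] = ch  # overwrite what was printed earlier
        else:
            cells.append(ch)
        col += 1
    return "".join(cells)


# Assumed contents of mac.c, followed by the next GDB prompt.
mac_source = "int\rmacf (int i)\r{\r  return i;\r}\r"
row = render_on_terminal(mac_source + "(gdb) ")
```

Here row comes out as "(gdb) rn i;)": the prompt overwrites the first six
columns, and the tail of "  return i;" plus the ")" from the function
signature survive to its right.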

Next, the breakpoint command on symbols. GDB apparently thinks the macf
function is at line 4, yet thinks there is only 1 line in the file.

    (gdb) b unixf
    Breakpoint 2 at 0x8048313: file unix.c, line 4.
    (gdb) b dosf
    Breakpoint 3 at 0x804831b: file dos.c, line 4.
    (gdb) b macf
    Breakpoint 4 at 0x8048323: file mac.c, line 4.

Executing the program.

    (gdb) b unixf
    Breakpoint 1 at 0x8048313: file unix.c, line 4.
    (gdb) b dosf
    Breakpoint 2 at 0x804831b: file dos.c, line 4.
    (gdb) b macf
    Breakpoint 3 at 0x8048323: file mac.c, line 4.
    (gdb) r

    Breakpoint 1, unixf (i=1) at unix.c:4
    4         return i;
    (gdb) c

    Breakpoint 2, dosf (i=1) at dos.c:4
    4         return i;
    (gdb)

    Breakpoint 3, macf (i=1) at mac.c:4
    Line number 4 out of range; mac.c has 1 lines.
    (gdb)

I don't know what this warning message means. Should CGDB think there are
5 lines in the file, to be consistent with GCC, or 1, to be consistent
with GDB?

None of this really has anything to do with mixed file formats. That
step is even more confusing.

Thanks,
Bob Rossi

[-- Attachment #2: files.tar --]
[-- Type: application/x-tar, Size: 10240 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06 14:01           ` Bob Rossi
@ 2006-04-06 14:41             ` Daniel Jacobowitz
  2006-04-06 19:20               ` Eli Zaretskii
  2006-04-06 19:07             ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Jacobowitz @ 2006-04-06 14:41 UTC (permalink / raw)
  To: Eli Zaretskii, gdb

On Thu, Apr 06, 2006 at 09:38:29AM -0400, Bob Rossi wrote:
> OK, this is interesting; it brings up two cases. (They may be the same
> though).

Why are you going to tremendous lengths to accommodate non-native
newline conventions?  Is there some good reason I've missed?  Your
example about the slightly mangled RHEL3 header doesn't cut it. That's
the least problematic form of mixing.  You'll just get stray
non-printables at the end of lines.

You can go to all the trouble you want, but the fact is, GDB only
supports files using the native convention.  So no wonder you can't
get it to match.  If you really need anything more, I recommend just
detecting the case and warning.

> GDB writes every line to the current line when listing the mac file.
> It is overwritten via the "\r". Notice that the 'info source' command
> thinks the file is 1 line long. This isn't correct IMO. Is it to anyone
> else?

That is a correct interpretation of a file containing only '\r' line
separators on a Unix platform.

>     Breakpoint 3, macf (i=1) at mac.c:4
>     Line number 4 out of range; mac.c has 1 lines.

That is GCC's interpretation of a file containing only '\r' separators
on a Unix platform.  It is also valid.
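
The two readings can be put side by side in a short sketch. It assumes
that the compiler treats "\r", "\n" and "\r\n" each as one line break
(which the "line 4" in the debug info suggests) and that the debugger
counts only "\n". The mac.c contents and all function names are
illustrative assumptions, not code from GCC or GDB.

```python
def lines_counting_any_eol(data: bytes) -> int:
    # bytes.splitlines() breaks on "\n", "\r" and "\r\n".
    return len(data.splitlines())


def lines_counting_lf_only(data: bytes) -> int:
    count = data.count(b"\n")
    # A final line without a terminating "\n" still counts as a line.
    if data and not data.endswith(b"\n"):
        count += 1
    return count


def range_warning(name: str, line: int, data: bytes):
    """Return a warning when a debug-info line exceeds the counted lines."""
    total = lines_counting_lf_only(data)
    if line > total:
        return f"Line number {line} out of range; {name} has {total} lines."
    return None


# Assumed contents of mac.c: unix.c with "\r" as the only terminator.
MAC_C = b"int\rmacf (int i)\r{\r  return i;\r}\r"
compiler_view = lines_counting_any_eol(MAC_C)
debugger_view = lines_counting_lf_only(MAC_C)
warning = range_warning("mac.c", 4, MAC_C)
```

Under these assumptions the compiler-style count is 5 and the "\n"-only
count is 1, so a breakpoint recorded at line 4 falls outside what the
debugger believes the file contains.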

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06 14:41             ` Daniel Jacobowitz
@ 2006-04-06 19:20               ` Eli Zaretskii
  2006-04-06 19:32                 ` Bob Rossi
  2006-04-06 20:55                 ` Paul Koning
  0 siblings, 2 replies; 23+ messages in thread
From: Eli Zaretskii @ 2006-04-06 19:20 UTC (permalink / raw)
  To: gdb

> Date: Thu, 6 Apr 2006 10:05:42 -0400
> From: Daniel Jacobowitz <drow@false.org>
> 
> Why are you going to tremendous lengths to accommodate non-native
> newline conventions?  Is there some good reason I've missed?

I don't know if that's Bob's reason, but one good reason is that
nowadays you can never know where (on what machine) the source files
live, and who edits them on what platform.  For example, some
developers are so used to Microsoft's Visual Studio that they use it
to edit sources to be compiled on Unix (via the network).

So it does make sense to support non-native formats, although adding
that to GDB would be a non-trivial job.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06 19:20               ` Eli Zaretskii
@ 2006-04-06 19:32                 ` Bob Rossi
  2006-04-06 23:55                   ` Daniel Jacobowitz
  2006-04-06 20:55                 ` Paul Koning
  1 sibling, 1 reply; 23+ messages in thread
From: Bob Rossi @ 2006-04-06 19:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb

On Thu, Apr 06, 2006 at 10:01:34PM +0300, Eli Zaretskii wrote:
> > Date: Thu, 6 Apr 2006 10:05:42 -0400
> > From: Daniel Jacobowitz <drow@false.org>
> > 
> > Why are you going to tremendous lengths to accommodate non-native
> > newline conventions?  Is there some good reason I've missed?
> 
> I don't know if that's Bob's reason, but one good reason is that
> nowadays you can never know where (on what machine) the source files
> live, and who edits them on what platform.  For example, some
> developers are so used to Microsoft's Visual Studio that they use it
> to edit sources to be compiled on Unix (via the network).
> 
> So it does make sense to support non-native formats, although adding
> that to GDB would be a non-trivial job.

This is exactly my reasoning. Usually I agree 100% with Daniel, but in
this circumstance, I just think it's wrong to say "sorry, you're out of
luck" to the user.

Bob Rossi


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06 19:32                 ` Bob Rossi
@ 2006-04-06 23:55                   ` Daniel Jacobowitz
  2006-04-07 13:33                     ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Jacobowitz @ 2006-04-06 23:55 UTC (permalink / raw)
  To: Eli Zaretskii, gdb

On Thu, Apr 06, 2006 at 03:07:42PM -0400, Bob Rossi wrote:
> This is exactly my reasoning. Usually I agree 100% with Daniel, but in
> this circumstance, I just think it's wrong to say "sorry, you're out of
> luck" to the user.

Native formats are the only sort we can support reliably.

We get line numbers from the debug information, which was produced by a
compiler - any compiler.  If the file is in native format, the compiler
can be presumed to have gotten the line numbers right.  If it isn't,
then they could be totally out of whack.  And the compiler could have
been run on a different platform.  Supporting all of the possible
combinations is simply impossible.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06 23:55                   ` Daniel Jacobowitz
@ 2006-04-07 13:33                     ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2006-04-07 13:33 UTC (permalink / raw)
  To: gdb

> Date: Thu, 6 Apr 2006 15:31:55 -0400
> From: Daniel Jacobowitz <drow@false.org>
> 
> Native formats are the only sort we can support reliably.
> 
> We get line numbers from the debug information, which was produced by a
> compiler - any compiler.  If the file is in native format, the compiler
> can be presumed to have gotten the line numbers right.  If it isn't,
> then they could be totally out of whack.

At least with GCC and with Unix and DOS style of EOLs, there's no
basis to assume that line numbers will be ``totally out of whack''.
The Mac case obviously is harder, but it sounds like that style is
dying anyway.

> And the compiler could have been run on a different platform.

GCC supports both Unix and DOS EOLs on Windows as well.  So, at least
for these two styles and platforms, it's possible to support line
numbers reliably.

> Supporting all of the possible combinations is simply impossible.

All of them might be impossible, but some of them could be quite
possible.

Anyway, unless we have a volunteer to add this kind of support to GDB,
this dispute is purely academic.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06 19:20               ` Eli Zaretskii
  2006-04-06 19:32                 ` Bob Rossi
@ 2006-04-06 20:55                 ` Paul Koning
  2006-04-07 11:54                   ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Paul Koning @ 2006-04-06 20:55 UTC (permalink / raw)
  To: eliz; +Cc: gdb

>>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes:

 >> Date: Thu, 6 Apr 2006 10:05:42 -0400 From: Daniel Jacobowitz
 >> <drow@false.org>
 >> 
 >> Why are you going to tremendous lengths to accommodate non-native
 >> newline conventions?  Is there some good reason I've missed?

 Eli> I don't know if that's Bob's reason, but one good reason is that
 Eli> nowadays you can never know where (on what machine) the source
 Eli> files live, and who edits them on what platform.  For example,
 Eli> some developers are so used to Microsoft's Visual Studio that
 Eli> they use it to edit sources to be compiled on Unix (via the
 Eli> network).

 Eli> So it does make sense to support non-native formats, although
 Eli> adding that to GDB would be a non-trivial job.

This is only an issue if the source control system doesn't cure it.
Subversion does, for example.

	   paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06 20:55                 ` Paul Koning
@ 2006-04-07 11:54                   ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2006-04-07 11:54 UTC (permalink / raw)
  To: Paul Koning; +Cc: gdb

> Date: Thu, 6 Apr 2006 15:20:51 -0400
> From: Paul Koning <pkoning@equallogic.com>
> Cc: gdb@sources.redhat.com
> 
>  Eli> So it does make sense to support non-native formats, although
>  Eli> adding that to GDB would be a non-trivial job.
> 
> This is only an issue if the source control system doesn't cure it.

I don't think we can rely on the assumption that an arbitrary source
file is under a source control system.

> Subversion does, for example.

I don't think we can rely on the assumption that everyone uses
Subversion.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06 14:01           ` Bob Rossi
  2006-04-06 14:41             ` Daniel Jacobowitz
@ 2006-04-06 19:07             ` Eli Zaretskii
  1 sibling, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2006-04-06 19:07 UTC (permalink / raw)
  To: gdb

> Date: Thu, 6 Apr 2006 09:38:29 -0400
> From: Bob Rossi <bob_rossi@cox.net>
> Cc: gdb@sources.redhat.com
> 
> The first is when I have a source file displayed, I need to make sure
> that what the user sees as line N is what GDB/GCC think is line N. For
> instance, 'b foo.c:N' must be the same line N that GDB/GCC think is line N.

Then you must do _exactly_ what GDB does: support only native EOL
formats.

> The second case is when the user types 'b main'.
> GDB will find the symbol and determine the line number.
>     (gdb) b main
>     Breakpoint 1 at 0x8048320: file main.c, line 4.
> I need to make sure that line 4, is the same in GDB, as it is in CGDB.

This only works in GDB if the source has native EOLs.  You must do the
same in CGDB, or else change GDB to support non-native formats, and
make CGDB do the same.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-05 23:39 ` Daniel Jacobowitz
  2006-04-06  0:14   ` Bob Rossi
@ 2006-04-06  3:47   ` Eli Zaretskii
  2006-04-06  4:29     ` Daniel Jacobowitz
  1 sibling, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2006-04-06  3:47 UTC (permalink / raw)
  To: gdb

> Date: Wed, 5 Apr 2006 19:39:38 -0400
> From: Daniel Jacobowitz <drow@false.org>
> 
> GDB does something much simpler.  It opens the file in text mode and
> lets the C library sort it out.
> 
> Well, usually.  In search and reverse search it sometimes uses a
> similar but slightly simpler algorithm: ignore '\r' if followed by
> '\n'.  I'm not sure why those are done in binary mode.

I think it's because GDB counts characters and then lseeks to the
point it thinks it should display.  If the library's text-mode I/O
converts \r\n to \n, these seeks will only work reliably in binary
mode, since most DOS/Windows libraries don't seek to the correct place
(the only exception to this rule I know of is the DJGPP library).
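
The seek problem can be shown with a few bytes. The sketch below is an
illustration only: it records where each line starts, once on the raw
bytes and once on a copy with "\r\n" already translated to "\n", and
then seeks in the raw data with both sets of offsets. The helper names
are invented for the example.

```python
import io


def line_start_offsets(data: bytes) -> list:
    """Byte offset at which each "\\n"-terminated line starts."""
    offsets = [0]
    pos = data.find(b"\n")
    while pos != -1:
        offsets.append(pos + 1)
        pos = data.find(b"\n", pos + 1)
    if offsets[-1] == len(data):
        offsets.pop()  # nothing starts at end of file
    return offsets


def read_line_at(stream, offset: int) -> bytes:
    stream.seek(offset)
    return stream.readline()


raw = b"int\r\ndosf (int i)\r\n{\r\n"
raw_offsets = line_start_offsets(raw)
# Offsets as seen after text-mode style translation of "\r\n" to "\n".
translated_offsets = line_start_offsets(raw.replace(b"\r\n", b"\n"))

good = read_line_at(io.BytesIO(raw), raw_offsets[2])
bad = read_line_at(io.BytesIO(raw), translated_offsets[2])
```

The raw offsets are [0, 5, 19] and the translated ones are [0, 4, 17].
Seeking to 19 yields the third line, while seeking to 17 lands on the
"\r\n" that ends the second line, which is the mismatch that counting
characters in translated text would produce.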

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-06  3:47   ` Eli Zaretskii
@ 2006-04-06  4:29     ` Daniel Jacobowitz
  2006-04-06  4:30       ` Daniel Jacobowitz
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Jacobowitz @ 2006-04-06  4:29 UTC (permalink / raw)
  To: gdb, gdb

On Thu, Apr 06, 2006 at 06:47:42AM +0300, Eli Zaretskii wrote:
> > Well, usually.  In search and reverse search it sometimes uses a
> > similar but slightly simpler algorithm: ignore '\r' if followed by
> > '\n'.  I'm not sure why those are done in binary mode.
> 
> I think it's because GDB counts characters and then lseeks to the
> point it thinks it should display.  If the library's text-mode I/O
> converts \r\n to \n, this seeks will only work reliably in binary
> mode, since most DOS/Windows libraries don't seek to the correct place
> (the only exception from this rule I know of is the DJGPP library).

Oh, thanks.  That's surely it.

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: text file formats
  2006-04-05 22:31 text file formats Bob Rossi
  2006-04-05 23:39 ` Daniel Jacobowitz
@ 2006-04-06  3:43 ` Eli Zaretskii
  2006-04-06 13:35   ` Bob Rossi
  1 sibling, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2006-04-06  3:43 UTC (permalink / raw)
  To: gdb

> Date: Wed, 5 Apr 2006 18:31:22 -0400
> From: Bob Rossi <bob_rossi@cox.net>
> 
> However, it is easy to mix these file formats. In this case, any particular 
> file can use any combination of "\r", "\r\n" and "\n" for newlines. I'm not 
> even sure how to display such a file. I'm guessing that it's
> ambiguous, and i can make a best guess as to what the newline sequence
> should be. Is this correct?
> 
> One thing I have determined, is that in order to know what the file
> format is, the entire text file needs to be parsed. After that, either
> the file format is defined (unix/dos/mac) or it is undefined (mix of
> them).

(a) For native end-of-line (EOL) format, use the native C library and
    specify the text-mode I/O when you open the file.

(b) For non-native but consistent EOL format, read the file in binary
    mode, analyze its first chunk, and then manually convert the
    original EOL markers into literal \n.

The only two methods I know of to handle the mixed case are:

  (1) Fall back to Unix-style EOL and show the ^M literally.
  (2) Let the user specify the EOL and then apply the (b) strategy
      above.
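
Strategy (b) and the two fallbacks for the mixed case can be sketched
roughly as below. This is a minimal illustration under assumptions: the
4096-byte chunk size, the function names, and the choice to return a
list of byte strings are all invented for the example, and a file that
only becomes mixed after the first chunk is not detected.

```python
def sniff_eol(chunk: bytes):
    """Return the single EOL marker used in chunk, or None if none or mixed."""
    crlf = chunk.count(b"\r\n")
    # Each "\r\n" pair contains one "\r" and one "\n"; subtract the pairs
    # to count the bare ones.
    bare_cr = chunk.count(b"\r") - crlf
    bare_lf = chunk.count(b"\n") - crlf
    found = [eol for eol, n in
             ((b"\r\n", crlf), (b"\r", bare_cr), (b"\n", bare_lf)) if n]
    return found[0] if len(found) == 1 else None


def to_display_lines(data: bytes, user_eol=None, chunk_size=4096) -> list:
    head = data[:chunk_size]
    # Do not let a "\r\n" pair split across the chunk boundary look like "\r".
    if len(data) > chunk_size and head.endswith(b"\r"):
        head = head[:-1]
    # (2) user-specified EOL, else (b) sniffed EOL, else (1) Unix-style.
    eol = user_eol or sniff_eol(head) or b"\n"
    lines = data.split(eol)
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


def show_control(line: bytes) -> str:
    """Fallback (1): make a stray carriage return visible as ^M."""
    return line.replace(b"\r", b"^M").decode("ascii", "replace")
```

For example, to_display_lines(b"a\r\nb\n") falls back to Unix-style
splitting and returns [b"a\r", b"b"], and show_control(b"a\r") renders
the first of those as "a^M".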

> I would like to make sure that the algorithm CGDB uses to determine
> the line number from a file is the same algorithm that GDB uses.

GDB doesn't solve any of these problems.  But I think your motivation
for doing the same as GDB was based on incorrect assumptions; see
Daniel's and my responses elsewhere in this thread.



* Re: text file formats
  2006-04-06  3:43 ` Eli Zaretskii
@ 2006-04-06 13:35   ` Bob Rossi
  2006-04-06 19:01     ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Bob Rossi @ 2006-04-06 13:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb

On Thu, Apr 06, 2006 at 06:43:48AM +0300, Eli Zaretskii wrote:
> > Date: Wed, 5 Apr 2006 18:31:22 -0400
> > From: Bob Rossi <bob_rossi@cox.net>
> > 
> > However, it is easy to mix these file formats. In this case, any particular 
> > file can use any combination of "\r", "\r\n" and "\n" for newlines. I'm not 
> > even sure how to display such a file. I'm guessing that it's
> > ambiguous, and I can make a best guess as to what the newline sequence
> > should be. Is this correct?
> > 
> > One thing I have determined, is that in order to know what the file
> > format is, the entire text file needs to be parsed. After that, either
> > the file format is defined (unix/dos/mac) or it is undefined (mix of
> > them).
> 
> (a) For native end-of-line (EOL) format, use the native C library and
>     specify the text-mode I/O when you open the file.
> 
> (b) For non-native but consistent EOL format, read the file in binary
>     mode, analyze its first chunk, and then manually convert the
>     original EOL markers into literal \n.

OK, that's fine, except that you don't know whether the file uses native
or non-native EOLs until you open it and process the entire file.

> The only two methods I know of to handle the mixed case are:
> 
>   (1) Fall back to Unix-style EOL and show the ^M literally.

OK.
>   (2) Let the user specify the EOL and then apply the (b) strategy
>       above.

OK, that's fine, but is this what GDB and GCC do?

Bob Rossi



* Re: text file formats
  2006-04-06 13:35   ` Bob Rossi
@ 2006-04-06 19:01     ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2006-04-06 19:01 UTC (permalink / raw)
  To: gdb

> Date: Thu, 6 Apr 2006 09:15:14 -0400
> From: Bob Rossi <bob_rossi@cox.net>
> Cc: gdb@sources.redhat.com
> 
> > (a) For native end-of-line (EOL) format, use the native C library and
> >     specify the text-mode I/O when you open the file.
> > 
> > (b) For non-native but consistent EOL format, read the file in binary
> >     mode, analyze its first chunk, and then manually convert the
> >     original EOL markers into literal \n.
> 
> OK, that's fine, except that you don't know whether the file uses native
> or non-native EOLs until you open it and process the entire file.

You do know that if all you want to handle is the native format.

If you want to handle non-native formats as well, you must do (b).

> > The only two methods I know of to handle the mixed case are:
> > 
> >   (1) Fall back to Unix-style EOL and show the ^M literally.
> 
> OK.
> >   (2) Let the user specify the EOL and then apply the (b) strategy
> >       above.
> 
> OK, that's fine, but is this what GDB and GCC do?

No, that's what Emacs does.  Daniel told you what GDB does.  As for
GCC, I simply don't know, but I think it does handle DOS-style CR-LF
EOLs on non-Windows machines.  Not sure about the (old) Mac style.



Thread overview: 23+ messages
2006-04-05 22:31 text file formats Bob Rossi
2006-04-05 23:39 ` Daniel Jacobowitz
2006-04-06  0:14   ` Bob Rossi
2006-04-06  1:17     ` Daniel Jacobowitz
2006-04-06  3:27       ` Bob Rossi
2006-04-06  3:35         ` Eli Zaretskii
2006-04-06  5:06           ` Daniel Jacobowitz
2006-04-06 13:03             ` Daniel Jacobowitz
2006-04-06 14:01           ` Bob Rossi
2006-04-06 14:41             ` Daniel Jacobowitz
2006-04-06 19:20               ` Eli Zaretskii
2006-04-06 19:32                 ` Bob Rossi
2006-04-06 23:55                   ` Daniel Jacobowitz
2006-04-07 13:33                     ` Eli Zaretskii
2006-04-06 20:55                 ` Paul Koning
2006-04-07 11:54                   ` Eli Zaretskii
2006-04-06 19:07             ` Eli Zaretskii
2006-04-06  3:47   ` Eli Zaretskii
2006-04-06  4:29     ` Daniel Jacobowitz
2006-04-06  4:30       ` Daniel Jacobowitz
2006-04-06  3:43 ` Eli Zaretskii
2006-04-06 13:35   ` Bob Rossi
2006-04-06 19:01     ` Eli Zaretskii
