Date: Thu, 06 Apr 2006 13:35:00 -0000
From: Bob Rossi
To: Eli Zaretskii
Cc: gdb@sources.redhat.com
Subject: Re: text file formats
Message-ID: <20060406131514.GH11610@brasko.net>
References: <20060405223122.GB11610@brasko.net>

On Thu, Apr 06, 2006 at 06:43:48AM +0300, Eli Zaretskii wrote:
> > Date: Wed, 5 Apr 2006 18:31:22 -0400
> > From: Bob Rossi
> >
> > However, it is easy to mix these file formats. In this case, any particular
> > file can use any combination of "\r", "\r\n" and "\n" for newlines. I'm not
> > even sure how to display such a file. I'm guessing that it's
> > ambiguous, and I can make a best guess as to what the newline sequence
> > should be. Is this correct?
> >
> > One thing I have determined is that, in order to know what the file
> > format is, the entire text file needs to be parsed.
> > After that, either the file format is defined (unix/dos/mac) or it is
> > undefined (mix of them).
>
> (a) For native end-of-line (EOL) format, use the native C library and
>     specify the text-mode I/O when you open the file.
>
> (b) For non-native but consistent EOL format, read the file in binary
>     mode, analyze its first chunk, and then manually convert the
>     original EOL markers into literal \n.

OK, that's fine, except that you don't know whether the file uses native
or non-native EOLs until you open it and process the entire file.

> The only two methods I know of to handle the mixed case are:
>
> (1) Fall back to Unix-style EOL and show the ^M literally.

OK.

> (2) Let the user specify the EOL and then apply the (b) strategy
>     above.

OK, that's fine, but is this what GDB and GCC do?

Bob Rossi
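
To make the options above concrete, here is a rough sketch of how I read
them. It is only an illustration, not what GDB or GCC actually do. The
names classify_eol and to_display_lines, the "forced" parameter, and the
"^J" marker for a stray "\n" are all made up for this example. The file is
assumed to have been read in binary mode, and the whole buffer is scanned
before a format is chosen.

```python
import re

# Try CRLF first so a "\r\n" pair is never counted as a lone CR plus a lone LF.
_EOL = re.compile(rb"\r\n|\r|\n")
_NAMES = {b"\n": "unix", b"\r\n": "dos", b"\r": "mac"}
_SEPARATORS = {"dos": b"\r\n", "mac": b"\r"}


def classify_eol(data: bytes) -> str:
    """Scan the whole buffer; return 'unix', 'dos', 'mac', 'mixed' or 'none'."""
    seen = {m.group() for m in _EOL.finditer(data)}
    if not seen:
        return "none"
    if len(seen) > 1:
        return "mixed"
    return _NAMES[seen.pop()]


def to_display_lines(data: bytes, forced=None) -> list:
    """Split a binary buffer into display lines.

    forced is a hypothetical user override ('unix', 'dos' or 'mac'),
    as in option (2). Without it, a consistent file is split on its own
    EOL marker, as in option (b), and a mixed file falls back to
    Unix-style splitting, as in option (1).
    """
    kind = forced or classify_eol(data)
    # 'unix', 'mixed' and 'none' all split on "\n".
    parts = data.split(_SEPARATORS.get(kind, b"\n"))
    if parts and parts[-1] == b"":
        parts.pop()  # a trailing EOL does not start another line
    # Show any leftover EOL byte literally instead of acting on it.
    return [
        p.replace(b"\r", b"^M").replace(b"\n", b"^J").decode("latin-1")
        for p in parts
    ]
```

For example, to_display_lines(b"a\r\nb\nc\rd") treats the buffer as mixed
and yields "a^M", "b" and "c^Md", while passing forced="dos" on
b"a\r\nb\nc" yields "a" and "b^Jc".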