Date: Mon, 11 Nov 2002 04:14:00 -0000
From: Corinna Vinschen
To: gdb@sources.redhat.com
Subject: [RFC] File-I/O, target access to host file system via gdb remote protocol enhancement

Hi,

this RFC proposes a remote protocol enhancement that has already been
implemented at Red Hat.
The idea is to allow the remote target (which most likely has no file
system of its own) to access the host file system to store and retrieve
data during a gdb session, as if the host's file system were local to the
target.  Basically this means that the gdb stub on the target translates
calls to low level I/O routines such as open, read, write and close into
requests to gdb on the host machine, which in turn calls these routines
locally on behalf of the target.

A second part of this implementation maps the basic stdio streams (file
descriptors 0-2) to the gdb console, so that the user can interact with a
remote target application.  This should work in gdb's CLI as well as in
the GUI.

The existing implementation maps only a handful of useful functions, but
the protocol itself is easily extensible to support many more.

The intention is to contribute the File-I/O enhancement to the FSF.  The
official text follows.

Thanks in advance,
Corinna

==========================================================================

Abstract:

File I/O shall allow the target to use the host's file system and console
I/O when calling various system calls.  To that end, system calls on the
target system are translated into a remote communication with the host
system, which then performs the needed actions and returns an adequate
response to the target system.  This simulates file system operations even
on targets without a file system.

Since remote communication between GDB and a target system is already well
defined, the file I/O protocol will be part of the existing GDB remote
serial protocol.

Requirements:

The protocol should be host- and target-system independent.  The protocol
can't expect that values used to control the exact behaviour of system
calls, or datatypes, are identical on host and target.  This requires the
protocol to use an independent representation of datatypes and values.
It is the responsibility of both connection points (RedBoot on the target,
GDB on the host) to translate the system dependent values into the unified
protocol values when data is transmitted.

The communication is synchronous.  A system call is possible only while
GDB is waiting for the continuing or stepping target.  While GDB handles
the request for a system call, the target is stopped to allow
deterministic access to the target's memory.  Therefore file I/O is not
interruptible by target signals.  It is possible to interrupt file I/O by a
user interrupt (Ctrl-C), though.

The target's request to perform a host system call does not finish the
latest action.  That means, after finishing the system call, the target
returns to continuing the previous activity (continue, step).  No
additional continue or step request from GDB is required:

  (gdb) continue
    <- target requests 'syscall X'
       target is stopped, GDB executes syscall
    -> GDB returns result
       ... target continues, GDB returns to wait for the target
    <- target hits breakpoint and sends a Txx packet

The protocol is only used for files on the host file system and for I/O
on the console.  Character or block special devices, pipes, named pipes,
sockets or any other communication method on the host system are not
supported by this protocol.

Protocol basics:

The file I/O protocol is part of the already existing GDB remote serial
protocol.  It uses the previously unused 'F' packet type for the
communication.  Since a file I/O system call can only occur while GDB is
waiting for the continuing or stepping target, the file I/O request is a
new reply that GDB has to expect as a result of a former 'c', 'C', 's' or
'S' packet.  This 'F' packet contains all information needed to allow GDB
to call the appropriate host system call.  This especially includes:

  - A unique identifier for the requested syscall.
  - All parameters to the syscall.  Pointers are given as addresses into
    the target memory.  Pointers to strings are given as pointer/length
    pairs.
    Numerical values are given as they are.  Numerical control values are
    given in the protocol specific representation.

At that point GDB has to perform the following actions:

  - If parameter pointer values are given which point to data needed as
    input to a system call, GDB requests this data from the target with a
    standard 'm' packet request.  This additional communication has to be
    expected by the target implementation and is handled as any other 'm'
    packet communication.
  - Translate all values from protocol representation to host
    representation as needed.  Datatypes are coerced into the host types.
  - Call the system call.
  - Coerce datatypes back to protocol representation.
  - If pointer parameters in the request packet point to buffer space into
    which the system call is expected to copy data, the data is
    transmitted to the target using an 'M' packet.  This packet has to be
    expected by the target implementation and is handled as any other 'M'
    packet communication.

Eventually GDB replies with another 'F' packet which contains all
information the target needs to continue.  This contains at least:

  - Return value.
  - Errno, if it has been changed by the system call.
  - "Ctrl-C" flag.

After having done the needed type and value coercion, the target continues
the latest continue or step action.

Memory transfer:

Structured data which is transferred using a memory read or write packet,
e.g. a struct stat, is expected to be in a protocol specific format with
all numerical multibyte datatypes being big endian.  The target does this
conversion before it sends the 'F' packet; GDB does it before it transfers
memory to the target.  Transferred pointers to structured data should
point to the already coerced data at any time.

The "Ctrl-C" message:

A special case is if the "Ctrl-C" flag is set in the GDB reply packet.  In
this case the target should behave as if it had gotten a break message.
The meaning for the target is "system call interrupted by SIGINT".
Consequently, the target should actually stop (as with a break message)
and return to GDB with a "T02" packet.

In this case, it's important for the target to know in which state the
system call was interrupted.  Since this action is by design not an atomic
operation, we have to distinguish between two cases:

  - The syscall hasn't been performed on the host yet.
  - The syscall on the host has been finished.

The target can distinguish these two states by the value of the returned
errno.  If it's the protocol representation of EINTR, the syscall hasn't
been performed.  This is equivalent to the EINTR handling on POSIX
systems.  In any other case, the target may presume that the syscall has
been finished (successfully or not) and should behave as if the break
message arrived right after the syscall.

IMPORTANT: GDB must behave reliably.  If the system call has not been
called yet, GDB may send the 'F' reply immediately, setting EINTR as errno
in the packet.  If the system call on the host has been finished before
the user requests a break, GDB must finish the full action.  This requires
sending whatever 'M' packets are needed.  The 'F' packet may only be sent
when either nothing has happened or the full action has been completed.

The 'F' request packet:

The 'F' request packet has the following format:

  F<call-id>[,<parameter>]...

<call-id> is the identifier which says which host system call should be
called.  This is just the name of the function as listed in Appendix A.

Parameters are hexadecimal integer values, either the real values or
pointers to target buffer space.  These are appended to the call-id, each
separated from its predecessor by a comma.  All values are transmitted in
their ASCII string representation, conforming to the following regular
expression:

  [+-]?[0-9a-fA-F]+

The 'F' reply packet:

The 'F' reply packet has the following format:

  F<retcode>[,<errno>[,<Ctrl-C flag>]][;<call specific attachment>]

The call specific attachment isn't used in this first proposal, but it's
designated to allow extensions needed by special, not yet defined calls.
No contents are defined yet.

The parameters have to be transmitted as hexadecimal ASCII strings as
described in the previous section.

<retcode> is the return code of the call as a hexadecimal value.

<errno> is the errno set by the call, in protocol specific representation.
This parameter can be omitted if the call was successful.

<Ctrl-C flag> is only sent if the user requested a break.  In this case,
the errno must be sent as well, even if the call was successful.  The
Ctrl-C flag itself consists of the character 'C':

  F0,0,C

or, if the call was interrupted before the host call has been performed:

  F-1,4,C

assuming 4 is the protocol specific representation of EINTR.

Console I/O:

By default, and if not explicitly closed by the target system, the file
descriptors 0, 1 and 2 are connected to the GDB console.  Output on the
GDB console is handled as any other file output operation (write(1,...)
or write(2,...)).  Console input is handled by GDB so that after the
target's read request from file descriptor 0, all following typing is
buffered until one of the following conditions is met:

  - The user presses Ctrl-C.  The behaviour is as explained above; the
    read() system call is treated as finished.
  - The user presses <Return>.  This is treated as end of input with a
    trailing line feed.
  - The user presses Ctrl-D.  This is treated as end of input.  No
    trailing character, in particular no Ctrl-D, is appended to the input.

If the user has typed more characters than fit in the buffer given to the
read call, the trailing characters are buffered in GDB until either
another read(0,...) is requested by the target or debugging is stopped at
the user's request.

The "isatty" call:

A special case in this protocol is the library call isatty(3), which is
implemented as its own call inside this protocol.  It returns 1 to the
target if the file descriptor given as parameter is attached to the GDB
console, 0 otherwise.  Implementing it through system calls would require
implementing ioctl() and would be more complex than needed.
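To make the 'F' request and reply formats concrete, here is a minimal target-side sketch in C. It is not part of the proposal: the names (struct fio_param, fio_format_request, fio_parse_reply, fio_call_was_performed) are hypothetical. It assumes every parameter fits into a long long and that a string parameter is described by its target address plus its length including the trailing NUL, as in B.2. Negative values, such as an lseek offset, get a leading '-', which the parameter regular expression permits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FIO_EINTR 4   /* protocol value of EINTR (Appendix C.3) */

/* One request parameter: a plain value, or a string as pointer/length. */
struct fio_param {
    long long value;          /* integer value or target address */
    unsigned long long len;   /* string length incl. NUL; 0 if not a string */
};

/* Decoded 'F' reply packet. */
struct fio_reply {
    long long retcode;
    int errno_val;            /* protocol errno, 0 if omitted */
    int ctrl_c;               /* 1 if the Ctrl-C flag was present */
};

/* Append V as [+-]?[0-9a-f]+ at OUT[*POS]; returns -1 if it does not fit. */
static int fio_append_hex(char *out, size_t outlen, size_t *pos, long long v)
{
    /* Negate in unsigned arithmetic so LLONG_MIN is handled correctly. */
    unsigned long long mag = v < 0 ? 0ULL - (unsigned long long)v
                                   : (unsigned long long)v;
    int n = snprintf(out + *pos, outlen - *pos, "%s%llx", v < 0 ? "-" : "", mag);

    if (n < 0 || (size_t)n >= outlen - *pos)
        return -1;
    *pos += (size_t)n;
    return 0;
}

/* Build "F<call-id>[,<parameter>]..." into OUT; returns its length or -1. */
int fio_format_request(char *out, size_t outlen, const char *call_id,
                       const struct fio_param *params, size_t nparams)
{
    assert(out != NULL && outlen > 0 && call_id != NULL);
    int n = snprintf(out, outlen, "F%s", call_id);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    size_t pos = (size_t)n;

    for (size_t i = 0; i < nparams; i++) {
        if (pos + 1 >= outlen)
            return -1;
        out[pos++] = ',';
        out[pos] = '\0';
        if (fio_append_hex(out, outlen, &pos, params[i].value) != 0)
            return -1;
        if (params[i].len != 0) {             /* string: append "/<len>" */
            n = snprintf(out + pos, outlen - pos, "/%llx", params[i].len);
            if (n < 0 || (size_t)n >= outlen - pos)
                return -1;
            pos += (size_t)n;
        }
    }
    return (int)pos;
}

/* Parse "F<retcode>[,<errno>[,C]][;<attachment>]"; returns 0 or -1. */
int fio_parse_reply(const char *pkt, struct fio_reply *r)
{
    char *end;

    if (pkt[0] != 'F')
        return -1;
    r->errno_val = 0;
    r->ctrl_c = 0;
    r->retcode = strtoll(pkt + 1, &end, 16);
    if (end == pkt + 1)
        return -1;
    if (*end == ',') {
        const char *p = end + 1;
        r->errno_val = (int)strtol(p, &end, 16);
        if (end == p)
            return -1;
        if (end[0] == ',' && end[1] == 'C') {
            r->ctrl_c = 1;
            end += 2;
        }
    }
    /* Anything left must be the (currently unused) call specific attachment. */
    return (*end == '\0' || *end == ';') ? 0 : -1;
}

/* After a Ctrl-C reply: EINTR means the host never performed the call. */
int fio_call_was_performed(const struct fio_reply *r)
{
    return !(r->ctrl_c && r->errno_val == FIO_EINTR);
}
```

For example, fd 3, buffer 0x1234 and count 6 format as `Fwrite,3,1234,6`. On a reply such as `F-1,4,C`, the stub would see that the call was not performed before it stops with T02.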
The "system" call:

The other special case in this protocol is the system(3) call, which is
implemented as its own call, too.  GDB takes over the full task of calling
the necessary host calls to perform the system() call.  The return value
of system is simplified before it's returned to the target.  Basically,
the only signal transmitted back is EINTR in case the user pressed Ctrl-C.
Otherwise the return value consists entirely of the exit status of the
called command.

Appendix A: List of calls.

All constants are given in their POSIX notation.  Using them inside
protocol packets requires translation from host/target representation
into protocol representation.  The values of these constants are given in
Appendix C.  The protocol representation of the used datatypes is given
in Appendix B.

A.1 open

  Call-Id:      open

  Synopsis:     int open(const char *pathname, int flags);
                int open(const char *pathname, int flags, mode_t mode);

  Request:      Fopen,pathptr/len,flags,mode

                `flags' is the bitwise or of the following values:

                O_CREAT       If the file does not exist it will be
                              created.  The host rules apply as far as
                              file ownership and time stamps are
                              concerned.
                O_EXCL        When used with O_CREAT, if the file already
                              exists it is an error and open() fails.
                O_TRUNC       If the file already exists and the open mode
                              allows writing (O_RDWR or O_WRONLY is given)
                              it will be truncated to length 0.
                O_APPEND      The file is opened in append mode.
                O_RDONLY      The file is opened for reading only.
                O_WRONLY      The file is opened for writing only.
                O_RDWR        The file is opened for reading and writing.

                All other bits are silently ignored.

                `mode' is the bitwise or of the following values:

                S_IRUSR       User has read permission.
                S_IWUSR       User has write permission.
                S_IRGRP       Group has read permission.
                S_IWGRP       Group has write permission.
                S_IROTH       Others have read permission.
                S_IWOTH       Others have write permission.

                All other bits are silently ignored.

  Return value: open returns the new file descriptor or -1 if an error
                occurred.
  Errors:       EEXIST        pathname already exists and O_CREAT and
                              O_EXCL were used.
                EISDIR        pathname refers to a directory.
                EACCES        The requested access is not allowed.
                ENAMETOOLONG  pathname was too long.
                ENOENT        A directory component in pathname does not
                              exist.
                ENODEV        pathname refers to a device, pipe, named
                              pipe or socket.
                EROFS         pathname refers to a file on a read-only
                              filesystem and write access was requested.
                EFAULT        pathname is an invalid pointer value.
                ENOSPC        No space on device to create the file.
                EMFILE        The process already has the maximum number
                              of files open.
                ENFILE        The limit on the total number of files open
                              on the system has been reached.
                EINTR         The call was interrupted by the user.

A.2 close

  Call-Id:      close

  Synopsis:     int close(int fd);

  Request:      Fclose,fd

  Return value: close returns zero on success, or -1 if an error occurred.

  Errors:       EBADF         fd isn't a valid open file descriptor.
                EINTR         The call was interrupted by the user.

A.3 read

  Call-Id:      read

  Synopsis:     int read(int fd, void *buf, unsigned int count);

  Request:      Fread,fd,bufptr,count

  Return value: On success, the number of bytes read is returned.  Zero
                indicates end of file.  If count is zero, read returns
                zero as well.  On error, -1 is returned.

  Errors:       EBADF         fd is not a valid file descriptor or is not
                              open for reading.
                EFAULT        buf is an invalid pointer value.
                EINTR         The call was interrupted by the user.

A.4 write

  Call-Id:      write

  Synopsis:     int write(int fd, const void *buf, unsigned int count);

  Request:      Fwrite,fd,bufptr,count

  Return value: On success, the number of bytes written is returned.  Zero
                indicates nothing was written.  On error, -1 is returned.

  Errors:       EBADF         fd is not a valid file descriptor or is not
                              open for writing.
                EFAULT        buf is an invalid pointer value.
                EFBIG         An attempt was made to write a file that
                              exceeds the host specific maximum file size
                              allowed.
                ENOSPC        No space on device to write the data.
                EINTR         The call was interrupted by the user.
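The read and write requests above leave GDB with a fixed sequence of work. It performs the host call, pushes any result data into target memory with an 'M' packet, and then answers with an 'F' reply that carries errno only on failure. The sketch below shows that sequence for read on a POSIX host. It is not GDB's implementation, and several things in it are assumptions:

- The function names are hypothetical.
- The target's descriptor is used directly as a host descriptor. A real implementation would keep a translation table and treat the console specially.
- Each transfer is capped at 256 bytes. That produces a short read, which read's semantics allow.
- The 'M' packet uses the standard remote protocol syntax `M<addr>,<length>:<hex data>`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format "M<addr>,<len>:<hex data>"; returns its length or -1 if it does not fit. */
int fio_format_mem_write(char *out, size_t outlen, unsigned long long addr,
                         const unsigned char *data, size_t len)
{
    static const char hex[] = "0123456789abcdef";
    int n = snprintf(out, outlen, "M%llx,%zx:", addr, len);

    /* Need room for 2 hex digits per byte plus the terminating NUL. */
    if (n < 0 || (size_t)n >= outlen || len > (outlen - (size_t)n - 1) / 2)
        return -1;
    size_t pos = (size_t)n;
    for (size_t i = 0; i < len; i++) {
        out[pos++] = hex[data[i] >> 4];
        out[pos++] = hex[data[i] & 0x0f];
    }
    out[pos] = '\0';
    return (int)pos;
}

/* Format the reply: "F<retcode>" on success, "F-<n>,<errno>" on failure. */
int fio_format_reply(char *out, size_t outlen, long long retcode, int proto_errno)
{
    int n;

    if (retcode < 0)
        n = snprintf(out, outlen, "F-%llx,%x",
                     0ULL - (unsigned long long)retcode, (unsigned)proto_errno);
    else
        n = snprintf(out, outlen, "F%llx", (unsigned long long)retcode);
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}

/* Map the host errno values listed for read (A.3) to protocol values (C.3). */
static int fio_read_errno(int host_errno)
{
    switch (host_errno) {
    case EBADF:  return 9;
    case EFAULT: return 14;
    case EINTR:  return 4;
    default:     return 9999;          /* EUNKNOWN */
    }
}

/* Handle "Fread,fd,bufptr,count" on the host.  MPKT receives the 'M' packet
   for the target buffer (empty if no data was read), FPKT the 'F' reply.
   Returns the reply length, or -1 if a packet buffer is too small. */
int fio_host_read(int fd, unsigned long long bufptr, size_t count,
                  char *mpkt, size_t mlen, char *fpkt, size_t flen)
{
    unsigned char data[256];

    assert(mlen > 0 && flen > 0);
    mpkt[0] = '\0';
    if (count > sizeof data)
        count = sizeof data;           /* sketch only: a short read is legal */
    ssize_t got = read(fd, data, count);
    if (got < 0)
        return fio_format_reply(fpkt, flen, -1, fio_read_errno(errno));
    if (got > 0 && fio_format_mem_write(mpkt, mlen, bufptr, data, (size_t)got) < 0)
        return -1;
    return fio_format_reply(fpkt, flen, (long long)got, 0);
}
```

If the target rejected the 'M' packet, GDB would send an EFAULT reply (`F-1,e`) instead of the success reply. This sketch does not model that path.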
A.5 lseek

  Call-Id:      lseek

  Synopsis:     long lseek(int fd, long offset, int flag);

  Request:      Flseek,fd,offset,flag

                `flag' is one of:

                SEEK_SET      The offset is set to offset bytes.
                SEEK_CUR      The offset is set to its current location
                              plus offset bytes.
                SEEK_END      The offset is set to the size of the file
                              plus offset bytes.

  Return value: On success, the resulting unsigned offset in bytes from
                the beginning of the file is returned.  Otherwise, a value
                of -1 is returned.

  Errors:       EBADF         fd is not a valid open file descriptor.
                ESPIPE        fd is associated with the GDB console.
                EINVAL        flag is not a proper value.
                EINTR         The call was interrupted by the user.

A.6 rename

  Call-Id:      rename

  Synopsis:     int rename(const char *oldpath, const char *newpath);

  Request:      Frename,oldpathptr/len,newpathptr/len

  Return value: On success, zero is returned.  On error, -1 is returned.

  Errors:       EISDIR        newpath is an existing directory, but
                              oldpath is not a directory.
                EEXIST        newpath is a non-empty directory.
                EBUSY         oldpath or newpath is a directory that is in
                              use by some process.
                EINVAL        An attempt was made to make a directory a
                              subdirectory of itself.
                ENOTDIR       A component used as a directory in oldpath
                              or newpath is not a directory, or oldpath is
                              a directory and newpath exists but is not a
                              directory.
                EFAULT        oldpathptr or newpathptr are invalid pointer
                              values.
                EACCES        No access to the file or the path of the
                              file.
                ENAMETOOLONG  oldpath or newpath was too long.
                ENOENT        A directory component in oldpath or newpath
                              does not exist.
                EROFS         The file is on a read-only filesystem.
                ENOSPC        The device containing the file has no room
                              for the new directory entry.
                EINTR         The call was interrupted by the user.

A.7 unlink

  Call-Id:      unlink

  Synopsis:     int unlink(const char *pathname);

  Request:      Funlink,pathnameptr/len

  Return value: On success, zero is returned.  On error, -1 is returned.

  Errors:       EACCES        No access to the file or the path of the
                              file.
                EPERM         The system does not allow unlinking of
                              directories.
                EBUSY         The file pathname cannot be unlinked because
                              it's being used by another process.
                EFAULT        pathnameptr is an invalid pointer value.
                ENAMETOOLONG  pathname was too long.
                ENOENT        A directory component in pathname does not
                              exist.
                ENOTDIR       A component of the path is not a directory.
                EROFS         The file is on a read-only filesystem.
                EINTR         The call was interrupted by the user.

A.8 stat, fstat

  Call-Id:      stat, fstat

  Synopsis:     int stat(const char *pathname, struct stat *buf);
                int fstat(int fd, struct stat *buf);

  Request:      Fstat,pathnameptr/len,bufptr
                Ffstat,fd,bufptr

  Return value: On success, zero is returned.  On error, -1 is returned.

  Errors:       EBADF         fd is not a valid open file.
                ENOENT        A directory component in pathname does not
                              exist or the path is an empty string.
                ENOTDIR       A component of the path is not a directory.
                EFAULT        pathnameptr is an invalid pointer value.
                EACCES        No access to the file or the path of the
                              file.
                ENAMETOOLONG  pathname was too long.
                EINTR         The call was interrupted by the user.

A.9 gettimeofday

  Call-Id:      gettimeofday

  Synopsis:     int gettimeofday(struct timeval *tv, void *tz);

  Request:      Fgettimeofday,tvptr,tzptr

  Return value: On success, 0 is returned, -1 otherwise.

  Errors:       EINVAL        tz is a non-NULL pointer.
                EFAULT        tvptr and/or tzptr is an invalid pointer
                              value.

A.10 isatty

  Call-Id:      isatty

  Synopsis:     int isatty(int fd);

  Request:      Fisatty,fd

  Return value: Returns 1 if fd refers to the GDB console, 0 otherwise.

  Errors:       EINTR         The call was interrupted by the user.

A.11 system

  Call-Id:      system

  Synopsis:     int system(const char *command);

  Request:      Fsystem,commandptr/len

  Return value: The value returned is -1 on error and the return status
                of the command otherwise.  Only the exit status of the
                command is returned, which is extracted from the host's
                system return value by calling WEXITSTATUS(retval).  In
                case /bin/sh could not be executed, 127 is returned.

  Errors:       EINTR         The call was interrupted by the user.

Appendix B: Protocol specific representation of datatypes.
B.1 Integral datatypes

The integral datatypes used in the system calls are int, unsigned int,
long, unsigned long, mode_t and time_t.

int, unsigned int, mode_t and time_t are implemented as 32 bit values in
this protocol.

long and unsigned long are implemented as 64 bit types.

To allow range checking on host and target, corresponding MIN and MAX
values (similar to those in limits.h) are defined in Appendix C.

B.2 Pointer values

Pointers to target data are transmitted as they are.  An exception is made
for pointers to buffers whose length isn't transmitted as part of the
function call, namely strings.  Strings are transmitted as a
pointer/length pair, both as hex values, e.g.

  1aaf/12

which is a pointer to data of length 18 bytes at position 0x1aaf.  The
length is defined as the full string length in bytes, including the
trailing null byte.  Example:

  "hello, world" at address 0x123456

is transmitted as

  123456/d

B.3 struct stat

The buffer of type struct stat used by the target and GDB is defined as
follows:

  struct stat {
      unsigned int   st_dev;      /* device */
      unsigned int   st_ino;      /* inode */
      mode_t         st_mode;     /* protection */
      unsigned int   st_nlink;    /* number of hard links */
      unsigned int   st_uid;      /* user ID of owner */
      unsigned int   st_gid;      /* group ID of owner */
      unsigned int   st_rdev;     /* device type (if inode device) */
      unsigned long  st_size;     /* total size, in bytes */
      unsigned long  st_blksize;  /* blocksize for filesystem I/O */
      unsigned long  st_blocks;   /* number of blocks allocated */
      time_t         st_atime;    /* time of last access */
      time_t         st_mtime;    /* time of last modification */
      time_t         st_ctime;    /* time of last change */
  };

The integral datatypes conform to the definition in B.1, so this
structure is of size 64 bytes: seven 32 bit fields (28 bytes), three 64
bit fields (24 bytes) and three 32 bit time_t fields (12 bytes).

The values of several fields have a restricted meaning and/or range of
values.

  st_dev:     0   file
              1   console

  st_ino:     No valid meaning for the target.  Transmitted unchanged.

  st_mode:    Valid mode bits are described in Appendix C.
              Any other bits currently have no meaning for the target.

  st_uid:     No valid meaning for the target.  Transmitted unchanged.

  st_gid:     No valid meaning for the target.  Transmitted unchanged.

  st_rdev:    No valid meaning for the target.  Transmitted unchanged.

  st_atime, st_mtime, st_ctime:
              These values have a host and file system dependent
              accuracy.  Especially on Windows hosts, the file systems
              don't support exact timing values.

The target gets a struct stat of the above representation and is
responsible for coercing it to the target representation before
continuing.

Note that due to size differences between the host and target
representation of stat members, these members may get truncated on the
target.

B.4 struct timeval

The buffer of type struct timeval used by the target and GDB is defined
as follows:

  struct timeval {
      time_t  tv_sec;   /* second */
      long    tv_usec;  /* microsecond */
  };

The integral datatypes conform to the definition in B.1, so this
structure is of size 12 bytes (a 32 bit time_t followed by a 64 bit
long).

Appendix C: Constants

The following values are used for the constants inside of the protocol.
GDB and target are responsible for translating these values before and
after the call as needed.

C.1 Open flags

All values are given in hexadecimal representation.

  O_RDONLY          0
  O_WRONLY          1
  O_RDWR            2
  O_APPEND          8
  O_CREAT         200
  O_TRUNC         400
  O_EXCL          800

C.2 mode_t values

All values are given in octal representation.

  S_IFREG      100000
  S_IFDIR       40000
  S_IRUSR         400
  S_IWUSR         200
  S_IXUSR         100
  S_IRGRP          40
  S_IWGRP          20
  S_IXGRP          10
  S_IROTH           4
  S_IWOTH           2
  S_IXOTH           1

C.3 Errno values

All values are given in decimal representation.  Inside packets they are
transmitted in hexadecimal like all other parameters, e.g. EFAULT (14)
appears as "e".

  EPERM             1
  ENOENT            2
  EINTR             4
  EBADF             9
  EACCES           13
  EFAULT           14
  EBUSY            16
  EEXIST           17
  ENODEV           19
  ENOTDIR          20
  EISDIR           21
  EINVAL           22
  ENFILE           23
  EMFILE           24
  EFBIG            27
  ENOSPC           28
  ESPIPE           29
  EROFS            30
  ENAMETOOLONG     91
  EUNKNOWN       9999

EUNKNOWN is used as a fallback error value if a host system returns any
error value not in the list of supported error numbers.
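Appendices B and C together define what GDB has to do with host values that travel to or from the target. The following C sketch is hypothetical, including all function names, and is not GDB's implementation. It translates protocol open flags (C.1) into host flags for open(2), and translates a host st_mode into the C.2 bits. It then lays out a host struct stat big-endian in the 64-byte format from B.3, setting st_dev to 0 or 1 as B.3 describes. It assumes a POSIX host where S_ISREG/S_ISDIR and the S_I* permission macros are available. Host values wider than their protocol field, such as a 64 bit st_ino, keep only their low-order bytes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* Protocol open flags from Appendix C.1 (hexadecimal); O_RDONLY is 0. */
#define FIO_O_WRONLY 0x1
#define FIO_O_RDWR   0x2
#define FIO_O_APPEND 0x8
#define FIO_O_CREAT  0x200
#define FIO_O_TRUNC  0x400
#define FIO_O_EXCL   0x800

#define FIO_STAT_SIZE 64   /* size of the protocol struct stat (B.3) */

/* Translate protocol open flags into host flags.  Unknown bits are ignored,
   as A.1 requires; access mode 3 is rejected with -1 in this sketch. */
int fio_open_flags_to_host(uint32_t pflags)
{
    int host;

    switch (pflags & 0x3) {
    case 0:            host = O_RDONLY; break;
    case FIO_O_WRONLY: host = O_WRONLY; break;
    case FIO_O_RDWR:   host = O_RDWR;   break;
    default:           return -1;
    }
    if (pflags & FIO_O_APPEND) host |= O_APPEND;
    if (pflags & FIO_O_CREAT)  host |= O_CREAT;
    if (pflags & FIO_O_TRUNC)  host |= O_TRUNC;
    if (pflags & FIO_O_EXCL)   host |= O_EXCL;
    return host;
}

/* Translate a host st_mode into the protocol bits of C.2 (octal). */
uint32_t fio_mode_to_protocol(mode_t m)
{
    static const struct { mode_t host; uint32_t proto; } perms[] = {
        { S_IRUSR, 0400 }, { S_IWUSR, 0200 }, { S_IXUSR, 0100 },
        { S_IRGRP, 040 },  { S_IWGRP, 020 },  { S_IXGRP, 010 },
        { S_IROTH, 04 },   { S_IWOTH, 02 },   { S_IXOTH, 01 },
    };
    uint32_t p = 0;

    if (S_ISREG(m))
        p |= 0100000;                  /* protocol S_IFREG */
    else if (S_ISDIR(m))
        p |= 040000;                   /* protocol S_IFDIR */
    for (size_t i = 0; i < sizeof perms / sizeof perms[0]; i++)
        if (m & perms[i].host)
            p |= perms[i].proto;
    return p;
}

/* Store the low NBYTES bytes of V big-endian at P; return the next position. */
static unsigned char *put_be(unsigned char *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
    return p + nbytes;
}

/* Serialize a host struct stat into the 64-byte protocol layout of B.3. */
void fio_pack_stat(const struct stat *st, int is_console,
                   unsigned char out[FIO_STAT_SIZE])
{
    unsigned char *p = out;

    p = put_be(p, is_console ? 1 : 0, 4);          /* st_dev: 0 file, 1 console */
    p = put_be(p, (uint64_t)st->st_ino, 4);
    p = put_be(p, fio_mode_to_protocol(st->st_mode), 4);
    p = put_be(p, (uint64_t)st->st_nlink, 4);
    p = put_be(p, (uint64_t)st->st_uid, 4);
    p = put_be(p, (uint64_t)st->st_gid, 4);
    p = put_be(p, (uint64_t)st->st_rdev, 4);
    p = put_be(p, (uint64_t)st->st_size, 8);
    p = put_be(p, (uint64_t)st->st_blksize, 8);
    p = put_be(p, (uint64_t)st->st_blocks, 8);
    p = put_be(p, (uint64_t)st->st_atime, 4);      /* time_t is 32 bit (B.1) */
    p = put_be(p, (uint64_t)st->st_mtime, 4);
    p = put_be(p, (uint64_t)st->st_ctime, 4);
    assert(p == out + FIO_STAT_SIZE);
}
```

For example, the request flags 601 (O_WRONLY | O_CREAT | O_TRUNC) map to the host's own O_WRONLY | O_CREAT | O_TRUNC, whatever numeric values that host uses.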
C.4 Lseek flags

  SEEK_SET          0
  SEEK_CUR          1
  SEEK_END          2

C.5 Limits

  INT_MIN       -2147483648
  INT_MAX        2147483647
  UINT_MAX       4294967295
  LONG_MIN      -9223372036854775808
  LONG_MAX       9223372036854775807
  ULONG_MAX      18446744073709551615

Appendix D: GDB setting for system(3)

Due to security concerns about unconditionally allowing the target to call
`system(3)' on the host, GDB gets an additional setting.  The user has to
explicitly allow the system(3) call in the user interface.  Otherwise the
system(3) call will fail and the target receives the error code EPERM.
The setting is done using the following syntax:

  set remote system-call-allowed VAL

with VAL being 0 or 1 for disallowing or allowing the system(3) call,
respectively.  The user can view the setting by calling

  show remote system-call-allowed

Appendix E: Examples

In the examples below, `->' and `<-' indicate data transmitted and
received, respectively, from GDB's point of view.

E.1 write call

  <- Fwrite,3,1234,6      <== fd=3, bufptr=0x1234, len=6
  -> m1234,6              <== read memory from target
  <- XXXXXX
  -> F6                   <== return "6 bytes written"

E.2 read call

  <- Fread,3,1234,6       <== fd=3, bufptr=0x1234, len=6
  -> M1234,6:XXXXXX       <== write syscall result into...
  <- OK                   <== target's memory
  -> F6                   <== return "6 bytes read"

E.3 read call, call fails on the host due to invalid file descriptor

  <- Fread,3,1234,6
  -> F-1,9                <== EBADF

E.4 read call, writing data on target fails

  <- Fread,3,1234,6
  -> M1234,6:XXXXXX
  <- Ee
  -> F-1,e                <== EFAULT

E.5 read call, user presses Ctrl-C before syscall on host is called

  <- Fread,3,1234,6
     ...                  <== M request or not, depends on user
  -> F-1,4,C              <== EINTR: syscall not performed
  <- T02

E.6 read call, user presses Ctrl-C after syscall on host is called

  <- Fread,3,1234,6
  -> M1234,6:XXXXXX
  <- OK
  -> F6,0,C               <== syscall finished, then break
  <- T02

===========================================================================

-- 
Corinna Vinschen
Cygwin Developer
Red Hat, Inc.
mailto:vinschen@redhat.com