From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 99490 invoked by alias); 24 Nov 2017 12:54:39 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 99480 invoked by uid 89); 24 Nov 2017 12:54:39 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=BAYES_00,KB_WAM_FROM_NAME_SINGLEWORD,SPF_HELO_PASS,T_RP_MATCHES_RCVD autolearn=no version=3.3.2 spammy=owning, ultimate, franca, magical X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 24 Nov 2017 12:54:37 +0000 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 984091214D; Fri, 24 Nov 2017 12:54:36 +0000 (UTC) Received: from [127.0.0.1] (ovpn04.gateway.prod.ext.ams2.redhat.com [10.39.146.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id AC12260465; Fri, 24 Nov 2017 12:54:34 +0000 (UTC) Subject: Re: [RFA] C++-ify parse_format_string To: Simon Marchi , Simon Marchi , Tom Tromey , gdb-patches@sourceware.org References: <20171123164631.11055-1-tom@tromey.com> <0350cb5f-d176-76c4-11e8-5ffb3fe8c84d@redhat.com> From: Pedro Alves Message-ID: Date: Fri, 24 Nov 2017 12:54:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-SW-Source: 2017-11/txt/msg00596.txt.bz2 On 11/24/2017 03:17 AM, Simon Marchi wrote: > On 2017-11-23 05:40 PM, Pedro Alves wrote: >> On 11/23/2017 09:13 PM, Simon Marchi wrote: >> >>> I have a patch in some branch that does essentially the same thing, so >>> I was able to compare our approaches. In my version, I removed the >>> big allocation that is shared among pieces, and made each piece have >>> its own std::string. Unless we want to keep the current allocation >>> scheme for performance/memory usage reasons, I think that using >>> std::strings simplifies things in the parse_format_string function. >>> The format_pieces structure is replaced with an std::vector of >>> format_piece. >> >> Sounds like a step backwards to me. If it simplifies things, then >> it sounds like it might be because we're missing some utility, >> like string_view or something like that. > > I am not sure I understand, can you expand a little bit about what you > have in mind? I call it a step backwards because simplification is done at the cost of efficiency (individual heap allocations, changing the ownership model), when the original code avoided that. It's well known that standard C++ is missing basic string manipulation utilities, but that's no reason to force ourselves to consider std::string the ultimate tool, IMO. There's C++ the library, and then there's the C++ the language. We can't do anything about the latter, but we can extend the former. Later revisions of the standard are adding more facilities, like std::string_view in C++17, and now things like std::array_view / std::span (from gsl::span) have a very good chance of making their way into C++20. Many C++ projects end up reinventing something like those types, hence the attempts at standardization, making those types lingua franca for non-owning use cases. > > What I meant is that is changes things like this: > > strncpy (current_substring, percent_loc, length_before_ll); > strcpy (current_substring + length_before_ll, "I64"); > current_substring[length_before_ll + 3] = > percent_loc[length_before_ll + lcount]; > current_substring += length_before_ll + 4; > > into this > > piece_string.assign (percent_loc, length_before_ll); > piece_string += "I64"; > piece_string += percent_loc[length_before_ll + lcount]; > > Less magical number, less playing with offsets, etc. I meant that you don't need for each piece_string to own its own deep copy of the string to get rid of the magical numbers. You can get that even with a couple C-like free-functions, like: // memcpy, null-terminate, and return advanced pointer. char *append (char *p, const char *str); char *append (char *p, const char *str, size_t len); and then: char *p = current_substring; p = append (p, percent_loc, length_before_ll); p = append (p, "I64"); p = append (p, percent_loc[length_before_ll + lcount]); current_substring = p; You could wrap that in a class, like: /* Helper to build an null-terminated C string. */ struct zstring_builder { /* POS is a pointer to the position in buffer to work on. */ explicit zstring_builder (char *pos) : m_pos (pos) {} // memcpy, advance m_pos, null-terminate. zstring_builder &append (const char *s, size_type count); zstring_builder &append (const char* s); char *pos () { return m_pos; } private: char *m_pos; }; Or: /* Helper to build a null-terminated C string on top of a vector. */ struct zstring_builder { /* POS is a pointer to the position in buffer to work on. */ explicit zstring_builder (std::vector &buf, size_t pos); // memcpy, advance m_pos, null-terminate. zstring_builder &append (const char *s, size_type count); zstring_builder &append (const char* s); char *pos () { return &m_vector[m_pos]; } private: std::vector &m_buf; size_t m_pos; }; or something else along those lines, if you now make piece_string a zstring_builder, you can get similar, and just as readable code: zstring_builder piece_string (current_substring); piece_string.append (percent_loc, length_before_ll); piece_string.append ("I64"); piece_string.append (percent_loc[length_before_ll + lcount]); current_substring = piece_string.pos (); Alternatively, we could have something like a zstring_ref class [1], as a non-owning mutable view of a C string (that never reallocates), guaranteed to be null-terminated, and then the code you shown would be just exactly the same. Open choices could be whether to save the string size inside zstring_ref (like string_view), or not saving a size, making it a really-thin wrapper around a pointer instead. std::string is great as an _owning_ SSO-enabled (which is sometimes a curse when embedded in structures...) container of characters. But when we don't want an owning container, the right tool / refactor can still make the code just as readable. [1] - I've thought several times because that something like a zstring_view (like string_view, but guarantees null-termination) would be handy in a lot of places. string_view isn't mutable, though, hence the alternative name instead. could be mut_string_view too, for example. Thanks, Pedro Alves