From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 78306 invoked by alias); 24 Nov 2017 16:26:16 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 77597 invoked by uid 89); 24 Nov 2017 16:26:15 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,KB_WAM_FROM_NAME_SINGLEWORD,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD autolearn=no version=3.3.2 spammy=bonjour, owning, Unless, monde X-HELO: simark.ca Received: from simark.ca (HELO simark.ca) (158.69.221.121) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 24 Nov 2017 16:26:14 +0000 Received: by simark.ca (Postfix, from userid 112) id A02651E586; Fri, 24 Nov 2017 11:26:11 -0500 (EST) Received: from simark.ca (localhost [127.0.0.1]) by simark.ca (Postfix) with ESMTP id CFF6A1E030; Fri, 24 Nov 2017 11:26:09 -0500 (EST) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Fri, 24 Nov 2017 16:26:00 -0000 From: Simon Marchi To: Pedro Alves Cc: Simon Marchi , Tom Tromey , gdb-patches@sourceware.org Subject: Re: [RFA] C++-ify parse_format_string In-Reply-To: References: <20171123164631.11055-1-tom@tromey.com> <0350cb5f-d176-76c4-11e8-5ffb3fe8c84d@redhat.com> Message-ID: X-Sender: simark@simark.ca User-Agent: Roundcube Webmail/1.3.2 X-SW-Source: 2017-11/txt/msg00616.txt.bz2 On 2017-11-24 07:54, Pedro Alves wrote: > On 11/24/2017 03:17 AM, Simon Marchi wrote: >> On 2017-11-23 05:40 PM, Pedro Alves wrote: >>> On 11/23/2017 09:13 PM, Simon Marchi wrote: >>> >>>> I have a patch in some branch that does essentially the same thing, >>>> so >>>> I was able to compare our approaches. In my version, I removed the >>>> big allocation that is shared among pieces, and made each piece have >>>> its own std::string. Unless we want to keep the current allocation >>>> scheme for performance/memory usage reasons, I think that using >>>> std::strings simplifies things in the parse_format_string function. >>>> The format_pieces structure is replaced with an std::vector of >>>> format_piece. >>> >>> Sounds like a step backwards to me. If it simplifies things, then >>> it sounds like it might be because we're missing some utility, >>> like string_view or something like that. >> >> I am not sure I understand, can you expand a little bit about what you >> have in mind? > > I call it a step backwards because simplification is done at the > cost of efficiency (individual heap allocations, changing the > ownership model), when the original code avoided that. It's well known > that > standard C++ is missing basic string manipulation utilities, but that's > no reason to force ourselves to consider std::string the ultimate tool, > IMO. > There's C++ the library, and then there's the C++ the language. We > can't > do anything about the latter, but we can extend the former. Later > revisions > of the standard are adding more facilities, like std::string_view in > C++17, and > now things like std::array_view / std::span (from gsl::span) have a > very > good chance of making their way into C++20. Many C++ projects end up > reinventing something like those types, hence the attempts at > standardization, making those types lingua franca for non-owning use > cases. Thanks for the detailed reply! In my original reply, I said "Unless we want to keep the current allocation scheme for performance/memory usage reasons", removing the common allocation was done consciously. My thought was that the single allocation scheme had not been chosen for performance reason, but for simplicity. Computing how much to malloc for every piece would have been very annoying. Making each piece own its string made the code easier to reason about, but if the performance hit is not acceptable, I understand. >> What I meant is that is changes things like this: >> >> strncpy (current_substring, percent_loc, length_before_ll); >> strcpy (current_substring + length_before_ll, "I64"); >> current_substring[length_before_ll + 3] = >> percent_loc[length_before_ll + lcount]; >> current_substring += length_before_ll + 4; >> >> into this >> >> piece_string.assign (percent_loc, length_before_ll); >> piece_string += "I64"; >> piece_string += percent_loc[length_before_ll + lcount]; >> >> Less magical number, less playing with offsets, etc. > > I meant that you don't need for each piece_string to own its > own deep copy of the string to get rid of the magical numbers. > You can get that even with a couple C-like free-functions, like: > > // memcpy, null-terminate, and return advanced pointer. > char *append (char *p, const char *str); > char *append (char *p, const char *str, size_t len); > > and then: > > char *p = current_substring; > > p = append (p, percent_loc, length_before_ll); > p = append (p, "I64"); > p = append (p, percent_loc[length_before_ll + lcount]); > > current_substring = p; > > You could wrap that in a class, like: > > /* Helper to build an null-terminated C string. */ > struct zstring_builder > { > /* POS is a pointer to the position in buffer to work on. */ > explicit zstring_builder (char *pos) > : m_pos (pos) > {} > > // memcpy, advance m_pos, null-terminate. > zstring_builder &append (const char *s, size_type count); > zstring_builder &append (const char* s); > > char *pos () { return m_pos; } > > private: > char *m_pos; > }; > > Or: > > /* Helper to build a null-terminated C string on top > of a vector. */ > struct zstring_builder > { > /* POS is a pointer to the position in buffer to work on. */ > explicit zstring_builder (std::vector &buf, size_t pos); > > // memcpy, advance m_pos, null-terminate. > zstring_builder &append (const char *s, size_type count); > zstring_builder &append (const char* s); > > char *pos () { return &m_vector[m_pos]; } > > private: > std::vector &m_buf; > size_t m_pos; > }; I agree that something like this is good if we want to keep the current allocation scheme but make the code clearer. Since "append" continues the same string piece, I think we would need some kind of "finish" method (a bit like obstack) to be able to start a new string: zstring_builder builder (buf); builder.append ("Hello "); builder.append ("world!"); const char *english = builder.finish (); builder.append ("Bonjour "); builder.append ("monde!"); const char *french = builder.finish (); giving something like this: Hello world!\0Bonjour monde!\0 > or something else along those lines, if you now make piece_string > a zstring_builder, you can get similar, and just as readable code: > > zstring_builder piece_string (current_substring); > > piece_string.append (percent_loc, length_before_ll); > piece_string.append ("I64"); > piece_string.append (percent_loc[length_before_ll + lcount]); > > current_substring = piece_string.pos (); > > Alternatively, we could have something like a zstring_ref > class [1], as a non-owning mutable view of a C string (that never > reallocates), guaranteed to be null-terminated, and then the code > you shown would be just exactly the same. Open choices could be > whether > to save the string size inside zstring_ref (like string_view), or > not saving a size, making it a really-thin wrapper around a > pointer instead. > > std::string is great as an _owning_ SSO-enabled (which is sometimes a > curse when embedded in structures...) container of characters. But > when we don't want an owning container, the right tool / refactor can > still make the code just as readable. > > [1] - I've thought several times because that something like a > zstring_view > (like string_view, but guarantees null-termination) would be handy in > a lot of places. string_view isn't mutable, though, hence the > alternative name instead. could be mut_string_view too, for example. It's all performance/complexity tradeoffs, as usual :) Simon