Message-ID: <2ea323f9-d606-4bc9-b4db-995f980d85c7@palves.net>
Date: Thu, 26 Jun 2025 11:35:37 +0100
Subject: Re: [PATCH 2/2] Allow check-mark to be changed for CLI
To: Eli Zaretskii
Cc: tromey@adacore.com, aburgess@redhat.com, gdb-patches@sourceware.org
References: <20250509-emoji-check-mark-v1-0-63b6c52411f3@adacore.com>
 <20250509-emoji-check-mark-v1-2-63b6c52411f3@adacore.com>
 <87zfffnqrj.fsf@redhat.com> <87wmag63jg.fsf@tromey.com>
 <86ikm0y1hi.fsf@gnu.org> <86o6ubcapd.fsf@gnu.org>
From: Pedro Alves
In-Reply-To: <86o6ubcapd.fsf@gnu.org>

On 2025-06-26 06:51, Eli Zaretskii wrote:
>> Date: Wed, 25 Jun 2025 20:11:07 +0100
>> Cc: aburgess@redhat.com, gdb-patches@sourceware.org
>> From: Pedro Alves
>>
>>>> I don't think there's any way to figure out what the display width of a
>>>> string might be.  At least, not unless gdb adds a dependency on
>>>> something like libicu.
>>>
>>> I don't think even libicu will be enough, because the terminal
>>> emulators vary in this respect.
>>>
>>> There's some recent protocol to help with this, but I don't think it's
>>> supported widely enough for GDB to rely on it.
>>
>> I'm curious on what protocol is this, and what it addresses.
>
> That's "Mode 2027".  For the details, see
>
>   https://github.com/contour-terminal/terminal-unicode-core
> and
>   https://mitchellh.com/writing/grapheme-clusters-in-terminals
>

Thank you, that's quite a good read.

As the blog explains, "historically terminals used wcwidth, shells, editors,
and other TUI apps also use wcwidth, and continue to do so today. Even though
it produces the wrong values for multi-codepoint graphemes, at least the wrong
values are often consistently wrong across terminal emulators."

So what can go wrong is grapheme clustering.  Both the terminal and the app
must agree on clustering for that to work properly.  We could implement that
protocol in GDB today, as there's a way to query if the terminal supports it.
But I'm not proposing that.

So if we ignore multi-codepoint graphemes, then wcwidth always matches what
the terminals do.

>> If we assume monospace font in the terminal, what goes wrong?
>
> Monospace font is only relevant for single-character strings.  Once
> you try to display a string that has more than one Unicode character,
> a terminal may or may not combine those characters into one or more
> font glyphs (it's a many-to-many relation, so you could have M
> characters displayed as N glyphs, where M could be less than, equal,
> or greater than N, depending on the characters in the string and the
> capabilities of the font used by the terminal).  If that happens,
> wcswidth will usually produce an incorrect result, because it doesn't
> (and cannot) know about these features of the terminal, and basically
> just sums the results of wcwidth of each character in the string.
>
>> Is there a downside to using the number of columns wcwidth says the character occupies,
>> and default to 1 if wcwidth returns -1?
>
> See above.
> This discussion started in the context of showing Emoji,
> where the case of combining several characters into one or more glyphs
> happens very frequently.  For example, the sequence U+2713 followed by
> U+FE0F VARIATION SELECTOR-16 should display a single glyph on capable
> terminals (try it in a GUI session in Emacs to see that), although
> it's two characters.  In some cases, wcswidth will do the job, because
> control characters like U+FE0F have zero width in its database, but
> that is not universally so.

OK, but we can just avoid picking emojis that are encoded with a variation
selector for the defaults where the glyphs would be used in a context where
alignment matters, and document that picking such an emoji in your custom
style may misbehave.

> For example, the following string of
> characters:
>
>   U+1F468 U+200D U+1F469 U+200D U+1F466
>
> will be displayed by capable terminals as a single glyph (again, try
> in Emacs to see), but wcswidth will probably return 6 for it.
>

Note that for the use case we're talking about, the character used as
current-line indicator, it's always going to be the marker character on its
own, with spaces or tabs or end of line surrounding it, so there is no risk of
the character we choose bumping into other characters we don't control and
forming a cluster sequence.  If the character itself is not a cluster, and is
a single (maybe wide) character, then wcwidth returns the right thing.

IOW, if we do use wcwidth, then it's going to be right for non-cluster
multi-cell characters, which is a large pool of reasonable emojis that we can
choose from for the use case at hand.

I'd say that we could even consider letting the user pick a multi-character
sequence for the marker, and we would use wcswidth instead of wcwidth.  Then,
if GDB and the terminal disagree on grapheme width, well, the downside is that
the user gets their tables misaligned, but that is no worse than what happens
if we _don't_ consider wcwidth at all.
But not going there, and only allowing single-character emojis, and using
wcwidth, is already going to do the right thing and be quite useful.

> And even for single-character strings wcwidth will sometimes produce
> incorrect results, because the fonts used by modern terminal emulators
> for some exotic characters (like Emoji) are not monospaced.
>

As long as we don't pick such a character as the default one, it's going to be
on the user not to pick one.

The blog post you pointed at makes me believe that we _should_ indeed use
wcwidth.

If we work under the single-character constraint, then I don't really see a
downside to using wcwidth.

For the scenario where wcwidth returns -1, I see two choices:

 - assume a width of one cell, and print the marker from the style anyhow, or,
 - fall back to the hardcoded "*" character, like today's output.

Pedro Alves
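
For illustration only, the two choices listed above for the -1 case could be
sketched roughly as follows.  This is not GDB code and it does not call the C
library's wcwidth: as an assumption, Python's unicodedata tables stand in for
the wcwidth database, and the names approx_wcwidth, approx_wcswidth,
pick_marker and FALLBACK_MARKER are all hypothetical.

```python
import unicodedata

FALLBACK_MARKER = "*"  # the hardcoded indicator mentioned above


def approx_wcwidth(ch):
    """Approximate wcwidth(): cells used by one code point, or -1.

    Assumption: the Unicode database stands in for the C library's tables,
    so results can differ from a real wcwidth in some corner cases.
    """
    if ch == "\0":
        return 0
    category = unicodedata.category(ch)
    if category in ("Cc", "Cs", "Cn"):  # control, surrogate, unassigned
        return -1
    if category in ("Mn", "Me", "Cf"):  # combining marks, U+FE0F, U+200D
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):  # wide or fullwidth
        return 2
    return 1


def approx_wcswidth(text):
    """Sum per-code-point widths, wcswidth-style; -1 if any width is -1."""
    total = 0
    for ch in text:
        width = approx_wcwidth(ch)
        if width < 0:
            return -1
        total += width
    return total


def pick_marker(marker, fall_back_to_star):
    """Return (marker_to_print, cells) for a single-character marker."""
    if len(marker) != 1:
        raise ValueError("this sketch only accepts a single code point")
    width = approx_wcwidth(marker)
    if width >= 0:
        return marker, width
    if fall_back_to_star:
        return FALLBACK_MARKER, 1  # second choice: today's "*"
    return marker, 1  # first choice: assume one cell, print it anyhow


if __name__ == "__main__":
    family = "\U0001F468\u200D\U0001F469\u200D\U0001F466"
    print(approx_wcswidth(family))
    print(pick_marker("\u2705", fall_back_to_star=True))
```

Under this approximation the five-code-point sequence quoted above sums to 6
cells even though a capable terminal may draw it as one glyph, which is the
disagreement discussed in the thread.  A single wide character such as U+2705
comes out as 2 cells.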