forked from lwg/issues
Commit
Automatic update from GitHub Actions workflow
github-actions
committed
Apr 19, 2024
1 parent
8bef771
commit 5f29a5a
Showing
22 changed files
with
720 additions
and
223 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,179 @@ | ||
<!DOCTYPE html> | ||
<html lang="en"> | ||
<head> | ||
<meta charset="utf-8"> | ||
<title>Issue 4070: Transcoding by std::formatter<std::filesystem::path></title> | ||
<meta property="og:title" content="Issue 4070: Transcoding by std::formatter<std::filesystem::path>"> | ||
<meta property="og:description" content="C++ library issue. Status: New"> | ||
<meta property="og:url" content="https://cplusplus.github.io/LWG/issue4070.html"> | ||
<meta property="og:type" content="website"> | ||
<meta property="og:image" content="https://isocpp.org/assets/images/cpp_logo.png"> | ||
<meta property="og:image:alt" content="C++ logo"> | ||
<style> | ||
p {text-align:justify} | ||
li {text-align:justify} | ||
pre code.backtick::before { content: "`" } | ||
pre code.backtick::after { content: "`" } | ||
blockquote.note | ||
{ | ||
background-color:#E0E0E0; | ||
padding-left: 15px; | ||
padding-right: 15px; | ||
padding-top: 1px; | ||
padding-bottom: 1px; | ||
} | ||
ins {background-color:#A0FFA0} | ||
del {background-color:#FFA0A0} | ||
table.issues-index { border: 1px solid; border-collapse: collapse; } | ||
table.issues-index th { text-align: center; padding: 4px; border: 1px solid; } | ||
table.issues-index td { padding: 4px; border: 1px solid; } | ||
table.issues-index td:nth-child(1) { text-align: right; } | ||
table.issues-index td:nth-child(2) { text-align: left; } | ||
table.issues-index td:nth-child(3) { text-align: left; } | ||
table.issues-index td:nth-child(4) { text-align: left; } | ||
table.issues-index td:nth-child(5) { text-align: center; } | ||
table.issues-index td:nth-child(6) { text-align: center; } | ||
table.issues-index td:nth-child(7) { text-align: left; } | ||
table.issues-index td:nth-child(5) span.no-pr { color: red; } | ||
@media (prefers-color-scheme: dark) { | ||
html { | ||
color: #ddd; | ||
background-color: black; | ||
} | ||
ins { | ||
background-color: #225522 | ||
} | ||
del { | ||
background-color: #662222 | ||
} | ||
a { | ||
color: #6af | ||
} | ||
a:visited { | ||
color: #6af | ||
} | ||
blockquote.note | ||
{ | ||
background-color: rgba(255, 255, 255, .10) | ||
} | ||
} | ||
</style> | ||
</head> | ||
<body> | ||
<hr> | ||
<p><em>This page is a snapshot from the LWG issues list, see the <a href="lwg-active.html">Library Active Issues List</a> for more information and the meaning of <a href="lwg-active.html#New">New</a> status.</em></p> | ||
<h3 id="4070"><a href="lwg-active.html#4070">4070</a>. Transcoding by <code>std::formatter<std::filesystem::path></code></h3> | ||
<p><b>Section:</b> 99 [fs.path.fmtr.funcs] <b>Status:</b> <a href="lwg-active.html#New">New</a> | ||
<b>Submitter:</b> Jonathan Wakely <b>Opened:</b> 2024-04-19 <b>Last modified:</b> 2024-04-19</p> | ||
<p><b>Priority: </b>Not Prioritized | ||
</p> | ||
<p><b>View all issues with</b> <a href="lwg-status.html#New">New</a> status.</p> | ||
<p><b>Discussion:</b></p> | ||
<p> | ||
99 [fs.path.fmtr.funcs] says: | ||
|
||
<blockquote> | ||
If <code class='backtick'>charT</code> is <code class='backtick'>char</code>, <code class='backtick'>path::value_type</code> is <code class='backtick'>wchar_t</code>, | ||
and the literal encoding is UTF-8, then the escaped path is | ||
transcoded from the native encoding for wide character strings to UTF-8 | ||
with maximal subparts of ill-formed subsequences substituted with | ||
<span style="font-variant:small-caps">u+fffd</span> | ||
replacement character per the Unicode Standard [...]. | ||
Otherwise, transcoding is implementation-defined. | ||
</blockquote> | ||
</p> | ||
|
||
<p> | ||
This seems to mean that the Unicode substitutions are only done | ||
for an escaped path, i.e. when the <code class='backtick'>?</code> option is used. Otherwise, the form | ||
of transcoding is completely implementation-defined. | ||
However, this makes no sense. | ||
An escaped string will have no ill-formed subsequences, because they will | ||
already have been replaced as per 22.14.6.4 <a href="https://wg21.link/format.string.escaped">[format.string.escaped]</a>: | ||
<blockquote> | ||
Otherwise (<em>X</em> is a sequence of ill-formed code units), | ||
each code unit <em>U</em> is appended to <em>E</em> in order as | ||
the sequence <code>\x{<em>hex-digit-sequence</em>}</code>, | ||
where <code><em>hex-digit-sequence</em></code> is the shortest hexadecimal | ||
representation of <em>U</em> using lower-case hexadecimal digits. | ||
</blockquote> | ||
</p> | ||
<p> | ||
So only unescaped strings can have ill-formed sequences by the time | ||
we do transcoding to <code class='backtick'>char</code>, but whether or not any | ||
<span style="font-variant:small-caps">u+fffd</span> substitution | ||
occurs is just implementation-defined. | ||
</p> | ||
|
||
<p> | ||
I believe we want to specify that the substitutions are done when transcoding | |
an <em>unescaped</em> path (and it doesn't matter whether we specify it | ||
for escaped paths, because it's a no-op if escaping happens first, | ||
as is apparently intended). | ||
</p> | ||
|
||
<p> | ||
It does matter whether we escape first or perform substitutions first. | ||
If we escape first then every code unit in an ill-formed sequence is | ||
individually escaped as <code class='backtick'>\x{hex-digit-sequence}</code>. | ||
So an ill-formed sequence of two <code class='backtick'>wchar_t</code> values will be escaped as | ||
two <code class='backtick'>\x{...}</code> strings, which are then transcoded to UTF-8. | ||
If we transcode (with substitutions) first then the entire | |
ill-formed sequence is replaced with a single replacement character, | ||
which will then be escaped as <code class='backtick'>\x{fffd}</code>. | ||
SG16 should be asked to confirm that escaping first is intended, | ||
so that an escaped string shows the original invalid code units. | ||
For a non-escaped string, we want the ill-formed sequence to be | ||
formatted as �, which the proposed resolution tries to ensure. | ||
</p> | ||
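<p>
The two treatments of an ill-formed code unit can be sketched in code. This is
only an illustration under stated assumptions, not the library's implementation:
the namespace <code>sketch</code>, the function <code>to_utf8</code> and the
enumeration <code>ill_formed</code> are hypothetical names, <code>char16_t</code>
stands in for a 16-bit <code>wchar_t</code>, each unpaired surrogate is assumed
to form one ill-formed subsequence, and the remaining escaping rules of
[format.string.escaped] (quotes, control characters and so on) are left out.
</p>

```cpp
#include <cstdio>
#include <string>
#include <string_view>

namespace sketch {  // hypothetical namespace, for illustration only

// Append the UTF-8 encoding of one Unicode scalar value to out.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// What to emit for an unpaired surrogate (an ill-formed UTF-16 code unit).
enum class ill_formed {
    replace,  // unescaped formatting: substitute U+FFFD
    escape    // escaped formatting: emit \x{hex-digit-sequence} per code unit
};

// Convert UTF-16 code units to UTF-8, handling unpaired surrogates per policy.
inline std::string to_utf8(std::u16string_view in, ill_formed policy) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        if (is_high_surrogate(u) && i + 1 < in.size() && is_low_surrogate(in[i + 1])) {
            // Well-formed surrogate pair: decode to a supplementary code point.
            const char32_t cp = 0x10000
                + ((static_cast<char32_t>(u) - 0xD800) << 10)
                + (static_cast<char32_t>(in[i + 1]) - 0xDC00);
            append_utf8(out, cp);
            ++i;  // the low surrogate has been consumed as well
        } else if (is_high_surrogate(u) || is_low_surrogate(u)) {
            if (policy == ill_formed::replace) {
                append_utf8(out, 0xFFFD);  // replacement character
            } else {
                char buf[16];
                std::snprintf(buf, sizeof buf, "\\x{%x}", static_cast<unsigned>(u));
                out += buf;  // lower-case hex, shortest representation
            }
        } else {
            append_utf8(out, u);  // ordinary BMP code unit
        }
    }
    return out;
}

}  // namespace sketch
```

<p>
For the input <code>a</code>, an unpaired <code>0xD800</code>, <code>b</code>,
the <code>replace</code> policy yields <code>a</code>, the UTF-8 bytes
<code>EF BF BD</code> (the encoding of U+FFFD), and <code>b</code>, whereas the
<code>escape</code> policy yields <code>a\x{d800}b</code>, which preserves the
original invalid code unit.
</p>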
|
||
|
||
|
||
<p id="res-4070"><b>Proposed resolution:</b></p> | ||
<p> | ||
This wording is relative to <a href="https://wg21.link/N4981">N4981</a>. | ||
</p> | ||
<ol> | ||
<li><p>Modify 99 [fs.path.fmtr.funcs] as indicated:</p> | ||
|
||
<blockquote> | ||
<pre><code> | ||
template<class FormatContext> | ||
typename FormatContext::iterator | ||
format(const filesystem::path& p, FormatContext& ctx) const; | ||
</code></pre> | ||
<blockquote>-5- | ||
<em>Effects</em>: | ||
Let <code class='backtick'>s</code> be <code>p.generic_string<filesystem::path::value_type>()</code> | ||
if the <code class='backtick'>g</code> option is used, otherwise <code class='backtick'>p.native()</code>. | ||
Writes <code class='backtick'>s</code> into <code class='backtick'>ctx.out()</code>, adjusted according to the path-format-spec. | |
If <code class='backtick'>charT</code> is <code class='backtick'>char</code>, <code class='backtick'>path::value_type</code> is <code class='backtick'>wchar_t</code>, | ||
and the literal encoding is UTF-8, then the | ||
<del>escaped path</del> | ||
<ins>(possibly escaped) string</ins> | |
is transcoded from the native encoding for wide character strings to UTF-8 | ||
with maximal subparts of ill-formed subsequences substituted with | ||
<span style="font-variant:small-caps">u+fffd</span> replacement character per | ||
the Unicode Standard, Chapter 3.9 <span style="font-variant:small-caps">u+fffd</span> | ||
Substitution in Conversion. | ||
If <code class='backtick'>charT</code> and <code class='backtick'>path::value_type</code> are the same then no transcoding is performed. | ||
Otherwise, transcoding is implementation-defined. | |
</blockquote> | ||
</blockquote> | ||
</li> | ||
<li> | ||
Modify the entry in the index of implementation-defined behavior as indicated: | ||
<blockquote> | ||
transcoding of a formatted <code class='backtick'>path</code> when <code class='backtick'>charT</code> and <code class='backtick'>path::value_type</code> differ | ||
<ins>and not converting from <code class='backtick'>wchar_t</code> to UTF-8</ins> | ||
</blockquote> | ||
</li> | ||
|
||
</ol> | ||
|
||
|
||
|
||
|
||
|
||
</body> | ||
</html> |