Logging to stderr and corresponding character encoding

Question

CSK 21

I am working on the following Windows apps:

Interactive: entry point - wWinMain
Non-interactive: entry point - wWinMain; A pure background app with no windows or a Windows service.
Command line app: entry point - wmain

As is evident from the entry points, Unicode is the preferred encoding.

The application failure flow consists of two steps:

Logging to stderr.
Displaying an error window.

In the error window step, a small modal window is created and displayed to the user. This is a valid experience only for the interactive app; the non-interactive and command-line apps rely on logging to stderr.

The thought process behind logging to stderr is:
User launches the app as a Windows service, and it has failed to load a dll. The remedy step is to launch the same service executable from the terminal, but with stderr redirection, to a file or the terminal.
Windows command-line apps start up, execute their work, shut down, and return control to the terminal. They primarily rely on the parent console to communicate failure. A static buffer (defined as follows) contains the failure message to be displayed.

// The encoding of the string is UTF8.
char8_t DiagnosticLogMgr::sBuffer[1024] = {};

The encoding of the string stored in the buffer is UTF8 - this is an assumption and needs to be validated in this post.

My initial thought was to log to stderr with the C runtime's fprintf function, as shown below:

fprintf (stderr, "%s\n", (const char *) sBuffer);

But fprintf doesn't interpret the encoding of the string. It simply copies the bytes to the internal stream. Ultimately, this content will be logged to an 'OS resource' - a terminal (command prompt or powershell) or a file.
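
To make the "fprintf simply copies the bytes" point concrete, here is a minimal sketch that shows the bytes a narrow string hands to the stream. The helper name hexBytes is hypothetical and the sample string is only an illustration, not the contents of sBuffer.

```cpp
#include <string>

// Hypothetical helper: renders each byte of a narrow string as two hex digits.
// These are the same bytes that fprintf(stderr, "%s", text) passes to the stream.
std::string hexBytes(const char* text) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (const char* p = text; *p != '\0'; ++p) {
        const unsigned char byte = static_cast<unsigned char>(*p);
        if (!out.empty()) out += ' ';
        out += digits[byte >> 4];    // high nibble
        out += digits[byte & 0x0F];  // low nibble
    }
    return out;
}
```

For example, hexBytes("\xC3\xA9") returns "C3 A9": the single character é (U+00E9) is two bytes in UTF-8. Whatever reads those two bytes has to know they are UTF-8, otherwise it will show two unrelated characters.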

The question is: how does that 'OS resource' interpret the string? My understanding was that Windows uses UTF16 encoding, but on research, this doesn't seem to be true across the board.

According to WriteConsole function, the default encoding of command prompts is not Unicode or ANSI, but something else, called OEM.

This function uses either Unicode characters or 8-bit characters from the console's current code page. The console's code page defaults initially to the system's OEM code page. To change the console's code page, use the SetConsoleCP or SetConsoleOutputCP functions.

So, my UTF8 encoded diagnostic log message is being interpreted in a different way, which most likely fails when the language is not English.

The documentation also states that I can explicitly set the encoding to UTF8 using SetConsoleCP or SetConsoleOutputCP. So, something like this can work:

::SetConsoleOutputCP(CP_UTF8)

What is the recommended way to log to stderr? Is the approach described above (fprintf + ::SetConsoleOutputCP) the recommended one? Does this behaviour change for command prompt vs powershell? Also, stderr can point to a file, in which case the app that interprets the file should use the right encoding, UTF8 in this case.

Just wanted to understand and clarify the overall approach to logging to stderr w.r.t character encoding.

Answer accepted by question author

Taki Ly (WICLOUD CORPORATION) 2,225 Microsoft External Staff Moderator

Hello @CSK ,

Thank you for sharing such a detailed description of your scenario. To better understand the behavior you described, I reproduced your setup using a static UTF-8 buffer and fprintf. During testing, I observed what you noted. Without adjusting the console, non-English characters in the buffer appear as mojibake on the screen. However, applying SetConsoleOutputCP(CP_UTF8) effectively resolves the console rendering, and redirecting the output via CMD perfectly preserves the raw UTF-8 bytes in a file.

Based on those observations and the platform's behavior, below are some thoughts regarding your specific questions:

1. Is fprintf + ::SetConsoleOutputCP considered a recommended approach?

Using fprintf alongside ::SetConsoleOutputCP(CP_UTF8) (or 65001) seems to be a practical way when working with the C runtime library. It signals the console host to interpret the incoming byte stream as UTF-8, without altering the raw bytes stored in your buffer.

As an alternative for modern environments (Windows 10 version 1903 and later), you might also explore setting the active code page to UTF-8 via the Application Manifest (<activeCodePage>UTF-8</activeCodePage>). This approach tells the C runtime and ANSI APIs to handle strings as UTF-8 process-wide, which might save you from manually managing console code pages.

References:
- SetConsoleOutputCP function
- Use UTF-8 code pages in Windows apps

2. Does this behavior change between Command Prompt and PowerShell?

For on-screen output: You likely won't see a difference. Both shells use the same underlying Windows Console Host, so adjusting the output code page typically helps both environments render the text correctly.
For file redirection (e.g., app.exe 2> error.log): There is a notable architectural difference. Command Prompt (cmd.exe) generally redirects the raw bytes directly, preserving your UTF-8 string perfectly. In contrast, PowerShell might intercept the stream, decode it, and re-encode it according to its own encoding settings (such as [Console]::OutputEncoding and $OutputEncoding) before writing the file. If a user encounters corrupted text in a log file redirected from PowerShell, it is often a result of that transcoding rather than a flaw in your logging logic.

References:
- Console Code Pages
- Understanding file encoding in VS Code and PowerShell

3. Regarding stderr pointing to a file and the viewing app's encoding:

When stderr is redirected directly to a file by the OS (as in CMD), the console code page mechanics are bypassed. The file is populated with the raw memory bytes directly from your sBuffer. Because your buffer natively holds UTF-8 data, the resulting text file inherently becomes a valid UTF-8 document. Modern text editors should naturally open and interpret the file correctly without any additional conversion steps.
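
As a rough way to check the "bytes land in the file unchanged" claim, the sketch below writes a message to a temporary file with fprintf and reads it back. The function name bytesWrittenToFile is hypothetical, and std::tmpfile stands in for a redirected stderr, which is an assumption made only to keep the example self-contained.

```cpp
#include <cstdio>
#include <string>

// Writes the message with fprintf, as the logger would, then reads the file
// back and returns the bytes that were actually stored.
std::string bytesWrittenToFile(const char* utf8Message) {
    std::FILE* file = std::tmpfile();  // temporary file, binary update mode
    if (file == nullptr) return std::string();

    std::fprintf(file, "%s", utf8Message);
    std::rewind(file);

    std::string contents;
    char chunk[256];
    std::size_t got = 0;
    while ((got = std::fread(chunk, 1, sizeof chunk, file)) > 0) {
        contents.append(chunk, got);
    }
    std::fclose(file);  // the temporary file is deleted on close
    return contents;
}
```

One caveat that comes from general C runtime behavior rather than from this thread: on Windows, stderr is a text-mode stream by default, so a "\n" is written to the file as "\r\n". The UTF-8 bytes of the message itself are not altered by that translation.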

I hope these observations and references are helpful as you evaluate the best fit for your application. If you found my response helpful or informative, I would greatly appreciate it if you could follow this guide for your confirmation.

Thank you.

1 additional answer

Answer 1

For Windows console and stderr logging, the key points from the platform behavior are:

Console encoding is a code page, not inherently UTF-16.
A console has separate input and output code pages, and neither is Unicode or UTF-8 by default:
- The output code page is used to translate character values written by output functions into glyphs in the console window.
- The default is the system OEM code page.
- The output code page can be changed with SetConsoleOutputCP.
fprintf(stderr, ...) writes bytes; interpretation depends on the target.
fprintf does not interpret encoding; it just writes bytes to the C runtime stream. What those bytes mean depends on where stderr is connected:
- If stderr is attached to a console, the console interprets bytes according to its current output code page.
- If stderr is redirected to a file or pipe, no interpretation is done at write time; the bytes are just stored. Any later reader must know/assume the encoding.
Console output code page and UTF-8.
The console's output code page can be set to UTF-8 (code page 65001, CP_UTF8) by calling SetConsoleOutputCP(65001), so that 8-bit output APIs interpret bytes as UTF-8.
This aligns with the guidance that UTF-8 is the universal code page for internationalization and is recommended for new apps.
Recommended pattern for UTF‑8 stderr to a console:
- Use UTF‑8 for the diagnostic buffer.
- Before writing UTF‑8 bytes to a console via fprintf(stderr, ...), ensure the console output code page is set to UTF‑8:
```
SetConsoleOutputCP(65001); // CP_UTF8
fprintf(stderr, "%s\n", reinterpret_cast<const char*>(sBuffer));
```
- This ensures that when stderr is attached to a console, the console interprets the bytes as UTF‑8.
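Note: the code page belongs to the console, not to your process. Other processes sharing that console will see the changed code page, and it stays changed after your process exits unless you restore the previous value.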
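
Because the change is visible to everything sharing the console, one option is to restore the previous code page once the message is out. The following is a minimal sketch, not a definitive implementation: the class Utf8ConsoleOutputScope and the function logUtf8ToStderr are hypothetical names, and the non-Windows branch is a stand-in that only simulates a console so the sketch builds anywhere.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

namespace {
#ifdef _WIN32
// GetConsoleOutputCP returns 0 when the process has no console.
unsigned currentOutputCodePage() { return ::GetConsoleOutputCP(); }
bool changeOutputCodePage(unsigned cp) { return ::SetConsoleOutputCP(cp) != FALSE; }
#else
// Stand-in for non-Windows builds: pretends the console starts in OEM 437.
unsigned g_simulatedCodePage = 437;
unsigned currentOutputCodePage() { return g_simulatedCodePage; }
bool changeOutputCodePage(unsigned cp) { g_simulatedCodePage = cp; return true; }
#endif
}  // namespace

// Hypothetical RAII helper: switches the console output code page to UTF-8
// and puts the previous value back when the scope ends.
class Utf8ConsoleOutputScope {
public:
    Utf8ConsoleOutputScope()
        : previous_(currentOutputCodePage()),
          changed_(previous_ != 0 && previous_ != kUtf8 &&
                   changeOutputCodePage(kUtf8)) {}
    ~Utf8ConsoleOutputScope() {
        if (changed_) changeOutputCodePage(previous_);
    }
    Utf8ConsoleOutputScope(const Utf8ConsoleOutputScope&) = delete;
    Utf8ConsoleOutputScope& operator=(const Utf8ConsoleOutputScope&) = delete;

private:
    static constexpr unsigned kUtf8 = 65001;  // CP_UTF8
    unsigned previous_;
    bool changed_;
};

// Writes one UTF-8 message to stderr while the console is in UTF-8 mode.
void logUtf8ToStderr(const char* utf8Message) {
    Utf8ConsoleOutputScope scope;
    std::fprintf(stderr, "%s\n", utf8Message);
    std::fflush(stderr);  // flush before the code page is restored
}
```

With the buffer from the question, the call would look like logUtf8ToStderr(reinterpret_cast<const char*>(sBuffer)). When stderr is redirected and no console is attached, the scope does nothing and the bytes are written as they are.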
Behavior when stderr is redirected to a file.
When stderr is redirected to a file, the console code page is not involved in writing. The file will contain exactly the bytes written by fprintf. To make this usable:
- Choose a single encoding for your log output (UTF‑8 is recommended).
- Document that the log file is UTF‑8 so that tools/editors can open it correctly.
Command prompt vs PowerShell.
Both ultimately rely on the console host and its code page for rendering. The console’s output code page rules apply regardless of whether the parent shell is Command Prompt or PowerShell. The key determinant is the console’s output code page, not the shell.
Alternative: use Unicode console APIs instead of fprintf for console-only output.
If writing directly to the console (not via redirection) and using UTF-16 internally, the Unicode console APIs can be used:
- Console APIs such as WriteConsoleW accept UTF-16 text directly.
- The console's code page still matters when using 8-bit APIs; for Unicode (W) APIs, UTF-16 is used.
However, this does not help when stderr is redirected: WriteConsole fails on a standard handle that is redirected to a file, because redirection operates at the byte-stream level.
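
A minimal sketch of how the two cases could be combined, assuming the message is held as UTF-8: use WriteConsoleW when stderr is a real console, and fall back to writing the raw bytes when it is redirected. The function name writeDiagnostic is hypothetical. Checking GetConsoleMode is a common way to tell a console handle from a file or pipe, but treat the whole block as an illustration under those assumptions.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: sends a UTF-8 message to stderr. Returns true on success.
bool writeDiagnostic(const std::string& utf8) {
#ifdef _WIN32
    HANDLE handle = ::GetStdHandle(STD_ERROR_HANDLE);
    DWORD mode = 0;
    // GetConsoleMode succeeds only for console handles, not for files or pipes.
    if (handle != nullptr && handle != INVALID_HANDLE_VALUE &&
        ::GetConsoleMode(handle, &mode)) {
        if (utf8.empty()) return true;
        const int byteCount = static_cast<int>(utf8.size());
        // First call asks for the required UTF-16 length, second call converts.
        const int wideLength = ::MultiByteToWideChar(
            CP_UTF8, 0, utf8.data(), byteCount, nullptr, 0);
        if (wideLength <= 0) return false;
        std::wstring wide(static_cast<std::size_t>(wideLength), L'\0');
        ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(), byteCount,
                              &wide[0], wideLength);
        DWORD written = 0;
        return ::WriteConsoleW(handle, wide.data(),
                               static_cast<DWORD>(wide.size()),
                               &written, nullptr) != FALSE;
    }
#endif
    // Redirected stderr (or a non-Windows build): the bytes go out unchanged.
    const std::size_t count = std::fwrite(utf8.data(), 1, utf8.size(), stderr);
    return count == utf8.size() && std::fflush(stderr) == 0;
}
```

The design choice here is that the console path does not depend on the console's output code page at all, while the redirected path keeps the log file in UTF-8.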
Process-wide UTF‑8 for legacy APIs (optional).
For packaged or manifest-based apps on Windows 10 1903 and later, the process code page can be set to UTF‑8 via the activeCodePage manifest property. This makes the process use UTF‑8 as its code page for many ANSI APIs, reducing the need for explicit code page management.
- UTF‑8 is recommended to maximize compatibility and minimize localization bugs.
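
If the manifest route is taken, it can be useful to confirm at startup that the setting is in effect, since older Windows builds ignore it. The sketch below checks the process ANSI code page with GetACP. The function names are hypothetical, and the non-Windows branch is only a stand-in so the sketch builds anywhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// 65001 is the identifier Windows uses for UTF-8 (CP_UTF8).
constexpr unsigned kUtf8CodePage = 65001;

bool isUtf8CodePage(unsigned codePage) { return codePage == kUtf8CodePage; }

// Returns the process ANSI code page, or 0 where the concept does not exist.
unsigned processAnsiCodePage() {
#ifdef _WIN32
    return ::GetACP();
#else
    return 0;  // non-Windows stand-in
#endif
}

// True when the process ANSI code page is UTF-8, for example because the
// activeCodePage manifest setting was honored.
bool processCodePageIsUtf8() { return isUtf8CodePage(processAnsiCodePage()); }
```

As far as I know, this setting governs the process ANSI code page and not the console's output code page, so it is worth verifying whether SetConsoleOutputCP is still needed for console rendering in your setup.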

Summary of a recommended approach for stderr logging:

Use UTF‑8 as the internal encoding for diagnostic messages.
When writing to stderr with fprintf and expecting output on a console:
- Ensure the console output code page is set to UTF‑8 using SetConsoleOutputCP(65001) before writing.
When stderr is redirected to a file:
- The file will contain UTF‑8 bytes; document this so that downstream tools interpret it correctly.
For interactive error dialogs, continue using Unicode (UTF‑16) APIs for UI.

This approach is consistent with the platform guidance to avoid legacy code pages and prefer Unicode/UTF‑8 for new and updated command-line applications.
