Hello @CSK,
Thank you for sharing such a detailed description of your scenario. To better understand the behavior you described, I reproduced your setup using a static UTF-8 buffer and fprintf. During testing, I observed what you noted. Without adjusting the console, non-English characters in the buffer appear as mojibake on the screen. However, applying SetConsoleOutputCP(CP_UTF8) effectively resolves the console rendering, and redirecting the output via CMD perfectly preserves the raw UTF-8 bytes in a file.
Based on those observations and the platform's behavior, below are some thoughts regarding your specific questions:
1. Is fprintf + ::SetConsoleOutputCP considered a recommended approach?
Using fprintf together with ::SetConsoleOutputCP(CP_UTF8) (code page 65001) is a practical approach when working with the C runtime library. The call tells the console host to interpret the incoming byte stream as UTF-8; it does not alter the raw bytes stored in your buffer.
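As a rough illustration of point 1, the C sketch below switches the console output code page to UTF-8, writes a UTF-8 buffer to stderr with fprintf, and restores the previous code page afterwards. The helper names (use_utf8_console, restore_console, log_utf8) and the sample contents of sBuffer are placeholders of mine, not taken from your code. The Windows calls are guarded with _WIN32 so the sketch also compiles on other platforms, where the helpers do nothing.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical stand-in for the static UTF-8 buffer from the question.
   The non-ASCII characters are spelled out as explicit UTF-8 bytes so the
   result does not depend on the source file's encoding. The literal is
   split after \x9F so the following 'e' is not read as a hex digit. */
static const char sBuffer[] =
    "Gr\xC3\xBC\xC3\x9F" "e, \xE4\xB8\x96\xE7\x95\x8C\n";

/* Switches the console output code page to UTF-8.
   Returns the previous code page, or 0 if nothing was changed
   (no console attached, call failed, or not running on Windows). */
static unsigned int use_utf8_console(void)
{
#ifdef _WIN32
    UINT previous = GetConsoleOutputCP();
    if (previous == 0 || !SetConsoleOutputCP(CP_UTF8)) {
        return 0;
    }
    return previous;
#else
    return 0;
#endif
}

/* Restores a code page previously returned by use_utf8_console(). */
static void restore_console(unsigned int previous)
{
#ifdef _WIN32
    if (previous != 0) {
        SetConsoleOutputCP(previous);
    }
#else
    (void)previous;
#endif
}

/* Writes the buffer unchanged; returns the number of bytes fprintf reports. */
static int log_utf8(FILE *stream, const char *text)
{
    int written = fprintf(stream, "%s", text);
    fflush(stream);
    return written;
}
```

A caller would save the value returned by use_utf8_console(), call log_utf8(stderr, sBuffer), and pass the saved value to restore_console() before exiting, so the console is left the way the user had it.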
As an alternative for modern environments (Windows 10 version 1903 and later), you might also explore setting the active code page to UTF-8 via the application manifest (<activeCodePage>UTF-8</activeCodePage>). This tells the C runtime and the ANSI (-A) Win32 APIs to treat strings as UTF-8 process-wide. Note that it changes the process code page rather than the console's output code page, so on-screen rendering may still depend on SetConsoleOutputCP.
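If you try the manifest route, one way to confirm at runtime that it took effect is to query the process ANSI code page. The sketch below is only an assumption about how you might check this; process_acp_is_utf8 and describe_acp are hypothetical helper names. Keep in mind that a UTF-8 result can also come from the system-wide UTF-8 setting, not only from the manifest.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 if the process ANSI code page is UTF-8 (65001),
   0 if it is some other code page,
   -1 when not built for Windows (the check does not apply). */
static int process_acp_is_utf8(void)
{
#ifdef _WIN32
    return GetACP() == CP_UTF8 ? 1 : 0;
#else
    return -1;
#endif
}

/* Maps the status above to a short message for a startup log line. */
static const char *describe_acp(int status)
{
    switch (status) {
    case 1:  return "ANSI code page is UTF-8";
    case 0:  return "ANSI code page is not UTF-8";
    default: return "ANSI code page check not applicable";
    }
}
```

Printing describe_acp(process_acp_is_utf8()) once at startup is usually enough to tell whether the manifest was embedded correctly.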
2. Does this behavior change between Command Prompt and PowerShell?
- For on-screen output: You likely won't see a difference. Both shells use the same underlying Windows Console Host, so adjusting the output code page typically helps both environments render the text correctly.
- For file redirection (e.g., app.exe 2> error.log): There is a notable architectural difference. Command Prompt (cmd.exe) generally redirects the raw bytes directly, preserving your UTF-8 string perfectly. In contrast, PowerShell may intercept the stream, decode it according to [Console]::OutputEncoding, and re-encode it when writing the file (Windows PowerShell 5.1, for example, writes UTF-16LE by default with the > operator). The $OutputEncoding variable governs the opposite direction: the encoding PowerShell uses when sending data to native programs. If a user encounters corrupted text in a redirected log file from PowerShell, it is often a characteristic of the PowerShell host's transcoding rather than a flaw in your logging logic.
3. Regarding stderr pointing to a file and the viewing app's encoding:
When stderr is redirected directly to a file by the OS (as in CMD), the console code page mechanics are bypassed. The file is populated with the raw memory bytes directly from your sBuffer. Because your buffer natively holds UTF-8 data, the resulting text file inherently becomes a valid UTF-8 document. Modern text editors should naturally open and interpret the file correctly without any additional conversion steps.
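To make the "raw bytes in, raw bytes out" point concrete, here is a small C sketch that writes a UTF-8 string to a file with fprintf and reads it back to confirm the bytes are identical. The function name utf8_round_trips and the file path are hypothetical. The file is opened in binary mode here; a stream redirected by the shell is normally in text mode on Windows, where the C runtime expands "\n" to "\r\n" but leaves the UTF-8 bytes themselves untouched.

```c
#include <stdio.h>
#include <string.h>

/* Writes text to path with fprintf, reads the file back, and returns
   1 if the bytes on disk match the bytes in memory, 0 otherwise.
   Sketch only: inputs longer than the fixed buffer are reported as a mismatch. */
static int utf8_round_trips(const char *path, const char *text)
{
    char readback[256];
    size_t expected = strlen(text);
    size_t got;
    FILE *in;
    FILE *out = fopen(path, "wb");   /* binary mode: no newline translation */

    if (out == NULL) {
        return 0;
    }
    fprintf(out, "%s", text);        /* no code page is involved for a file */
    fclose(out);

    in = fopen(path, "rb");
    if (in == NULL) {
        return 0;
    }
    got = fread(readback, 1, sizeof readback, in);
    fclose(in);

    return got == expected && memcmp(readback, text, expected) == 0;
}
```

Because nothing transcodes the data on the way to disk, any editor that opens the resulting file as UTF-8 sees the same characters your buffer holds.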
I hope these observations and references are helpful as you evaluate the best fit for your application. If you found my response helpful or informative, I would greatly appreciate it if you could follow this guide for your confirmation.
Thank you.