UTF-8 output using C++ locales

Question

Thursday, December 2, 2010 7:28 PM

I'm trying to modify the following code so that it also works on Windows. I have been told that the Windows equivalent of en_US.UTF-8 is english_us.65001, but also that code page 65001 won't work with C++ locales. So which is true?

#include <iostream>
#include <fstream>
#include <locale>
#include <string>
using namespace std;

int main()
{
    locale system("");
    locale::global(system);

    wcin.imbue(system);

    wstring data;
    getline(wcin,data);

    wcout.imbue(system);
    wcout << data << L" length=" << data.length() << endl;

    locale utfFile("en_US.UTF-8");
    wofstream file("my_utf_file.txt");
    file.imbue(utfFile);

    file << data;
    file << endl;

    file.close();

    return 0;
}

All replies (6)

Thursday, December 2, 2010 9:48 PM ✅ Answered

VC++ doesn't support creating UTF-8 locales. If you want to use standard streams, you'll have to use a 3rd party UTF-* facet (as I recall Dinkumware sells a set of facets separately from their other libraries) or pre-transcode your data into UTF-8 using e.g. wcstombs then write it to a stream as binary data.
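The "pre-transcode, then write as binary data" route can be sketched in plain C++98 without any locale support. The helper names to_utf8 and write_utf8_file below are hypothetical and not part of any library. The sketch assumes wchar_t holds UTF-16 code units (as on Windows) or UTF-32 code points (as on most Unix systems), and it encodes by hand rather than calling wcstombs, because wcstombs converts according to the current C locale and so only yields UTF-8 when that locale is a UTF-8 one.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: encode a wide string as UTF-8.
// Assumption: wchar_t holds UTF-16 code units or UTF-32 code points.
// Unpaired surrogates and out-of-range values become U+FFFD.
std::string to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);

        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;  // invalid input: use the replacement character

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: write the encoded bytes through a narrow stream.
// Binary mode keeps the runtime from translating newlines.
bool write_utf8_file(const char* path, const std::wstring& text)
{
    std::ofstream file(path, std::ios::out | std::ios::binary);
    if (!file) return false;
    const std::string bytes = to_utf8(text);
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return file.good();
}
```

In the program from the question, this would replace the wofstream block with something like write_utf8_file("my_utf_file.txt", data + L"\n"). On Windows, the Win32 function WideCharToMultiByte with CP_UTF8 could do the encoding step instead of the hand-written loop.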


Thursday, December 2, 2010 7:33 PM

Use a std::codecvt_utf8 or std::codecvt_utf8_utf16 facet instead of a locale.
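A minimal sketch of the facet approach, assuming a compiler that ships the C++11 <codecvt> header (the header was later deprecated in C++17). The function name write_with_facet is hypothetical. Where wchar_t is 16 bits wide, as on Windows, std::codecvt_utf8_utf16<wchar_t> is the variant that handles surrogate pairs.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write `text` to `path` as UTF-8 through a wide stream
// whose locale carries a UTF-8 conversion facet.
bool write_with_facet(const char* path, const std::wstring& text)
{
    std::wofstream file;
    // Imbue before opening so the facet is in place for the first character.
    // The locale takes ownership of the facet and deletes it when done.
    file.imbue(std::locale(std::locale::classic(),
                           new std::codecvt_utf8<wchar_t>));
    file.open(path, std::ios::out | std::ios::binary);
    if (!file) return false;
    file << text << L'\n';
    file.close();
    return !file.fail();
}
```

The rest of the program keeps using wide streams as before. Only the file stream's locale changes, so no named locale such as en_US.UTF-8 has to exist on the system.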


Thursday, December 2, 2010 7:38 PM

Unfortunately I'm currently still stuck with C++98.


Saturday, December 4, 2010 11:01 AM

VC++ doesn't support creating UTF-8 locales. If you want to use standard streams, you'll have to use a 3rd party UTF-* facet (as I recall Dinkumware sells a set of facets separately from their other libraries) or pre-transcode your data into UTF-8 using e.g. wcstombs then write it to a stream as binary data.

... or as text using ofstream.
 
I never use wofstream because I do not trust it (or maybe just don't understand it). For me doing the UTF-16 to UTF-8 conversion myself is the way to go. I have written my own CU2W and CW2U classes (based on CA2W and CW2A) to do the conversion.
 
David Wilkinson | Visual C++ MVP


Saturday, December 4, 2010 11:11 AM

Thanks for the confirmation. I will have to look into the C++0x stuff and check its availability throughout the compilers I use.


Saturday, December 11, 2010 10:07 AM

I'm trying to modify the following code so it work also on Windows, I have been told that the equivalent of en_US.UTF-8 is english_us.65001 but 65001 won't work with C++ locales. So what is the truth?

You can explicitly open the file as UTF-8:
     fopen("my_utf_file.txt", "w, ccs=UTF-8");
http://msdn.microsoft.com/en-us/library/yeby3zcb%28v=vs.80%29.aspx

Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
