

How to set utf-8 for XmlTextWriter?

Question

Saturday, December 13, 2008 1:44 AM | 1 vote

TextWriter output = new StringWriter(); 
XmlTextWriter writer = new XmlTextWriter(output); 
writer.Formatting = Formatting.Indented; 
writer.WriteStartDocument(); 

WriteStartDocument() writes the declaration "<?xml version="1.0" encoding="utf-16"?>".

How can I make it declare UTF-8 instead?

All replies (8)

Wednesday, December 17, 2008 1:53 AM ✅Answered | 4 votes

I just copied and pasted your code, added the missing "s" to "tring", and ran it. I did not get the BOM.

However, I changed the line "string result = Encoding.UTF8.GetString(output.ToArray());" to "string result = Encoding.Default.GetString(output.ToArray());" and then I could see the BOM.

You can prevent the BOM from being generated by constructing your own UTF8Encoding instance instead of using the default Encoding.UTF8, which emits a BOM:

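using System;
using System.IO;
using System.Text;
using System.Xml;

string result;
// UTF8Encoding(false) is UTF-8 without a byte order mark (Encoding.UTF8 emits one).
Encoding utf8noBOM = new UTF8Encoding(false);
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.Encoding = utf8noBOM;
using (MemoryStream output = new MemoryStream())
{
    using (XmlWriter writer = XmlWriter.Create(output, settings))
    {
        writer.WriteStartDocument();
        writer.WriteStartElement("Colors");
        writer.WriteElementString("Color", "RED");
        writer.WriteEndDocument(); // also closes the open <Colors> element
    }
    // The bytes are UTF-8; Encoding.Default would garble non-ASCII text on .NET Framework.
    result = utf8noBOM.GetString(output.ToArray());
}
Console.WriteLine(result);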

You should no longer get the BOM at all then.

       -Steve
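Since the original question asked about XmlTextWriter specifically: XmlTextWriter also has a constructor that takes a Stream and an Encoding, so the same BOM-free approach can be applied without switching to XmlWriter.Create. This is a minimal sketch, assuming the goal is a UTF-8 XML string with no BOM; note that Microsoft recommends XmlWriter.Create over XmlTextWriter for new code.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

string result;
using (var output = new MemoryStream())
{
    // Passing UTF8Encoding(false) makes the declaration say utf-8
    // and keeps the underlying StreamWriter from writing a BOM.
    using (var writer = new XmlTextWriter(output, new UTF8Encoding(false)))
    {
        writer.Formatting = Formatting.Indented;
        writer.WriteStartDocument();
        writer.WriteStartElement("Colors");
        writer.WriteElementString("Color", "RED");
        writer.WriteEndDocument();
    }
    // Disposing the writer closes the MemoryStream, but ToArray still works on a closed MemoryStream.
    result = Encoding.UTF8.GetString(output.ToArray());
}
Console.WriteLine(result);
```

Passing Encoding.UTF8 instead of new UTF8Encoding(false) would bring the BOM back, because the StreamWriter that XmlTextWriter creates writes the encoding's preamble at the start of the stream.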


Saturday, December 13, 2008 6:30 AM | 4 votes

The XmlWriter classes will override any encoding you try to set if the underlying writer already has a required encoding; when you write to a TextWriter, the declaration uses that TextWriter's Encoding.

Strings in .NET are UTF-16; hence, writing XML to a string will be forced to UTF-16. Note that if you write to a file, you can select the encoding.

However, you can force it to write UTF-8 to a string by writing it to a (binary) memory stream (the XmlWriter will use its default encoding, UTF-8) and then decoding the bytes into a string as UTF-8:

string result;
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
// settings.Encoding is left at its default, Encoding.UTF8, which writes a BOM.
using (MemoryStream output = new MemoryStream())
{
    using (XmlWriter writer = XmlWriter.Create(output, settings))
    {
        writer.WriteStartDocument();
    }
    // The bytes are UTF-8, so decode them as UTF-8.
    result = Encoding.UTF8.GetString(output.ToArray());
}

        -Steve
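A common workaround, not mentioned in this thread, is to keep writing to a string but subclass StringWriter so that it reports UTF-8. Because XmlWriter takes the declared encoding from the TextWriter, the declaration then says utf-8. The class name Utf8StringWriter is only illustrative. The result is still an ordinary UTF-16 .NET string; only the declaration changes, so it is only accurate if the string is later saved or sent as UTF-8.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

var sw = new Utf8StringWriter();
using (XmlWriter writer = XmlWriter.Create(sw, new XmlWriterSettings { Indent = true }))
{
    writer.WriteStartDocument();
    writer.WriteStartElement("Colors");
    writer.WriteElementString("Color", "RED");
    writer.WriteEndDocument();
}
string xml = sw.ToString();
Console.WriteLine(xml);

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
// Writing to a TextWriter never emits a BOM, so none appears in the string.
sealed class Utf8StringWriter : StringWriter
{
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    // XmlWriter reads this property to decide what to put in the declaration.
    public override Encoding Encoding => Utf8NoBom;
}
```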


Monday, December 15, 2008 5:39 PM

Thanks.

String result = Encoding.UTF8.GetString(output.ToArray()); 

Creates a string:

<?xml version="1.0" encoding="utf-8"?>

with extra characters at the start (hex: EF BB BF).


Monday, December 15, 2008 6:29 PM

That's the UTF-8 BOM. I wasn't getting it on my system; even if it does get added to the array, converting it back to a string should remove it.

       -Steve


Wednesday, December 17, 2008 12:19 AM

Stephen Cleary said:

That's the UTF-8 BOM. I wasn't getting it on my system; even if it does get added to the array, converting it back to a string should remove it.

       -Steve

MemoryStream output = new MemoryStream(); 
XmlWriterSettings settings = new XmlWriterSettings(); 
settings.Indent = true; 
XmlWriter writer = XmlWriter.Create(output, settings); 
writer.WriteStartDocument(); 
writer.WriteStartElement("Colors"); 
writer.WriteElementString("Color", "RED"); 
writer.WriteEndDocument(); 
writer.Close();
string result = Encoding.UTF8.GetString(output.ToArray());
Console.WriteLine(result); 

The result still starts with extra characters (hex: EF BB BF).


Wednesday, December 17, 2008 4:52 PM

Thanks Steve, it works, but why don't you get the BOM as I do? The code is just copied and pasted (except for "String").


Thursday, December 18, 2008 3:58 AM

I'm not sure. It could have to do with different locale settings or possibly different .NET Framework versions.

       -Steve
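One likely explanation, offered as a hedged sketch rather than a confirmed answer: Encoding.GetString does not strip a byte order mark. Encoding.UTF8 decodes EF BB BF to the character U+FEFF, which many consoles and controls display as nothing, while Encoding.Default on .NET Framework is the system ANSI code page, where the same bytes become visible characters (for example "ï»¿" under Windows-1252). Whether the BOM "shows up" may therefore depend on where the string is displayed rather than on the bytes. The input bytes below are made up for illustration:

```csharp
using System;
using System.Text;

// The three bytes of the UTF-8 byte order mark, followed by "<a/>".
byte[] bytes = { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'a', (byte)'/', (byte)'>' };

// GetString keeps the BOM: it becomes the character U+FEFF at index 0.
string decoded = Encoding.UTF8.GetString(bytes);
Console.WriteLine(decoded.Length);          // 5
Console.WriteLine(decoded[0] == '\uFEFF');  // True

// One way to drop it after decoding.
string trimmed = decoded.TrimStart('\uFEFF');
Console.WriteLine(trimmed);                 // <a/>
```

Avoiding the BOM when writing, as in the accepted answer with new UTF8Encoding(false), is usually cleaner than trimming it afterwards.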


Thursday, December 18, 2008 5:12 AM

:) OK.