Question
Saturday, December 13, 2008 1:44 AM | 1 vote
TextWriter output = new StringWriter();
XmlTextWriter writer = new XmlTextWriter(output);
writer.Formatting = Formatting.Indented;
writer.WriteStartDocument();
WriteStartDocument() creates the string "<?xml version="1.0" encoding="utf-16"?>".
How do I set the encoding to UTF-8?
All replies (8)
Wednesday, December 17, 2008 1:53 AM ✅Answered | 4 votes
I just copy and pasted your code, added the "s" to "tring", and ran it. I did not get the BOM.
However, I changed the line "string result = Encoding.UTF8.GetString(output.ToArray());" to "string result = Encoding.Default.GetString(output.ToArray());" and then I could see the BOM.
You can prevent the BOM from being generated by constructing your own instance of the Encoding class instead of using the default UTF8-with-BOM:
string result;
Encoding utf8noBOM = new UTF8Encoding(false);
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.Encoding = utf8noBOM;
using (MemoryStream output = new MemoryStream())
{
    using (XmlWriter writer = XmlWriter.Create(output, settings))
    {
        writer.WriteStartDocument();
        writer.WriteStartElement("Colors");
        writer.WriteElementString("Color", "RED");
        writer.WriteEndDocument();
    }
    result = Encoding.Default.GetString(output.ToArray());
}
Console.WriteLine(result);
You should no longer get the BOM at all then.
-Steve
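As a rough illustration of the difference between the two encoding instances mentioned above, the sketch below prints the preamble (BOM) bytes each one reports. The class name BomDemo and the helper PreambleHex are made up for this example; Encoding.GetPreamble and the UTF8Encoding(bool) constructor are standard framework members.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: returns the preamble (BOM) bytes of an encoding as a hex string.
    // An encoding without a preamble yields an empty string.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        // The shared Encoding.UTF8 instance reports a three-byte preamble.
        Console.WriteLine("Encoding.UTF8:           [" + PreambleHex(Encoding.UTF8) + "]");

        // An instance constructed with encoderShouldEmitUTF8Identifier = false reports none.
        Console.WriteLine("new UTF8Encoding(false): [" + PreambleHex(new UTF8Encoding(false)) + "]");
    }
}
```

Whether a writer actually emits the preamble is up to the writer, so this only shows what each encoding instance offers, not what ends up in a given stream.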
Saturday, December 13, 2008 6:30 AM | 4 votes
The XmlWriter classes will override any encodings you try to set if the underlying stream has a required encoding.
Strings in .NET are UTF-16; hence, writing XML to a string will be forced to UTF-16. Note that if you write to a file, you can select the encoding.
However, you can force it to write UTF-8 by writing to a (binary) memory stream (the XmlWriter will use its default, UTF-8) and then decoding the resulting bytes into a string as UTF-8:
string result;
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
using (MemoryStream output = new MemoryStream())
{
    using (XmlWriter writer = XmlWriter.Create(output, settings))
    {
        writer.WriteStartDocument();
    }
    result = Encoding.UTF8.GetString(output.ToArray());
}
-Steve
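Since the writer takes its declared encoding from the underlying text writer, another commonly used approach is a StringWriter subclass that reports UTF-8 as its encoding. The sketch below assumes that behavior; Utf8StringWriter and Utf8DeclarationDemo are hypothetical names introduced for illustration, not framework types.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
// The characters are still stored in a .NET string; only the reported encoding changes.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class Utf8DeclarationDemo
{
    // Builds a small document and returns it as a string.
    public static string BuildXml()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;

        using (StringWriter output = new Utf8StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(output, settings))
            {
                writer.WriteStartDocument();
                writer.WriteStartElement("Colors");
                writer.WriteElementString("Color", "RED");
                writer.WriteEndDocument(); // also closes the open Colors element
            }
            return output.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildXml());
    }
}
```

The returned string carries a utf-8 declaration, so it should be encoded as UTF-8 when it is eventually written to a file or sent over the wire; otherwise the declaration and the actual bytes disagree.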
Monday, December 15, 2008 5:39 PM
Thanks.
String result = Encoding.UTF8.GetString(output.ToArray());
Creates a string:
<?xml version="1.0" encoding="utf-8"?>
with extra symbols at the start (hex: EF BB BF).
Monday, December 15, 2008 6:29 PM
That's the UTF-8 BOM. I wasn't getting it on my system; even if it does get added to the array, converting it back to a string should remove it.
-Steve
Wednesday, December 17, 2008 12:19 AM
Stephen Cleary said:
That's the UTF-8 BOM. I wasn't getting it on my system; even if it does get added to the array, converting it back to a string should remove it.
-Steve
MemoryStream output = new MemoryStream();
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
XmlWriter writer = XmlWriter.Create(output, settings);
writer.WriteStartDocument();
writer.WriteStartElement("Colors");
writer.WriteElementString("Color", "RED");
writer.WriteEndDocument();
writer.Close();
tring result = Encoding.UTF8.GetString(output.ToArray());
Console.WriteLine(result);
The result starts with extra symbols (hex: EF BB BF).
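For reference, here is a minimal sketch of how a leading EF BB BF sequence behaves when bytes are decoded in two different ways. The byte array is a hand-made sample, not output captured from the code above, and the class and method names are hypothetical. As far as I know, Encoding.GetString decodes the BOM into the character U+FEFF, while a StreamReader skips it by default.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDecodingDemo
{
    // Hand-made sample: UTF-8 BOM (EF BB BF) followed by the ASCII bytes of "<a/>".
    private static readonly byte[] Sample = { 0xEF, 0xBB, 0xBF, 0x3C, 0x61, 0x2F, 0x3E };

    // Decodes the bytes directly; the BOM is kept as the first character (U+FEFF).
    public static string DecodeWithGetString()
    {
        return Encoding.UTF8.GetString(Sample);
    }

    // Decodes through a StreamReader, which detects and skips a leading BOM by default.
    public static string DecodeWithStreamReader()
    {
        using (MemoryStream stream = new MemoryStream(Sample))
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        Console.WriteLine("GetString length:    " + DecodeWithGetString().Length);
        Console.WriteLine("StreamReader length: " + DecodeWithStreamReader().Length);
    }
}
```

U+FEFF is usually invisible when printed, which may explain why the same code appears to behave differently depending on how the result is inspected.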
Wednesday, December 17, 2008 4:52 PM
Thanks Steve, it works. But why don't you get the BOM as I do? The code is just copy-paste (except for "String").
Thursday, December 18, 2008 3:58 AM
I'm not sure. It could have to do with different locale settings or possibly .NET framework versions.
-Steve
Thursday, December 18, 2008 5:12 AM
:) OK.