Question
Tuesday, December 20, 2016 5:26 AM
I have a UTF-8 encoded XML file:
<?xml version="1.0" encoding="UTF-8"?>
When I use the XmlReader.Create overload below, I assume it uses UTF-8 encoding to parse the XML file:
using (XmlReader reader = XmlReader.Create(inputUri))
I get the exception below:
System.Xml.XmlException occurred
HResult=-2146232000
LineNumber=18750
LinePosition=13
Message=Invalid character in the given encoding. Line 18750, position 13.
But when I use the XmlReader.Create overload below:
using (XmlReader reader = XmlReader.Create(new StreamReader(inputUri,Encoding.UTF8)))
The XML is parsed successfully. Why do these two versions behave differently, given that both use the same encoding to parse the file?
PS: I am fairly sure the first version uses UTF-8 encoding.
Below is a snippet from XmlTextReaderImpl.cs, an instance of which is returned by the first version:
private void SetupEncoding( Encoding encoding ) {
    if ( encoding == null ) {
        Debug.Assert( ps.charPos == 0 );
        ps.encoding = Encoding.UTF8;
        ps.decoder = new SafeAsciiDecoder(); // This falls back to UTF-8 decoder
    }
}
All replies (5)
Tuesday, December 20, 2016 8:19 AM ✅ Answered | 1 vote
XmlReader treats any byte sequence that is invalid in the document's encoding as an error, because the XML format defines such a document as broken (not well-formed).
In the second case, StreamReader is a general-purpose text reader: when it encounters bytes that are not valid in the given Encoding, it substitutes a replacement fallback character (U+FFFD by default for Encoding.UTF8). So by the time the decoded text reaches XmlReader, every character it sees is legal, and parsing succeeds.
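The difference described above can be reproduced with a small sketch. It is illustrative only: the in-memory document and the stray 0xE9 byte are made-up test data, and a MemoryStream stands in for the file behind inputUri. The first parse is expected to fail with an XmlException similar to the one in the question, while the second is expected to succeed with U+FFFD in place of the bad byte.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class EncodingFallbackDemo
{
    static void Main()
    {
        // Hypothetical input: a UTF-8 declaration followed by a lone 0xE9 byte,
        // which is not a valid UTF-8 sequence when the next byte is '<'.
        byte[] head = Encoding.ASCII.GetBytes("<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>caf");
        byte[] tail = Encoding.ASCII.GetBytes("</a>");
        var buffer = new MemoryStream();
        buffer.Write(head, 0, head.Length);
        buffer.WriteByte(0xE9);
        buffer.Write(tail, 0, tail.Length);
        byte[] data = buffer.ToArray();

        // Case 1: XmlReader receives raw bytes and decodes them itself.
        try
        {
            using (XmlReader reader = XmlReader.Create(new MemoryStream(data)))
            {
                while (reader.Read()) { }
            }
            Console.WriteLine("Stream input: parsed without error");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Stream input: " + ex.Message);
        }

        // Case 2: StreamReader decodes the bytes first; XmlReader only sees characters.
        using (var text = new StreamReader(new MemoryStream(data), Encoding.UTF8))
        using (XmlReader reader = XmlReader.Create(text))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                {
                    bool replaced = reader.Value.IndexOf('\uFFFD') >= 0;
                    Console.WriteLine("StreamReader input: contains U+FFFD = " + replaced);
                }
            }
        }
    }
}
```

Note that the second form hides data corruption rather than fixing it: the original byte is lost once it has been replaced.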
Tuesday, December 20, 2016 7:02 AM
In the first case, even though the declaration specifies UTF-8, the file's actual encoding is probably something else, which is why you are getting the error.
Whenever you pass a path to the XmlReader.Create method, it uses the encoding attribute in the XML declaration.
It fails because the actual encoding is different. You can check the file's encoding via File -> Advanced Save Options in Visual Studio.
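As an alternative to checking in an editor, a strict decoder can report where the file stops being valid UTF-8. This is a minimal sketch, assuming the file is small enough to read into memory; the path in args[0] is a hypothetical command-line argument.

```csharp
using System;
using System.IO;
using System.Text;

class FindInvalidUtf8
{
    static void Main(string[] args)
    {
        // args[0] is a hypothetical path to the XML file being checked.
        byte[] data = File.ReadAllBytes(args[0]);

        // throwOnInvalidBytes: true makes the decoder throw instead of substituting U+FFFD.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(data);
            Console.WriteLine("No invalid UTF-8 sequences found.");
        }
        catch (DecoderFallbackException ex)
        {
            // Index is the position in the input that the decoder reports for the bad bytes.
            Console.WriteLine("Invalid UTF-8 sequence reported at byte index " + ex.Index);
        }
    }
}
```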
It all Happenz Sendil
Tuesday, December 20, 2016 7:26 AM
It's UTF-8.
Anyway, if it weren't UTF-8,
using (XmlReader reader = XmlReader.Create(new StreamReader(inputUri,Encoding.UTF8)))
should also fail, right?
Tuesday, December 20, 2016 8:06 AM
Your reasoning is sound, but what actually happens is this: whenever you pass a StreamReader to XmlReader, it ignores the encoding value given in the XML declaration and uses the encoding of the StreamReader.
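A rough way to observe this is to parse a document whose declaration does not match its bytes. The sketch below uses made-up data: the declaration claims ISO-8859-1 while the bytes are written as UTF-8, and ReadText is a hypothetical helper. With stream input the reader is expected to follow the declaration and mis-decode the text; with a StreamReader it only sees characters that were already decoded as UTF-8.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class DeclaredEncodingDemo
{
    static void Main()
    {
        // Hypothetical document: the declaration claims ISO-8859-1,
        // but the bytes are actually written as UTF-8 (no BOM).
        string expected = "caf\u00E9";
        string xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>" + expected + "</a>";
        byte[] data = new UTF8Encoding(false).GetBytes(xml);

        // Stream input: XmlReader chooses the decoder itself.
        string fromStream = ReadText(XmlReader.Create(new MemoryStream(data)));

        // TextReader input: the StreamReader has already decoded the bytes as UTF-8.
        string fromReader = ReadText(
            XmlReader.Create(new StreamReader(new MemoryStream(data), Encoding.UTF8)));

        Console.WriteLine("Stream input matches original text:       " + (fromStream == expected));
        Console.WriteLine("StreamReader input matches original text: " + (fromReader == expected));
    }

    // Hypothetical helper: returns the text content of the root element.
    static string ReadText(XmlReader reader)
    {
        using (reader)
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }
}
```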
It all Happenz Sendil
Tuesday, December 20, 2016 8:25 AM
Perfect. Thanks, cheong00!