Question
Thursday, May 14, 2009 4:14 PM
We get import errors when our XML values contain certain Unicode characters; the error message is "Not valid XML characters".
Is there an easy way to check whether a string is XML compliant?
All replies (15)
Thursday, May 14, 2009 4:23 PM ✅ Answered
You only have to check for the existence of five characters:
the quote ("), the apostrophe ('), the less-than (<), the greater-than (>) and the ampersand (&).
You can replace these characters with escape sequences to make an xml-compliant string.
See the following link for more information:
http://www.hdfgroup.org/HDF5/XML/xml_escape_chars.htm
David Morton - http://blog.davemorton.net/ - @davidmmorton
Thursday, May 14, 2009 4:55 PM
Hello,
This method might help you... I didn't write it but it comes from one of our apps.
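// Requires: using System.Text;

/// <summary>
/// Whether a given character is allowed by XML 1.0.
/// </summary>
public static bool IsLegalXmlChar(int character)
{
    return
    (
        character == 0x9 /* == '\t' == 9 */ ||
        character == 0xA /* == '\n' == 10 */ ||
        character == 0xD /* == '\r' == 13 */ ||
        (character >= 0x20 && character <= 0xD7FF) ||
        (character >= 0xE000 && character <= 0xFFFD) ||
        (character >= 0x10000 && character <= 0x10FFFF)
    );
}

public static string EnsureXMLIsClean(string dirty)
{
    StringBuilder clean = new StringBuilder(dirty.Length);
    for (int i = 0; i < dirty.Length; i++)
    {
        // A valid surrogate pair encodes one code point in 0x10000-0x10FFFF,
        // which is legal, so keep both halves instead of testing them one by one.
        if (char.IsSurrogatePair(dirty, i))
        {
            clean.Append(dirty[i]).Append(dirty[++i]);
        }
        else if (IsLegalXmlChar(dirty[i]))
        {
            clean.Append(dirty[i]);
        }
        // Anything else (control characters, lone surrogates, 0xFFFE, 0xFFFF) is dropped.
    }
    return clean.ToString();
}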
Good luck.
LS
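For readers on .NET Framework 4.0 or later (which is newer than this thread), a possible alternative to a hand-written range check is System.Xml.XmlConvert.VerifyXmlChars, which throws an XmlException when the string contains a character that XML 1.0 does not allow. A minimal sketch; the names XmlCharCheck and IsXmlCompliant are made up for illustration:

```csharp
using System;
using System.Xml;

public static class XmlCharCheck
{
    // Hypothetical helper: true when every character in 'text' is legal in XML 1.0.
    public static bool IsXmlCompliant(string text)
    {
        if (text == null)
            throw new ArgumentNullException("text");
        try
        {
            // VerifyXmlChars returns its argument when all characters are legal
            // and throws XmlException otherwise.
            XmlConvert.VerifyXmlChars(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

This only checks character legality. Markup characters such as < and & pass the check and still need escaping when the string is written into a document.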
Thursday, May 14, 2009 5:25 PM
Actually, I find that if you need to set an XML value, escaping necessary characters, this is a good way to do it:
string value = GetValueAsString();
var xml = new XAttribute("name", value);
There shouldn't be a need to check for invalid characters.
OTOH, if you're actually testing characters in an XML string, then just use a validating XML parser - this will catch invalid characters and other errors too!
-Steve
Programming blog: http://nitoprograms.blogspot.com/
I will be in Chicago for the WPF training: http://blogs.msdn.com/jaimer/archive/2009/04/01/announcing-the-using-wpf-to-build-lob-applications-training-tour.aspx
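Both suggestions in the post above can be sketched with LINQ to XML. This is only an illustration under assumptions: XmlParseCheck, BuildAttribute and IsWellFormed are hypothetical names, and XDocument.Parse checks well-formedness (which includes rejecting illegal characters) rather than validating against a schema.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlParseCheck
{
    // Lets XAttribute do the escaping when the attribute is serialized.
    public static string BuildAttribute(string value)
    {
        return new XAttribute("name", value).ToString();
    }

    // True when 'xml' parses as a well-formed document.
    // The parser raises XmlException for illegal characters and for broken markup.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For example, XmlParseCheck.IsWellFormed("<a>ok</a>") succeeds, while a document containing a raw U+0001 character or an unclosed element is rejected.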
Thursday, May 14, 2009 5:34 PM
Well, you can also do this:
public bool IsValidForXml(string input)
{
    return System.Security.SecurityElement.Escape(input) == input;
}
And of course, to escape it to prepare it for adding to XML, simply call:
string escapedString = System.Security.SecurityElement.Escape(input);
David Morton - http://blog.davemorton.net/ - @davidmmorton
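One caveat that may matter for the original question: judging by the decompiled source quoted later in this thread, SecurityElement.Escape only rewrites the five markup characters, so a string containing a control character such as U+0001 comes back unchanged and would pass the equality check above. A small sketch to try this out; EscapeDemo and its method names are invented for illustration:

```csharp
using System;
using System.Security;

public static class EscapeDemo
{
    // Thin wrapper so the behavior is easy to inspect.
    public static string Escaped(string input)
    {
        return SecurityElement.Escape(input);
    }

    // Same comparison as IsValidForXml above.
    public static bool IsUnchangedByEscape(string input)
    {
        return SecurityElement.Escape(input) == input;
    }
}
```

Here EscapeDemo.Escaped("a<b") yields "a&lt;b", but EscapeDemo.IsUnchangedByEscape("bad\u0001char") is true even though U+0001 is not a legal XML 1.0 character, so the comparison says nothing about illegal characters.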
Thursday, May 14, 2009 6:45 PM | 1 vote
And if you want to have your own, with extension methods for string:
// Requires: using System; using System.Text;
public static class XmlStringHelper
{
    private static readonly char[] s_escapeChars;
    private static readonly string[] s_escapeStringPairs;

    static XmlStringHelper()
    {
        s_escapeChars = new char[] { '<', '>', '"', '\'', '&' };
        // Pairs of (character to escape, replacement entity).
        s_escapeStringPairs = new string[] { "<", "&lt;", ">", "&gt;", "\"", "&quot;", "'", "&apos;", "&", "&amp;" };
    }

    public static bool EscapeRequired(string str)
    {
        if (str == null)
            throw new ArgumentNullException("str");
        return str.IndexOfAny(s_escapeChars, 0) != -1;
    }

    public static string Escape(string str)
    {
        if (str == null)
            throw new ArgumentNullException("str");
        StringBuilder builder = null;
        int length = str.Length;
        int startIndex = 0;
        while (true)
        {
            int currentIndex = str.IndexOfAny(s_escapeChars, startIndex);
            if (currentIndex == -1)
            {
                // Nothing (more) to escape: return the input itself if it was never modified.
                if (builder == null)
                    return str;
                builder.Append(str, startIndex, length - startIndex);
                return builder.ToString();
            }
            if (builder == null)
                builder = new StringBuilder();
            builder.Append(str, startIndex, currentIndex - startIndex);
            builder.Append(GetEscapeSequence(str[currentIndex]));
            startIndex = currentIndex + 1;
        }
    }

    private static string GetEscapeSequence(char c)
    {
        int length = s_escapeStringPairs.Length;
        for (int i = 0; i < length; i += 2)
        {
            if (s_escapeStringPairs[i][0] == c)
                return s_escapeStringPairs[i + 1];
        }
        return c.ToString();
    }
}

public static class XmlStringHelperExtensions
{
    public static bool XmlEscapeRequired(this string str)
    {
        return XmlStringHelper.EscapeRequired(str);
    }

    public static string XmlEscape(this string str)
    {
        return XmlStringHelper.Escape(str);
    }
}
Thursday, May 14, 2009 7:00 PM
Tergiver,
Why not just use the built in stuff?
public static class XmlStringHelperExtensions
{
    public static bool EscapeRequired(this string str)
    {
        // Escaping is required when Escape would change the string.
        return System.Security.SecurityElement.Escape(str) != str;
    }

    public static string Escape(this string str)
    {
        return System.Security.SecurityElement.Escape(str);
    }
}
David Morton - http://blog.davemorton.net/ - @davidmmorton
Thursday, May 14, 2009 7:07 PM
Because your EscapeRequired is highly inefficient? It does a lot of unnecessary work.
Thursday, May 14, 2009 7:08 PM
Also, if you have your own source you can modify it for other uses (like converting CR/LFs for mailto: strings for instance). This one only requires editing the two constants to do so.
Thursday, May 14, 2009 7:09 PM
3) That code contains some useful learning material.
Thursday, May 14, 2009 7:13 PM
"Because your EscapeRequired is highly inefficient? It does a lot of unnecessary work."
I'm going to have to disagree here. To double check, because I could be wrong sometimes, I checked the definition of Escape in the framework.
Here's the definition of SecurityElement.Escape from Reflector:
public static string Escape(string str)
{
    if (str == null)
    {
        return null;
    }
    StringBuilder builder = null;
    int length = str.Length;
    int startIndex = 0;
    while (true)
    {
        int num2 = str.IndexOfAny(s_escapeChars, startIndex);
        if (num2 == -1)
        {
            if (builder == null)
            {
                return str;
            }
            builder.Append(str, startIndex, length - startIndex);
            return builder.ToString();
        }
        if (builder == null)
        {
            builder = new StringBuilder();
        }
        builder.Append(str, startIndex, num2 - startIndex);
        builder.Append(GetEscapeSequence(str[num2]));
        startIndex = num2 + 1;
    }
}
Here's a definition of "GetEscapeSequence" that is listed in the same:
private static string GetEscapeSequence(char c)
{
    int length = s_escapeStringPairs.Length;
    for (int i = 0; i < length; i += 2)
    {
        string str = s_escapeStringPairs[i];
        string str2 = s_escapeStringPairs[i + 1];
        if (str[0] == c)
        {
            return str2;
        }
    }
    return c.ToString();
}
The only thing missing is EscapeRequired, which is not provided in the framework.
It appears that if mine is inefficient, yours is... well... literally identical.
David Morton - http://blog.davemorton.net/ - @davidmmorton
Thursday, May 14, 2009 7:17 PM
It's identical because I stole it from there. But your implementation of EscapeRequired parses and constructs an escaped string, and then compares the two strings!!
I took the framework code and extended it to provide a more efficient EscapeRequired method.
Thursday, May 14, 2009 7:39 PM | 1 vote
The EscapeRequired method that Tergiver provides is about 7 times faster, by my estimate. I tested using the following code:
// Requires: using System; using System.Diagnostics;
int iterations = 1000000;
string value = "This is the \"text\" we want to escape & format properly";
bool result;

// Time the IndexOfAny-based check.
var sw = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
    result = XmlStringHelper.EscapeRequired(value);
Console.WriteLine(sw.ElapsedMilliseconds);

// Time the escape-then-compare check.
sw = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
    result = System.Security.SecurityElement.Escape(value) == value;
Console.WriteLine(sw.ElapsedMilliseconds);
The results were 227 milliseconds for Tergiver's response, and 1648 for mine on 1,000,000 iterations.
Personally, however, I'd still go for simplicity unless I started having performance problems. We're talking about a difference of roughly 1.4 microseconds per iteration (1,421 milliseconds over 1,000,000 iterations). I doubt the OP is importing the files that quickly or has the need to. Yours is more performant, but mine is far more readable (on the basis that the complex stuff is abstracted away). If I started having performance issues, however, I would not hesitate to use yours.
Btw, I'm not attacking you. :) I just wanted to know why you suggested that route. Knowing why I would use one version over the other, and balancing that against my needs, should be the determining factor in deciding which to go with.
David Morton - http://blog.davemorton.net/ - @davidmmorton
Thursday, May 14, 2009 7:54 PM
The problem with your test code is that it hides the inefficiency, which is an O(N) problem.
Your method calls String.IndexOfAny N times, where N is always at least 1. For any value of N > 1, it calls String.IndexOfAny again and additionally makes several StringBuilder.Append calls each time. Add one call to create the builder, another to StringBuilder.ToString (once, when N > 1), and the string comparison at the end regardless of the value of N.
My method calls String.IndexOfAny once and that's the end of it, whether N is 1 or 1000.
I agree with you that performance isn't the be-all and end-all, and that abstraction is good. However, use of the code I gave doesn't require any knowledge of its internals. Take it, drop it in your assembly, and blissfully use it, secure in the knowledge that it is well written, as it's A) mostly stolen from MS and B) heavily scrutinized right here in this forum.
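To make the comparison concrete, here is a side-by-side sketch of the two approaches being debated. The class and method names (EscapeRequiredComparison, ViaIndexOfAny, ViaEscape) are invented for this illustration and appear nowhere in the framework; the two methods should return the same answer and differ only in how much work they do.

```csharp
using System;
using System.Security;

public static class EscapeRequiredComparison
{
    // The same five characters that the helper class above scans for.
    private static readonly char[] EscapeChars = { '<', '>', '"', '\'', '&' };

    // One scan that stops at the first character needing an escape; no allocation.
    public static bool ViaIndexOfAny(string s)
    {
        return s.IndexOfAny(EscapeChars) != -1;
    }

    // Builds the fully escaped string (one IndexOfAny call and Append calls per
    // hit, plus ToString), then compares the result with the input.
    public static bool ViaEscape(string s)
    {
        return SecurityElement.Escape(s) != s;
    }
}
```

Both calls report true for "a & b" and false for "plain text"; the difference only shows up in timing and allocations as the number of escapable characters grows.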
Thursday, May 14, 2009 7:59 PM
AND.. I gave three pretty good reasons to use the source code version. Here's a fourth:
- System.Security.SecurityElement? WTF does that have to do with writing correct XML? XmlStringHelper is better, but given the source, you can name it whatever your heart desires.
Thursday, May 14, 2009 8:12 PM
p.s. I know you are not attacking me personally. I've been doing this a very long time (debating the value of one method over another). One of the things I love most about programming is that there is always more than one way to do the same thing and engaging others over the value of one or the other is fun, and often educational.