

URL Rewrite Module decodes UTF-8 encoded querystring as if it were iso-8859-1

Question

Saturday, January 2, 2016 3:13 PM

I'm about to move a multisite WordPress site from Linux and nginx to IIS, and I've set up a server running Windows Server 2012 R2. The problem is that the site's users have used non-ASCII characters in media file names. Everything is in UTF-8: the database, the PHP files, and so on. All files have been moved with their names preserved. When the rule for fetching media files is matched, the rewrite engine changes any file name that contains non-ASCII characters. The failed request trace below shows what is happening:

33. Rule match, look at the URL!

34. The query string is URL-encoded. All non-ASCII characters are encoded as UTF-8 (correctly).

35-37. The child request is made, but with the wrong characters. If you take the query string from step 34 and URL-decode it as if it were ISO-8859-1, you get this result. The request fails because no file with that name exists.

What should I change, and where? In my opinion the URL Rewrite Module is making a mistake here, but how can I tell it how to behave?
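To illustrate the decoding difference described in steps 34 to 37, here is a minimal Python sketch. It does not reproduce what the URL Rewrite Module does internally. It only shows how the same percent-encoded bytes read under the two encodings, using the query string value from the trace.

```python
from urllib.parse import unquote

# Query string value copied from step 34 of the trace.
encoded = "2013/12/Gr%C3%A4vling_m%C3%A5rdf%C3%A4llor.jpg"

# Percent-decode and interpret the bytes as UTF-8: this is the real file name.
as_utf8 = unquote(encoded, encoding="utf-8")

# Percent-decode and interpret the same bytes as ISO-8859-1:
# every two-byte UTF-8 sequence becomes two separate characters.
as_latin1 = unquote(encoded, encoding="iso-8859-1")

print(as_utf8)
print(as_latin1)
```

The second result is a different string from the first, so a file lookup based on it cannot find the file on disk.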

33. -REWRITE_ACTION

Substitution wp-includes/ms-files.php?file={R:2}
RewriteURL /wp-includes/ms-files.php?file=2013/12/Grävling_mårdfällor.jpg
AppendQueryString false
LogRewrittenURL false

0 ms

Verbose

34. -RULE_EVALUATION_END

RuleName Imported Rule 2
RequestURL wp-includes/ms-files.php
QueryString ?file=2013/12/Gr%C3%A4vling_m%C3%A5rdf%C3%A4llor.jpg
StopProcessing true
Succeeded true

0 ms

Informational

35. -URL_REWRITE_END

RequestURL /wp-includes/ms-files.php

0 ms

36. -GENERAL_CHILD_REQUEST_START

SiteId 4
RequestURL http://krets.jagareforbundet.se:80/wp-includes/ms-files.php?file=2013/12/Grävling_mårdfällor.jpg
RequestVerb GET
RecursiveLevel 1

0 ms

37. -GENERAL_REQUEST_START

SiteId 4
AppPoolId krets.jagareforbundet.se
ConnId 1610612783
RawConnId 0
RequestURL http://krets.jagareforbundet.se:80/wp-includes/ms-files.php?file=2013/12/Grävling_mårdfällor.jpg
RequestVerb GET

All replies (9)

Friday, January 15, 2016 4:46 AM ✅Answered

Hi,

This happens because the request was encoded as UTF-8 but decoded as Windows-1252.
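The question mentions ISO-8859-1 while this reply mentions Windows-1252. For the bytes in this particular file name the two code pages give the same result, as the short Python check below suggests. The byte string is written out by hand from the percent-encoded value in the trace. Which code page the module actually uses is not established in this thread.

```python
# UTF-8 bytes of the file name, taken from %C3%A4 (ä) and %C3%A5 (å) in the trace.
raw = b"Gr\xc3\xa4vling_m\xc3\xa5rdf\xc3\xa4llor.jpg"

as_cp1252 = raw.decode("cp1252")      # Windows-1252
as_latin1 = raw.decode("iso-8859-1")  # ISO-8859-1
as_utf8 = raw.decode("utf-8")         # the intended reading

# The two single-byte code pages agree on these bytes, and both differ from UTF-8.
print(as_cp1252 == as_latin1)
print(as_cp1252 == as_utf8)
```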

Please try setting the NE or NO flag in ISAPI_Rewrite. More information about ISAPI_Rewrite is available at the link below:

http://www.helicontech.com/isapi_rewrite/doc/RewriteRule.htm

Note: this link goes to a third-party site and is provided for reference only.

Regards,

Angie


Monday, January 4, 2016 7:11 AM

I have tested with Helicon Tech's ISAPI_Rewrite and now it all works perfectly, so something strange is happening in the IIS URL Rewrite Module. It would be nice if someone had an answer to this. Is it a bug, or is it something that can be changed by configuration?


Monday, January 4, 2016 8:20 PM

Hi,

You can set the Filter High Bit Characters feature, which is included in the Request Filtering module, to accept URLs that contain non-ASCII characters.

You can find more information about the Request Filtering module at the following link.

http://www.iis.net/learn/manage/configuring-security/use-request-filtering

Regards,

Angie


Tuesday, January 5, 2016 1:16 AM

Perhaps I misunderstood your answer, but that option is on by default and I have verified that it is set on this site. The URL containing high bit characters is accepted and received by the URL Rewrite Module, as you can see in my trace. The problem (in my opinion) is that it is encoded and/or decoded in the wrong way before doing the child request.


Tuesday, January 5, 2016 1:55 AM

What rules are you using?


Tuesday, January 5, 2016 2:01 AM

Looks like you need to do some encoding/decoding

http://www.iis.net/learn/extensions/url-rewrite-module/url-rewrite-module-configuration-reference

"

String functions

There are three string functions available for changing the values within a rewrite rule action, as well as any conditions:

  • ToLower - returns the input string converted to lower case.
  • UrlEncode - returns the input string converted to URL-encoded format. This function can be used if the substitution URL in rewrite rule contains special characters (for example non-ASCII or URI-unsafe characters).
  • UrlDecode - decodes the URL-encoded input string. This function can be used to decode a condition input before matching it against a pattern.

The functions can be invoked by using the following syntax:

    {function_name:any_string}

Where "function_name" can be one of the following: "ToLower", "UrlEncode", "UrlDecode". "Any_string" can be either a literal string or a string built by using server variables or back-references. For example, the following are valid invocations of string functions:

    {ToLower:DEFAULT.HTM}
    {UrlDecode:{REQUEST_URI}}
    {UrlEncode:{R:1}.aspx?p=[résumé]}

The string functions can be used in the following locations within rewrite rules:

  • In condition input strings
  • In rule substitution strings, specifically:
    • url attribute of Rewrite and Redirect actions
    • statusLine and responseLine attributes of a CustomResponse action

An example of a rule that uses the ToLower function:

    <rule name="Redirect to canonical url">
      <match url="^(.+)" /> <!-- rule back-reference is captured here -->
      <conditions>
        <!-- Check whether the requested domain is in canonical form -->
        <add input="{HTTP_HOST}" matchType="Pattern" pattern="^www\.mysite\.com$" negate="true" />
      </conditions>
      <!-- Redirect to canonical url and convert URL path to lowercase -->
      <action type="Redirect" url="http://www.mysite.com/{ToLower:{R:1}}" redirectType="Found" />
    </rule>

An example of a rule that uses the UrlEncode function:

    <rules>
      <rule name="UrlEncode example" stopProcessing="true">
        <match url="resume" />
        <action type="Rewrite" url="default.aspx?name={UrlEncode:résumé}" />
      </rule>
    </rules>

An example of a rule that uses the UrlDecode function:

    <rules>
      <rule name="UrlDecode example">
        <match url="default.aspx" />
        <conditions>
          <add input="{UrlDecode:{QUERY_STRING}}" pattern="résumé" />
        </conditions>
        <action type="Rewrite" url="default.aspx?type=resume" />
      </rule>
    </rules>

"
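The quoted reference does not say which character encoding UrlEncode and UrlDecode use. As a rough illustration only, the Python sketch below assumes UTF-8 and uses `urllib.parse.quote` and `unquote` as stand-ins for the two functions. It shows the round trip for the "résumé" value from the examples.

```python
from urllib.parse import quote, unquote

# "résumé" written with escapes so the code point (U+00E9) is unambiguous.
name = "r\u00e9sum\u00e9"

# Stand-in for {UrlEncode:résumé}; quote() percent-encodes as UTF-8 by default.
encoded = quote(name)

# Stand-in for {UrlDecode:...}; unquote() reads the bytes back as UTF-8 by default.
decoded = unquote(encoded)

print(encoded)
print(decoded == name)
```

Under that assumption each "é" becomes the two-byte sequence `%C3%A9`, the same shape as the `%C3%A4` and `%C3%A5` sequences in the trace above.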


Tuesday, January 5, 2016 2:03 AM

also see:

http://stackoverflow.com/questions/14476542/iis-7-5-url-rewrite-encoding


Tuesday, January 5, 2016 2:44 AM

Thanks Rovastar!

I changed the rule

                <rule name="Imported Rule 2" stopProcessing="true">
                    <match url="^([_0-9a-zA-Z-]+/)?files/(.+)" ignoreCase="false" />
                    <action type="Rewrite" url="wp-includes/ms-files.php?file={R:2}" appendQueryString="false" />
                </rule>

to

                <rule name="Imported Rule 2" stopProcessing="true">
                    <match url="^([_0-9a-zA-Z-]+/)?files/(.+)" ignoreCase="false" />
                    <action type="Rewrite" url="wp-includes/ms-files.php?file={UrlEncode:{R:2}}" appendQueryString="false" />
                </rule>

Now everything works as expected! Many thanks!
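For reference, a rough Python sketch of what the modified rule is meant to do. The regular expression is copied from the rule. `rewrite` is a hypothetical helper, and `quote` only approximates `{UrlEncode:{R:2}}`: it assumes UTF-8 and leaves "/" unescaped, which may differ from the module's behavior.

```python
import re
from urllib.parse import quote

# Pattern copied from the rule's <match url="..."> attribute (case-sensitive).
PATTERN = re.compile(r"^([_0-9a-zA-Z-]+/)?files/(.+)")

def rewrite(path):
    """Return the rewritten URL for a request path, or None if the rule does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # Group 2 corresponds to the {R:2} back-reference in the rule.
    return "wp-includes/ms-files.php?file=" + quote(match.group(2), safe="/")

# Request path for the file from the trace (ä = U+00E4, å = U+00E5).
print(rewrite("files/2013/12/Gr\u00e4vling_m\u00e5rdf\u00e4llor.jpg"))
```

The printed URL has the same query string as step 34 of the trace, so the encoding of the substitution string itself looks correct at that point.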


Tuesday, January 5, 2016 4:59 AM

I spoke too soon. It was a cached response (silly me) from when I was using the ISAPI Rewrite module. The changed rule made a difference and the trace looks good, but it still doesn't work through the URL Rewrite Module. Something is different when using ISAPI Rewrite. I'm going to do further testing to see how the results from the URL Rewrite Module and ISAPI Rewrite differ.