[Zope-dev] Non-ASCII characters in URLs

Dieter Maurer dieter at handshake.de
Mon Apr 7 15:45:00 EDT 2008


Wichert Akkerman wrote at 2008-4-7 20:45 +0200:
> ...
>> Almost surely, Alexander wants to ask why Zope does not allow
>> non-ASCII characters in ids.
>> 
>> And, in fact, there are only two reasons:
>> 
>>   *  laziness of the Zope developers:
>> 
>>      without the restriction to ASCII characters
>>      careful quoting (and unquoting) is necessary
>>      in order to adhere to RFC 2396 (the modern URI syntax specification)
>
>This is becoming increasingly painful

I will soon have a patch against Zope 2.11b1
which gets rid of this restriction.

If there is consensus, I can add it to the Zope repository.

> ...
>>   *  there is no way to specify the encoding used for non-ASCII characters.
>> 
>>      HTML 4 suggests converting non-ASCII characters first to
>>      UTF-8 and then URL-escaping the result,
>>      but most HTTP clients do not follow this suggestion.
>>      Instead, they use the charset found on the page
>>      that caused them to construct the URI.
>> 
>>      I have observed that MS WebDAV, for some WebDAV commands,
>>      transfers the URL as given and, for some other
>>      commands, recodes it into UTF-8.
>> 
>>      Thus, supporting non-ASCII ids may occasionally cause
>>      surprises.
>
>You mean non-ASCII URIs, not non-ASCII ids here, I suspect.  Somehow I'm
>not surprised those are painful :(

No, I mean non-ASCII ids.

They lead to URIs with some escaped characters, and MS WebDAV, for some
commands, unescapes the URIs, interprets them in some default charset
("windows-1252" in our case), recodes them in UTF-8,
escapes them again and then uses them in the commands.
Examples are the COPY and MOVE commands. If an object has
a non-ASCII character in its id, say "tüv", its URL
may look like "http:.../t%FCv". Used in a "COPY" or "MOVE",
it is, however, represented as "http:.../t%C3%BCv".
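
The recoding described above can be reproduced with a small Python sketch.
It is only an illustration of the escaping arithmetic, not code from Zope or
from the WebDAV client: the helper name recode_escaped and its
assumed_charset parameter are made up for this example, and the only
library calls used are from the standard urllib.parse module.

```python
from urllib.parse import quote, unquote, unquote_to_bytes


def recode_escaped(segment, assumed_charset="windows-1252"):
    """Hypothetical helper mimicking the observed client behaviour:
    unescape, interpret the bytes in an assumed charset, re-encode
    as UTF-8 and escape again."""
    raw = unquote_to_bytes(segment)      # '%FC' -> byte 0xFC
    text = raw.decode(assumed_charset)   # byte 0xFC -> 'ü'
    return quote(text.encode("utf-8"))   # 'ü' -> bytes C3 BC -> '%C3%BC'


object_id = "t\u00fcv"  # the id "tüv"

# URL path segment as the server publishes it (page charset windows-1252).
as_served = quote(object_id.encode("windows-1252"))

# The same segment after the client has recoded it for COPY or MOVE.
as_recoded = recode_escaped(as_served)

# What a server gets if it still decodes the recoded segment as windows-1252.
seen_by_server = unquote(as_recoded, encoding="windows-1252")

print(as_served)                    # t%FCv
print(as_recoded)                   # t%C3%BCv
print(seen_by_server == object_id)  # False
```

The two escaped forms name the same id only if the receiving side knows which
charset was used before escaping, and that is exactly the information the
request does not carry.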



-- 
Dieter

