[Zope-dev] Fwd: [ZPT] Making PageTemplate's edit pages Unicode aware

Stuart Bishop stuart at stuartbishop.net
Thu Apr 8 20:52:47 EDT 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I haven't received much feedback on the ZPT mailing list, so I
thought I'd bring it over here to a wider audience (thread is at
http://mail.zope.org/pipermail/zpt/2004-March/005218.html ).

Begin forwarded message:

> From: Stuart Bishop <stuart.b at commonground.com.au>
> Date: 29 March 2004 6:13:06 PM
> To: Dieter Maurer <dieter at handshake.de>
> Cc: zpt at zope.org
> Subject: Re: [ZPT] Making PageTemplate's edit pages Unicode aware
>
> On 27/03/2004, at 9:57 PM, Dieter Maurer wrote:
>
>> Stuart Bishop wrote at 2004-3-25 12:27 +1100:
>>> Currently, if you enter non-ascii text into the title or contents
>>> fields on a PageTemplate's edit page, the data ends up stored as
>>> an encoded string (using management_page_charset if it is set, or
>>> an unknown encoding if it is not).
>>>
>>> This should be easy to fix using the foo:charset:ustring notation
>>> to have Zope convert the encoded strings to Unicode. However, the
>>> file upload feature is more problematic. Should the file upload
>>> try converting the file to Unicode from UTF-8 and raise an exception
>>> if this is not possible? I personally feel this is preferable to
>>> ending up with arbitrarily encoded document source, with no idea
>>> of the character set used.
>>
>> I do not think that Zope should convert when it does not know the
>> encoding. I am unaware that a missing "management_page_charset"
>> can be interpreted as "UTF-8". If this were the case, converstion
>> to unicode might be correct. By the way: the HTML specification
>> says that uploaded files should come with a "content-type" 
>> declaration.
>> In this case, the charset specified there (if any) should be used
>> to determine the encoding.
>
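
A minimal sketch of Dieter's suggestion, written for present-day Python 3
rather than the Python 2 that Zope ran on in 2004. It only shows how the
charset parameter could be read from an upload's Content-Type value; the
names upload_charset and decode_upload are hypothetical and not part of
Zope.

```python
from email.message import Message


def upload_charset(content_type, default=None):
    """Return the charset named in a Content-Type value, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns `default`
    # when the header carries no charset parameter.
    return msg.get_content_charset(default)


def decode_upload(data, content_type):
    """Decode uploaded bytes using only the charset the browser declared."""
    charset = upload_charset(content_type)
    if charset is None:
        # Refuse to guess: this is the "unknown encoding" case.
        raise ValueError("upload did not declare a charset")
    return data.decode(charset)
```

Refusing to guess when no charset is declared keeps arbitrarily encoded
source out of the template, at the cost of rejecting uploads from
browsers that omit the parameter.
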
> Yes - A missing management_page_charset should probably be interpreted
> as either US-ASCII or ISO-8859-1. US-ASCII is probably more correct,
> but I would guess that most browsers will be configured to use
> ISO-8859-1 as their default (and this might be specified in the HTML
> spec?)
>
> I guess using the charset type the browser tells us for file uploads
> means we can blame the browser. I don't know how this could be reliable
> (since text files themselves don't encode their character set unless
> they happen to be UTF-16 or have a BOM). I am wondering if having a
> file upload function is incompatible with a Unicode aware page
> templates product.
>
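
As a rough illustration of the BOM point, here is a sketch (Python 3,
hypothetical function names) that trusts a byte order mark when one is
present and otherwise falls back to the strict UTF-8 decode proposed
earlier in the thread, raising an exception instead of storing text in
an unknown encoding.

```python
import codecs

# Check UTF-32 before UTF-16: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if `data` starts with a known BOM, else None."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return None


def decode_upload_strict(data):
    """Decode by BOM if present, otherwise as strict UTF-8.

    Raises UnicodeDecodeError when the bytes are not valid for the
    chosen codec.
    """
    codec = sniff_bom(data) or "utf-8"
    return data.decode(codec)
```

A file without a BOM that happens to be ISO-8859-1 will usually fail the
strict UTF-8 decode, which is the behaviour argued for above: an error
rather than silently mislabelled text.
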
> If management_page_charset is not set, it is unknown what charset
> is being used. The only way of knowing the character set of data that
> has been submitted is to know the character set of the form that it
> was submitted from. All other mechanisms do not work due to
> incompatibilities in how the browsers work.
>
> Currently, if you create a page template that contains non-ASCII
> characters, any tal:content or tal:replace expressions that return
> Unicode will now raise a Unicode error. This can be demonstrated
> simply:
>     <html>
>       <div>My 2¢</div>
>       <div tal:content="python:u'My 2\N{CENT SIGN}'">My 2¢</div>
>     </html>
>
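
The error presumably comes from Python 2's implicit coercion: when an
encoded byte string is combined with a Unicode string, the byte string
is decoded with the ASCII codec first. The sketch below imitates that
step in Python 3, where the coercion no longer happens implicitly;
join_like_python2 is a hypothetical helper, not ZPT code.

```python
# Template source as it is stored today: an encoded byte string.
template_source = "My 2\N{CENT SIGN}".encode("utf-8")
# What the tal:content expression returns: a Unicode string.
expression_result = "My 2\N{CENT SIGN}"


def join_like_python2(byte_chunk, text_chunk):
    """Combine bytes and text the way Python 2 did implicitly."""
    # Python 2 decoded the byte string as ASCII before concatenating.
    return byte_chunk.decode("ascii") + text_chunk


try:
    join_like_python2(template_source, expression_result)
except UnicodeDecodeError as exc:
    print("rendering would fail:", exc)
```

Pure ASCII source passes through this step unharmed, which is why the
problem only shows up once a template contains a character such as the
cent sign.
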
> These are the things I think need to be fixed in Zope's Page Templates
> implementation to make them Unicode aware. There may be more (?):
>
>   - It should be possible for the actual page template source to
>     be stored as a Unicode string. Currently, there is an assert
>     ensuring it is a traditional string.
>
>   - The title property should be a Unicode string.
>
>   - PageTemplateFile should grow an optional charset parameter,
>     defaulting to US-ASCII.
>
>   - PageTemplate.write(text) should raise an exception if text
>     is not either a Unicode string or an ASCII string.
>
>   - The ZopePageTemplate edit page should use Zope's
>     :charset:ustring notation so Unicode strings get passed
>     to its handler.
>
>   - The file upload widget needs to either be removed, or grow
>     a charset box. I don't think either of these solutions is
>     ideal :-(
>
> Note that when I say 'Unicode string', we can still store ASCII
> text using a traditional string to save space.
>
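
The write() and charset items in the list above could look roughly like
this. It is a sketch in Python 3 terms (str for Unicode strings, bytes
for traditional strings) with hypothetical function names; it does not
reproduce the actual PageTemplate or PageTemplateFile code.

```python
def check_template_text(text):
    """Accept Unicode text or pure-ASCII bytes; reject everything else.

    Returns the text as a Unicode string.
    """
    if isinstance(text, str):
        return text
    if isinstance(text, bytes):
        # Raises UnicodeDecodeError if any byte is outside ASCII.
        return text.decode("ascii")
    raise TypeError("template source must be text or ASCII bytes")


def load_template_bytes(data, charset="us-ascii"):
    """Decode file contents with an explicit charset, defaulting to US-ASCII."""
    return data.decode(charset)
```

With a US-ASCII default, an existing template file containing non-ASCII
bytes fails loudly until a charset is supplied, instead of being stored
in an unknown encoding.
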
> My application is currently using a ZopePageTemplate subclass that
> has been modified to use Unicode strings for the document source
> and title, and it seems to be functioning just fine. Does anyone
> know if that "assert type(text) == type('')" in PageTemplate.write
> is there for a reason?


- --  
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (Darwin)

iD8DBQFAdfPfAfqZj7rGN0oRAkBuAJ0WLSC3V2eL+zNzkQqBqjJ2bl5degCfe2SB
DlT7NTsieQlDhVgEnHYaXp8=
=6XPE
-----END PGP SIGNATURE-----



