Problem with links containing non-ascii characters in StructuredText
Hi,

I have a problem getting links to function in StructuredText when they have non-ascii characters in the title.

- This will render as a link: "Zope website":http://zope.org/
- This will render literally: "Zöpe website":http://zope.org/

[Notice the diaeresis in the second case.]

This can of course be solved by using html entities like '&ouml;'. But I have just started using utf-8 so I don't have to bother myself with writing html entities. There is probably a Python method that can translate 'ö' into '&ouml;', but I would like the resulting html code to be humanly readable utf-8 as well. I wouldn't mind iso-8859-1 as that's what it basically is in my case (I'm Dutch) but utf-8 seems the way to go.

Anyway, here is a script that illustrates the problem. It has some extra non-ascii characters thrown in just to show that these characters don't give any problems outside of the links.

------------------
import Products.PythonScripts.standard

print """
<html><head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
</head><body>
"""

text="""
Let's link to a "Zope website":http://zope.org/.

Nó. Let's lïnk to à "Zöpe website":http://zope.org/.
"""

ppss=Products.PythonScripts.standard.structured_text
print ppss(unicode(text, 'iso-8859-1').encode('utf-8'))
# The following line has the same effect:
#print unicode(ppss(text), 'iso-8859-1').encode('utf-8')

print "</body>"

return printed
------------------

Don't worry, this is not how I usually make my pages. ;-)

This results in the following html source code:

------------------
<html><head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
</head><body>

<p>Let's link to a <a href="http://zope.org/">Zope website</a>.</p>
<p>Nó. Let's lïnk to à "Zöpe website":http://zope.org/.</p>

</body>
------------------

That last failed link is obviously not as it should be. Does anyone know a solution? I failed to find one with Google.

I wondered if it had to do with the diaeresis specifically, but the same thing goes wrong with e.g. 'Zópe'.

BTW, I use the Debian Sarge version of Zope 2.7.

Thanks,

--
Maurits van Rees | http://maurits.vanrees.org/ [Dutch/Nederlands]
Public GnuPG key: http://maurits.vanrees.org/var/gpgkey.asc
"It can seem like you're doing just fine, but the creep's creeping into your mind." - Neal Morse
Maurits van Rees wrote:

> Hi,
>
> I have a problem getting links to function in StructuredText when they have non-ascii characters in the title.
>
> This can of course be solved by using html entities like '&ouml;'. But I have just started using utf-8 so I don't have to bother myself with writing html entities.
This wouldn't solve the overall problem - not all utf-8 characters have html entities. I work with Māori macrons which don't so I'll be listening keenly to discussion about this problem.
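For the entity route discussed above, here is a minimal sketch of what such a translation could look like. It is written for Python 3 (the scripts in this thread are Python 2), and the helper name `to_entities` is made up for illustration. It uses a named entity where the standard library's HTML 4 table has one and falls back to a numeric character reference otherwise, which also covers characters such as the macron in "Māori" that have no entry in that table.

```python
from html.entities import codepoint2name  # HTML 4 named entities only


def to_entities(text):
    """Replace non-ASCII characters with HTML character references.

    Hypothetical helper for illustration: named entity if the HTML 4
    table has one, numeric reference otherwise.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                            # plain ASCII stays as-is
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])   # e.g. &ouml;
        else:
            out.append("&#%d;" % cp)                  # e.g. &#257;
    return "".join(out)


print(to_entities("Zöpe website"))
print(to_entities("Māori"))

# The built-in error handler gives numeric references for everything:
print("Zöpe".encode("ascii", "xmlcharrefreplace"))
```

This keeps the markup pure ASCII, at the cost of the human-readable UTF-8 source the original poster asked for.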
You have to configure your locale support in etc/zope.conf properly.

-aj

--On 30. September 2005 00:26:14 +0200 Maurits van Rees <maurits@vanrees.org> wrote:
> Hi,
>
> I have a problem getting links to function in StructuredText when they have non-ascii characters in the title.
On Fri, Sep 30, 2005 at 07:53:03AM +0200, Andreas Jung wrote:

> You have to configure your locale support in etc/zope.conf properly.

Ah, that seems to be part of the mix yes. Thanks. I am now trying:

locale nl_NL.utf8@euro

I am also wondering about the following two settings, that seem to be needed as well:

rest-input-encoding utf-8
rest-output-encoding utf-8

According to http://www.zope.org/Wikis/DevSite/Proposals/ReStructuredTextIntegration:

Zope 2.X includes a StructuredText module that has several problems:

* undefined behaviour in i18n environments (which characters are allowed inside StructuredText markup? Which characters count as punctuation characters?)

So I guess I should be using REstructured text instead. This indeed seems to help, though I haven't got all problems ironed out yet. In this new format the following:

--------
.. _Zope: http://www.zope.org/

I like the `Zöpe platform`__.

__ Zope_
--------

gets transformed into something like:

<p>I like the <a class="reference" href="http://www.zope.org/">Zöpe platform</a>.</p>

Some other linking methods and other encodings either don't work (Zope doesn't start or throws an error on the page or doesn't produce a link) or they produce a correct link but with ugly characters like above.

Maybe I can get it to work with some more trying and guessing. But does anyone have a good link on how to get Zope to work correctly with Unicode?

Thanks,

-- Maurits van Rees
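As a side note on the "ugly characters": one plausible cause (an assumption, not something confirmed in this thread) is that UTF-8 bytes get interpreted as iso-8859-1 somewhere along the way, or that text which is already UTF-8 gets converted a second time. The sketch below is Python 3, unlike the Python 2 scripts in the thread, and the helper name `latin1_to_utf8` is made up; it mirrors the `unicode(text, 'iso-8859-1').encode('utf-8')` call from the original script.

```python
text = "Zöpe"

latin1_bytes = text.encode("iso-8859-1")  # 'ö' is the single byte 0xF6
utf8_bytes = text.encode("utf-8")         # 'ö' is the two bytes 0xC3 0xB6


def latin1_to_utf8(data):
    """Python 3 equivalent of unicode(data, 'iso-8859-1').encode('utf-8')."""
    return data.decode("iso-8859-1").encode("utf-8")


# Correct: iso-8859-1 input converted once.
converted = latin1_to_utf8(latin1_bytes)

# Mojibake: UTF-8 bytes displayed as if they were iso-8859-1.
garbled = utf8_bytes.decode("iso-8859-1")

# Double encoding: input that was already UTF-8 converted again.
double_encoded = latin1_to_utf8(utf8_bytes)

print(repr(converted))
print(garbled)
print(repr(double_encoded))
```

If the source text and the declared charset already agree, no conversion step is needed at all; converting anyway is what produces the doubled bytes.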
--On 30. September 2005 12:19:31 +0200 Maurits van Rees <maurits@vanrees.org> wrote:

> So I guess I should be using REstructured text instead. This indeed seems to help, though I haven't got all problems ironed out yet. In this new format the following:

StructuredText does not work with UTF-8 (if that is what you mean by Unicode). There is a UTF-8 patch for STX, but it has other problems. In short: don't use STX with multi-byte encodings; use reST instead.

-aj
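To illustrate how a markup parser can end up rejecting a link title like "Zöpe website", here is a deliberately simplified sketch in Python 3. The two patterns are hypothetical and are not taken from the StructuredText source; they only show the general mechanism, where a link expression that accepts a fixed set of "letters" in the quoted title stops matching as soon as a character outside that set appears.

```python
import re

# Hypothetical patterns for illustration only, NOT the real StructuredText code.
# Title limited to ASCII letters, digits, spaces and a little punctuation:
ASCII_LINK = re.compile(r'"([A-Za-z0-9 .,\-]+)":(\S+)')
# Same shape, but \w matches any Unicode letter for Python 3 str patterns:
UNICODE_LINK = re.compile(r'"([\w .,\-]+)":(\S+)')


def find_title(pattern, line):
    """Return the link title matched by pattern, or None if there is no link."""
    match = pattern.search(line)
    return match.group(1) if match else None


plain = 'Let\'s link to a "Zope website":http://zope.org/.'
accented = 'Let\'s lïnk to à "Zöpe website":http://zope.org/.'

print(find_title(ASCII_LINK, plain))       # title is found
print(find_title(ASCII_LINK, accented))    # no match, text stays literal
print(find_title(UNICODE_LINK, accented))  # title is found
```

With byte strings in a multi-byte encoding the situation is worse, because a single 'ö' arrives as two bytes and neither of them is a letter in a single-byte locale, which fits the advice above to avoid STX with UTF-8.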
On Fri, Sep 30, 2005 at 12:38:04PM +0200, Andreas Jung wrote:

> StructuredText does not work with UTF-8 (if that is what you mean by Unicode). There is a UTF-8 patch for STX, but it has other problems. In short: don't use STX with multi-byte encodings; use reST instead.

Okay, that's clear now, thanks. I have got it working now:

In etc/zope.conf:

rest-input-encoding utf-8
rest-output-encoding utf-8
locale nl_NL.utf8@euro

Hm, now it seems to function without that locale setting as well. Maybe Zope reads the locale correctly from my system already.

Anyway, my script is now:

------------------
import Products.PythonScripts.standard

print """
<html><head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
</head><body>
"""

text="""
.. _Zope: http://www.zope.org/

I like à `Zöpe platform`__.

__ Zope_
"""

ppsr=Products.PythonScripts.standard.restructured_text
print "ppsr(text):"
print ppsr(text)

print "</body>"

return printed
------------------

So no more switching from iso to utf with:

print ppsr(unicode(text,'iso-8859-1').encode('utf-8'))

So far as I can see this solves my problems. Thanks for thinking with me, Andreas.

-- Maurits van Rees
participants (3)
- Andreas Jung
- Chris Beaven
- Maurits van Rees