On Thursday 12 Sep 2002 9:50 pm, Chris Muldrow wrote:
I'm having unicode troubles, and I'm not sure if I'm running into a "Zope doesn't do that" problem or perhaps I'm just an idiot.
Basically, I have an external method called xmlheadparse which, given an XML file on our server, returns a list of lists of headlines and URLs:

    [[Headline here, url here], [headline2 here, url2 here]]

I process the list into an HTML page of links with the following Zope Python script, where urlfeed is the name of the XML file I want to display:
    returnedhtml = ""
    storypooge = context.xmlheadparse('/var/www/ap/' + urlfeed)
    for x in range(len(storypooge)):
        returnedhtml = returnedhtml + ' <a href="/News/apmethods/apstory?urlfeed='
        url = context.nntpnamestripper(storypooge[x][1])
        headline = storypooge[x][0]
        returnedhtml = returnedhtml + url + '">' + headline + '</a><br />\n'
    return returnedhtml
This works fine with English text, but I also have Spanish headlines in some of the files. When those are run through this script, I get the following error:

    Error Type: UnicodeError
    Error Value: ASCII encoding error: ordinal not in range(128)
The weird thing is, I can get just the unicode headline to display, but not concatenated into the rest of the stuff. I can't seem to encode all of the pieces into the same format. What am I doing wrong?
You are mixing:

1. a unicode string, and
2. a plain 8-bit string with characters outside the ASCII range.
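Toby's diagnosis can be illustrated with a minimal sketch in modern Python 3. Python 2, which Zope ran on at the time, mixed the two kinds of string by silently converting one into the other with the default 'ascii' codec: it decoded an 8-bit string before concatenating it with a unicode string, and encoded a unicode string wherever an 8-bit string was required (for example, `str(u"...")`). Either conversion fails on a character such as "ñ" with the "ordinal not in range(128)" message. The sketch makes those hidden conversions explicit; the helper names and the sample headline are hypothetical.

```python
# Python 3 sketch of the two implicit ASCII conversions Python 2 performed
# when text (unicode) and 8-bit strings were mixed. Sample data is hypothetical.

headline_text = "Espa\u00f1a gana"              # text: what Python 2 called unicode
headline_bytes = headline_text.encode("utf-8")  # 8-bit string: b'Espa\xc3\xb1a gana'


def py2_decode_for_concat(raw):
    # Python 2 decoded an 8-bit string with the default 'ascii' codec
    # before concatenating it with a unicode string.
    return raw.decode("ascii")


def py2_encode_for_output(text):
    # ...and encoded a unicode string with 'ascii' wherever an
    # 8-bit string was required.
    return text.encode("ascii")


for convert, value in ((py2_decode_for_concat, headline_bytes),
                       (py2_encode_for_output, headline_text)):
    try:
        convert(value)
    except UnicodeError as exc:
        # Both directions fail for the same reason: "ñ" is not ASCII.
        print(type(exc).__name__, "->", exc.reason)
```

Which of the two conversions a given Python 2 script hit depended on which object ended up being coerced; the cure is the same either way.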
It's not clear from the code fragment which strings are the unicode ones and which are not. I suggest you work entirely in unicode. You need to convert those 8-bit strings into unicode strings by decoding them with their character encoding, using code like:
    myunicodestring = unicode(my8bitstring, 'utf-8')
Replace 'utf-8' with whatever character encoding your 8-bit strings actually use.
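A minimal Python 3 sketch of the "work entirely in unicode" advice applied to the link-building loop: every 8-bit value is decoded once, with its real encoding, and the page is then assembled only from text. UTF-8 as the feed encoding is an assumption, `to_text` and `build_links` are hypothetical helpers, and the `nntpnamestripper` step from the original script is left out because its behaviour isn't shown in the thread.

```python
# Python 3 sketch of "work all in unicode": decode each 8-bit value once,
# then build the page only from text (str) objects.


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding bytes with the given encoding.

    UTF-8 is an assumed default; pass whatever encoding the feed really uses.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text: nothing to decode


def build_links(stories, encoding="utf-8"):
    """Build the link list from [headline, url] pairs (each str or bytes)."""
    parts = []
    for headline, url in stories:
        parts.append('<a href="/News/apmethods/apstory?urlfeed=')
        parts.append(to_text(url, encoding))
        parts.append('">')
        parts.append(to_text(headline, encoding))
        parts.append("</a><br />\n")
    return "".join(parts)


# Hypothetical feed data mixing bytes and text, as in the original problem.
stories = [[b"Espa\xc3\xb1a gana", "ap-123"], ["English headline", b"ap-124"]]
html = build_links(stories)
print(html)
```

Collecting the pieces in a list and joining them once also replaces the repeated `returnedhtml = returnedhtml + ...` pattern. If something downstream needs bytes, encode the finished page a single time, e.g. `build_links(stories).encode("utf-8")`.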
    returnedhtml = ""
    storypooge = context.xmlheadparse('/var/www/ap/' + urlfeed)
    for x in range(len(storypooge)):
        returnedhtml = returnedhtml + ' <a href="/News/apmethods/apstory?urlfeed='
        url = context.nntpnamestripper(storypooge[x][1])  # this is a string
        headline = storypooge[x][0]  # This is unicode
        returnedhtml = unicode(url, "ascii")
    return returnedhtml

When I do this (with any encoding), I get a "decoding unicode is not supported" error.
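In Python 2, "decoding Unicode is not supported" is the TypeError raised when `unicode(value, encoding)` is given a value that is already a unicode object, which suggests `url` was not the plain 8-bit string the comment assumes. Python 3 keeps the same rule for the two-argument `str(value, encoding)`, as this small sketch shows; the sample `url` value is hypothetical.

```python
# Python 3 analogue of the "decoding Unicode is not supported" error:
# the two-argument text constructor only accepts bytes-like input.
url = "ap-123"  # hypothetical: already text, not bytes

try:
    str(url, "ascii")  # like Python 2's unicode(u"ap-123", "ascii")
except TypeError as exc:
    print(exc)  # decoding str is not supported

# Checking the type first shows which pieces actually need decoding.
for piece in (url, b"Espa\xc3\xb1a gana"):
    print(type(piece).__name__, isinstance(piece, bytes))
```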
So, my cloud of stupidity just lifted, and I think I've seen the light. I just realized my string never actually got turned into unicode along the way; I'm actually dealing with an ASCII string and a UTF-8 string. When I encode the ASCII string into UTF-8 with asciistring.encode("utf-8"), I can concatenate the two. Thanks for the help; your answer got my mind turning in a different direction.

- Chris
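Chris's fix works because ASCII is a subset of UTF-8: encoding a pure-ASCII string as UTF-8 produces exactly the same bytes, so the result can be joined with other UTF-8 byte strings without any implicit conversion. A minimal Python 3 sketch, with hypothetical values:

```python
# Python 3 sketch of the fix: turn the ASCII text into UTF-8 bytes so that
# every piece is an 8-bit string in the same encoding.
prefix = '<a href="/News/apmethods/apstory?urlfeed=ap-123">'  # pure-ASCII text
headline_utf8 = b"Espa\xc3\xb1a gana"  # hypothetical UTF-8 bytes from the feed

prefix_utf8 = prefix.encode("utf-8")
assert prefix_utf8 == prefix.encode("ascii")  # ASCII text encodes identically in UTF-8

page = prefix_utf8 + headline_utf8 + b"</a><br />\n"  # bytes + bytes: no coercion
print(page.decode("utf-8"))
```

This keeps the page as UTF-8 bytes rather than unicode, so it stays correct only while every 8-bit piece really is UTF-8; a piece in another encoding would concatenate without error but display as garbled characters.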
participants (2)
- Chris Muldrow
- Toby Dickenson