charset from forms input
Hi, I seem to have come across the depressing fact that most browsers will not return a charset parameter in the http header when a form is submitted. For example, the following from Netscape ... (it happens with both IE and Netscape on many platforms I have tried ... Mac, all Windows, and Linux).

POST /hi HTTP/1.0
Referer: http://localhost:8080/temp/test_form
Connection: Keep-Alive
User-Agent: Mozilla/4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
Host: 172.16.21.165:50009
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
Content-type: multipart/form-data; boundary=---------------------------17670043309955870831526446972
Content-Length: 180

So much for a useful Content-type. I know this is NOT a Zope issue, but I was hoping someone had an easy answer.

There is such a myriad of character encodings out there that it makes this quite difficult to handle. The example that most frustrates us is the two-byte encodings versus the one-byte ones. For instance, two common defaults people set their browsers to on Windows are either Western (ISO) or Western (Windows) ... the former being a two-byte encoding set and the latter being a one-byte set (presumably ISO-8859-1 plus the unhelpful use of the control set 0x85 - 0x95 (hex)).

People often copy and paste from Word into form text inputs, and as a quick hack we made up a byte conversion table for the "Microsoft" range. So Western (Windows) works, but of course Western (ISO) does not.

How does one detect these? And more to the point, how does one test easily for any of the other encoding standards? Surely this has bugged a lot of people?

regards
Matt
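As a rough illustration of the "quick hack" described above, here is a minimal Python sketch of decoding form bytes that arrive with no charset parameter. It assumes the only candidates are UTF-8, Windows-1252 (the "Microsoft" range) and ISO-8859-1, and the function name decode_form_bytes is hypothetical. It is a heuristic, not a reliable detector: many byte sequences are valid in more than one encoding.

```python
def decode_form_bytes(raw: bytes) -> str:
    """Best-effort decode of form bytes with no declared charset.

    Assumption: the sender used UTF-8, Windows-1252 or ISO-8859-1.
    """
    # UTF-8 is strict, so bytes that decode cleanly are very likely UTF-8.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Windows-1252 maps most of 0x80-0x9F to printable characters
    # such as curly quotes, which is what text pasted from Word contains.
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # A few bytes (for example 0x81) are undefined in Windows-1252;
        # ISO-8859-1 accepts every byte value, so this never fails.
        return raw.decode("iso-8859-1")


if __name__ == "__main__":
    # Curly quotes as Windows-1252 bytes, e.g. pasted from Word.
    print(decode_form_bytes(b"\x93hi\x94"))
```

Trying the strictest encoding first is the design choice here: a permissive single-byte encoding tried first would accept everything and hide the other cases.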
On Thu, 14 Dec 2000 09:45:53 +1300, Matt <matt.bion@eudoramail.com> wrote:
Hi, I seem to have come across the depressing fact that most browsers will not return a charset parameter in the http header when a form is submitted. For example, the following from Netscape ... (it happens with both IE and Netscape on many platforms I have tried ... Mac, all Windows, and Linux).
Yes, this is indeed a problem.

I have developed some patches to support Unicode in ZPublisher which use a technique where the character encoding is added to the form field name (where ZPublisher already expects other marshalling information).

For example, if you have a form with fields named...

address:string
age:int

...you would change those to...

address:utf8:string
age:utf8:int

...if you are expecting your form response to be submitted in utf8. Under this patch, you could also change that field to...

address:utf8:ustring

...and store your addresses in Unicode.

It is possible to guess what character encoding will be used in a form response. The situation isn't quite as simple as Dieter Maurer suggested, but the rules (as I understand them from experimentation) are in the release notes for this patch. If anyone knows a better way, I would love to know too.

http://www.zope.org/Members/htrd/wstring

Toby Dickenson
tdickenson@geminidataloggers.com
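To make the field-name convention concrete, here is a minimal Python sketch of how a name such as address:utf8:string could be split into a name, an encoding and a type, and then applied to the submitted bytes. This is not the code from the patch: the names parse_field_name, unmarshal and CONVERTERS are hypothetical, the ISO-8859-1 default is an assumption, and in this sketch string and ustring both yield a Python str.

```python
import codecs

# Hypothetical converter table; the real ZPublisher supports more types.
CONVERTERS = {"string": str, "ustring": str, "int": int}


def parse_field_name(field_name):
    """Split 'name:encoding:type' into (name, encoding, type_name)."""
    name, *tags = field_name.split(":")
    encoding = None
    type_name = "string"
    for tag in tags:
        if tag in CONVERTERS:
            type_name = tag
        else:
            # Anything that is not a known type is treated as an
            # encoding; codecs.lookup raises LookupError if unknown.
            codecs.lookup(tag)
            encoding = tag
    return name, encoding, type_name


def unmarshal(field_name, raw, default_encoding="iso-8859-1"):
    """Decode raw form bytes and convert them as the field name asks."""
    name, encoding, type_name = parse_field_name(field_name)
    text = raw.decode(encoding or default_encoding)
    return name, CONVERTERS[type_name](text)


if __name__ == "__main__":
    print(unmarshal("address:utf8:string", "Zürich".encode("utf-8")))
    print(unmarshal("age:utf8:int", b"42"))
```

Because the form author chooses the field names, the encoding travels with the form itself and does not depend on the browser sending a charset parameter.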