i18n site and search robots
Hi,
This is not strictly speaking a Zope problem, but certainly many among you have faced and fixed this. I made an i18n site with Localizer that runs fairly well, including its i18n search engine. But what about external search engine robots (Google, Infoseek...)? How can I "tell" them that they may browse and index the pages in French, English, Spanish (...) by changing their HTTP "Accept-Language" header?
Many thanks in advance
--Gilles
Gilles Lenfant wrote at 2003-7-22 15:50 +0200:
This is not strictly speaking a Zope problem, but certainly many among you have faced and fixed this. I made an i18n site with Localizer that runs fairly well, including its i18n search engine. But what about external search engine robots (Google, Infoseek...)? How can I "tell" them that they may browse and index the pages in French, English, Spanish (...) by changing their HTTP "Accept-Language" header?
Not sure whether this is the most elegant way, but:
You could have "language access folders", e.g. "en", "fr", "de".
Requests that go through these folders select the corresponding language. A ("SiteAccess") AccessRule in the folders ensures that "Accept-Language" is correctly set in "REQUEST.environ" and that even "absolute_url" generates the correct language-specific URLs.
Dieter
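A minimal sketch of the language-folder idea, in plain Python rather than as a real SiteAccess rule. The helper name `apply_language_folder`, the `LANGUAGE_FOLDERS` tuple, and the use of the CGI-style key `HTTP_ACCEPT_LANGUAGE` are assumptions made for illustration; an actual AccessRule would work on the Zope REQUEST object instead of a plain dict.

```python
# Hypothetical helper, not the SiteAccess API: it only mimics what an
# access rule placed in a language folder would do to the request.
LANGUAGE_FOLDERS = ("en", "fr", "de")  # folder names from the example above


def apply_language_folder(path, environ):
    """Return (language, remaining_path) and force Accept-Language.

    `environ` stands in for REQUEST.environ; the key name
    HTTP_ACCEPT_LANGUAGE follows the CGI convention (an assumption here).
    """
    segments = [s for s in path.split("/") if s]
    if segments and segments[0] in LANGUAGE_FOLDERS:
        language = segments[0]
        # Override whatever the client (or robot) sent.
        environ["HTTP_ACCEPT_LANGUAGE"] = language
        return language, "/" + "/".join(segments[1:])
    # No language folder in the URL: leave the request untouched.
    return None, "/" + "/".join(segments)


environ = {"HTTP_ACCEPT_LANGUAGE": "en-us"}
print(apply_language_folder("/fr/stuff", environ))
print(environ["HTTP_ACCEPT_LANGUAGE"])
```

With this shape a robot that sends no usable "Accept-Language" header still gets the French page, because the URL alone decides the language.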
Hi, Dieter Maurer wrote:
Not sure whether this is the most elegant way, but:
You could have "language access folders", e.g. "en", "fr", "de".
Requests that go through these folders select the corresponding language. A ("SiteAccess") AccessRule in the folders ensures that "Accept-Language" is correctly set in "REQUEST.environ" and that even "absolute_url" generates the correct language-specific URLs.
According to the W3C standard, the server would
1.) issue a "Vary: Accept-Language" header on each response
2.) if no Accept-Language header is sent, the definition requires sending 300 "Multiple Choices" as status and providing a list of available variations
In the multiple choice answer, the list could consist of the said links to the language-access folders Dieter proposed.
This would make a good crawler switch.
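The two steps above can be sketched as a small pure-Python function. This is an illustration under stated assumptions, not Zope or Localizer code: the function name `negotiate`, the default language list, and the `/<lang><path>` URL layout are hypothetical, and q-values in the header are deliberately ignored to keep the example short.

```python
def negotiate(path, accept_language, languages=("en", "fr", "es")):
    """Return (status, headers, body) for a language-negotiated page.

    Simplified on purpose: the first listed language we can serve wins,
    and q-values are ignored (an assumption of this sketch).
    """
    headers = [("Vary", "Accept-Language")]  # sent on every response
    if accept_language:
        for item in accept_language.split(","):
            tag = item.split(";")[0].strip().lower()
            primary = tag.split("-")[0]  # "fr-FR" -> "fr"
            if primary in languages:
                headers.append(("Content-Language", primary))
                return 200, headers, None  # page body is rendered elsewhere
    # No header, or nothing we can serve: list the language folders.
    items = ['<li><a href="/%s%s">%s</a></li>' % (lang, path, lang)
             for lang in languages]
    body = "<ul>\n%s\n</ul>" % "\n".join(items)
    headers.append(("Content-Type", "text/html"))
    return 300, headers, body


status, headers, body = negotiate("/stuff", None)
print(status)
print(body)
```

A crawler that sends no "Accept-Language" header would receive the 300 response and could follow each listed link to a language folder.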
----- Original Message ----- From: "Tino Wildenhain" <tino@wildenhain.de> To: "Dieter Maurer" <dieter@handshake.de> Cc: "Gilles Lenfant" <gilles@pilotsystems.net>; <zope@zope.org> Sent: Wednesday, July 23, 2003 9:13 AM Subject: Re: [Zope] i18n site and search robots
Hi,
According to the W3C standard, the server would
1.) issue a "Vary: Accept-Language" header on each response
2.) if no Accept-Language header is sent, the definition requires sending 300 "Multiple Choices" as status and providing a list of available variations
In the multiple choice answer, the list could consist of the said links to the language-access folders Dieter proposed.
This would make a good crawler switch.
Many thanks Tino,
Could you please give the full URL of this document? I didn't find it (or didn't search correctly) on the W3C site.
Thanks in advance.
--Gilles
Hi Gilles: Gilles Lenfant wrote:
Many thanks Tino,
Could you please give the full URL of this document? I didn't find it (or didn't search correctly) on the W3C site.
Thanks in advance.
Sorry, it was (of course) not W3C but RFC ;))
RFC 2616, HTTP/1.1, June 1999 (Fielding, et al., Standards Track, pages 60/61)
10.3.1 300 Multiple Choices
The requested resource corresponds to any one of a set of representations, each with its own specific location, and agent-driven negotiation information (section 12) is being provided so that the user (or user agent) can select a preferred representation and redirect its request to that location.
Unless it was a HEAD request, the response SHOULD include an entity containing a list of resource characteristics and location(s) from which the user or user agent can choose the one most appropriate. The entity format is specified by the media type given in the Content-Type header field. Depending upon the format and the capabilities of the user agent, selection of the most appropriate choice MAY be performed automatically. However, this specification does not define any standard for such automatic selection.
If the server has a preferred choice of representation, it SHOULD include the specific URI for that representation in the Location field; user agents MAY use the Location field value for automatic redirection. This response is cacheable unless indicated otherwise.
I think you can include references to the different alternatives in the HTML header too. Maybe the <link ..> and <meta ..> tags have definitions for this.
Regards
Tino Wildenhain
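One way to put such references into the HTML header is the `<link rel="alternate" hreflang="...">` element defined by HTML 4.01. The sketch below generates one tag per language; the function name `alternate_links` and the `<base>/<lang><path>` URL layout are assumptions chosen to match the language folders discussed in this thread.

```python
from html import escape


def alternate_links(base_url, path, languages):
    """Build one <link rel="alternate"> tag per language version.

    Assumes each language lives under <base_url>/<lang><path>, as with
    the language folders discussed above (hypothetical layout).
    """
    tags = []
    for lang in languages:
        href = "%s/%s%s" % (base_url.rstrip("/"), lang, path)
        tags.append('<link rel="alternate" hreflang="%s" href="%s">'
                    % (escape(lang, quote=True), escape(href, quote=True)))
    return "\n".join(tags)


print(alternate_links("http://mysite.org", "/stuff", ["en", "es", "fr"]))
```

The resulting lines would go inside the page's `<head>` element, so they stay invisible to readers while remaining available to robots that look for them.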
----- Original Message ----- From: "Dieter Maurer" <dieter@handshake.de> To: "Gilles Lenfant" <gilles@pilotsystems.net> Cc: <zope@zope.org> Sent: Tuesday, July 22, 2003 11:35 PM Subject: Re: [Zope] i18n site and search robots
Not sure whether this is the most elegant way, but:
You could have "language access folders", e.g. "en", "fr", "de".
Requests that go through these folders select the corresponding language. A ("SiteAccess") AccessRule in the folders ensures that "Accept-Language" is correctly set in "REQUEST.environ" and that even "absolute_url" generates the correct language-specific URLs.
Dieter
Dieter,
Thanks for the valuable tip, which I would use in other situations, but Localizer has a built-in magic feature that takes care of this (it transforms http://mysite.org/en/stuff into http://mysite.org/stuff with English as the preferred language, ignoring the language cookie and the browser preferences).
I just need to know how to reply to a search engine robot:
"""
Hey robot! This page is also available in Spanish and Russian if you provide the appropriate "Accept-Language" header
"""
This is not the most elegant way, but I think that providing (hidden) links could be the solution
(<a href="http://mysite.org/es/stuff">in Spanish</a> (...) in the page http://mysite/stuff).
But there are perhaps better "w3c" friendly solutions.
Cheers
--Gilles
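The per-language links described above could be generated rather than written by hand. A minimal sketch, assuming the language code is simply inserted as the first path segment (the URL scheme described for Localizer in this message); `language_links` and `LANGUAGE_NAMES` are hypothetical names, not part of Localizer.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative mapping; a real site would list its own languages.
LANGUAGE_NAMES = {"es": "Spanish", "ru": "Russian"}


def language_links(page_url, languages=LANGUAGE_NAMES):
    """Return one anchor per language, with the code as first path segment."""
    parts = urlsplit(page_url)
    anchors = []
    for code, name in languages.items():
        # http://mysite.org/stuff -> http://mysite.org/es/stuff
        prefixed = parts._replace(path="/%s%s" % (code, parts.path))
        anchors.append('<a href="%s">in %s</a>'
                       % (urlunsplit(prefixed), name))
    return anchors


for anchor in language_links("http://mysite.org/stuff"):
    print(anchor)
```

Because each language gets its own URL, a robot can index every version by following the links, whatever "Accept-Language" header it sends.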
participants (3): Dieter Maurer, Gilles Lenfant, Tino Wildenhain