[Zope] i18n site and search robots
Tino Wildenhain
tino@wildenhain.de
Wed, 23 Jul 2003 15:33:08 +0200
Hi Gilles:
Gilles Lenfant wrote:
> ----- Original Message -----
> From: "Tino Wildenhain" <tino@wildenhain.de>
> To: "Dieter Maurer" <dieter@handshake.de>
> Cc: "Gilles Lenfant" <gilles@pilotsystems.net>; <zope@zope.org>
> Sent: Wednesday, July 23, 2003 9:13 AM
> Subject: Re: [Zope] i18n site and search robots
>
>
>
>>Hi,
>>
>>Dieter Maurer wrote:
>>
>>>Gilles Lenfant wrote at 2003-7-22 15:50 +0200:
>>> > This is not strictly speaking a Zope problem, but certainly lots
>>> > among you faced and fixed this.
>>> > I made an i18n site with Localizer that runs fairly well, including
>>> > its i18n search engine.
>>> > But what about external search engine robots (Google, Infoseek...)?
>>> > How to "tell" them that they may browse and index the pages in
>>> > French, English, Spanish (...), changing their HTTP header
>>> > "Accept-Language"?
>>>
>>>Not sure whether this is the most elegant way, but:
>>>
>>> You could have "language access folders", e.g. "en", "fr", "de".
>>>
>>> Requests that go through these folders select the corresponding
>>> language. A ("SiteAccess") AccessRule in the folders ensures
>>> that "Accept-Language" is correctly set in "REQUEST.environ"
>>> and that even "absolute_url" generates the correct language
>>> specific URLs.
>>>
>>
>>
>>
>>According to the W3C standard, the server would
>>1.) issue a "Vary: Accept-Language" header on each response
>>2.) if no Accept-Language header is sent, answer with status
>> 300 "Multiple Choices" and provide a list of the available
>> variations
>>In the multiple choice answer, the list could consist of the said links
>>to the language-access folders Dieter proposed.
>>
>>
>>This would make a good crawler switch.
>>
>
>
> Many thanks Tino,
>
> Could you please give the full URL of this document?
> I didn't find it (or didn't search correctly) on the W3C site.
>
> Thanks in advance.
>
Sorry, it was (of course) not the W3C but an RFC ;))
RFC 2616, HTTP/1.1, June 1999 (Fielding, et al., Standards Track),
pages 60/61:
10.3.1 300 Multiple Choices

The requested resource corresponds to any one of a set of
representations, each with its own specific location, and
agent-driven negotiation information (section 12) is being provided
so that the user (or user agent) can select a preferred
representation and redirect its request to that location.

Unless it was a HEAD request, the response SHOULD include an entity
containing a list of resource characteristics and location(s) from
which the user or user agent can choose the one most appropriate. The
entity format is specified by the media type given in the
Content-Type header field. Depending upon the format and the
capabilities of the user agent, selection of the most appropriate
choice MAY be performed automatically. However, this specification
does not define any standard for such automatic selection.

If the server has a preferred choice of representation, it SHOULD
include the specific URI for that representation in the Location
field; user agents MAY use the Location field value for automatic
redirection. This response is cacheable unless indicated otherwise.
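
To make the 300 answer a bit more concrete, here is a rough Python
sketch. It is not tied to any Zope API. The folder names "en", "fr",
"de" follow Dieter's example; www.example.com, the function names and
the choice of "en" as the preferred language are only assumptions for
illustration. It just builds status, headers and an HTML list of the
language folders when no Accept-Language header arrives, and leaves
everything else to the normal page rendering.

```python
from html import escape

# Hypothetical language folders, following the "en", "fr", "de" example.
LANGUAGES = {"en": "English", "fr": "French", "de": "German"}


def multiple_choices(base_url, path, preferred=None):
    """Return (status, headers, body) for a 300 answer listing the variants."""
    base = base_url.rstrip("/")
    tail = path.lstrip("/")
    items = []
    for code, name in LANGUAGES.items():
        url = f"{base}/{code}/{tail}"
        items.append(
            f'<li><a href="{escape(url, quote=True)}" hreflang="{code}">'
            f"{escape(name)}</a></li>"
        )
    body = "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Vary", "Accept-Language"),
    ]
    # The RFC says the server SHOULD name its preferred variant in Location.
    if preferred in LANGUAGES:
        headers.append(("Location", f"{base}/{preferred}/{tail}"))
    return "300 Multiple Choices", headers, body


def respond(environ, base_url="http://www.example.com"):
    """Answer with 300 only when the client sent no Accept-Language header."""
    if not environ.get("HTTP_ACCEPT_LANGUAGE", "").strip():
        path = environ.get("PATH_INFO", "/")
        return multiple_choices(base_url, path, preferred="en")
    return None  # assumption: the negotiated page is rendered elsewhere
```

Called with an environ that lacks HTTP_ACCEPT_LANGUAGE, respond()
returns the 300 status, the Vary and Location headers and the link
list; with the header present it returns None.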
I think you can also reference the different alternatives in the
HTML head. The <link ..> element has a definition for this
(rel="alternate" together with hreflang); maybe <meta ..> has
one too.
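
The <link ..> idea could look roughly like this. It is only a sketch,
assuming the rel="alternate" and hreflang attributes from HTML 4.01 and
the language folders Dieter proposed. The function name and
www.example.com are made up for the example.

```python
from html import escape


def alternate_links(base_url, path, languages=("en", "fr", "de")):
    """Build one <link rel="alternate"> tag per language folder."""
    base = base_url.rstrip("/")
    tail = path.lstrip("/")
    tags = []
    for code in languages:
        # Assumption: each language lives under /<code>/ on the same host.
        href = escape(f"{base}/{code}/{tail}", quote=True)
        lang = escape(code, quote=True)
        tags.append(f'<link rel="alternate" hreflang="{lang}" href="{href}">')
    return "\n".join(tags)


if __name__ == "__main__":
    print(alternate_links("http://www.example.com/", "/about"))
```

The resulting lines would go into the <head> of every language
variant of the page, so a robot that lands on one variant can
discover the others.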
Regards
Tino Wildenhain