Granting access by reading HTTP headers
We're running Plone for internal departmental use. I'm going to lock down most of the content, requiring a login to view sensitive documents. But I also want our Google Mini appliance to crawl all content. The problem is that the appliance does not accept cookies, so Plone and Zope don't recognize a user account as the crawler attempts to move through links.

I am thinking of granting the Google Mini appliance "transparent" access by reading the HTTP headers of incoming requests and granting access if:
- the header includes the correct client string, AND
- the IP address of the requesting machine is owned by the Google Mini host.

Questions:
1) Is this approach viable? (What are the pitfalls?)
2) What Python module is consulted to determine access rights when a page request is made?
3) Is this difficult to implement if one has rudimentary Python skills? (Or is there already sample code out there to do something like this? I couldn't find any.)
Marc Schnapp wrote:
We're running Plone for internal departmental use. I'm going to lock down most of the content, requiring a login to view sensitive documents. But I also want our Google Mini appliance to crawl all content. The problem is that the appliance does not accept cookies. So Plone and Zope don't recognize a user account as the crawler attempts to move through links.
I am thinking of granting the Google Mini appliance "transparent" access by reading the http headers of incoming requests and granting access if: - the header includes the correct client string AND - The IP address of the requesting machine is owned by the Google Mini host.
Questions:
1) Is this approach viable? (What are the pitfalls?)
2) What python module is consulted to determine access rights when a page request is made?
3) Is this difficult to implement if one has rudimentary Python skills? (Or is there already sample code out there to do something like this? I couldn't find any.)
Such a policy would be trivial to implement using the ScriptablePlugin within a PluggableAuthenticationService user folder. Even in a "stock" user folder, if you know the IP of the appliance, you can create a user and set the "domain" field to that IP, granting it the roles which allow it to view the site: as long as nobody else can spoof that IP, you should be fine.

Tres.
--
Tres Seaver +1 202-558-7113 tseaver@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
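As a sketch of the rule such a plugin would enforce (plain Python outside Zope; the IP address and client string are placeholders, and in a real ScriptablePlugin you would read REMOTE_ADDR and HTTP_USER_AGENT from the REQUEST object):

```python
# Illustrative values -- substitute your appliance's real address and
# the client string it actually sends.
CRAWLER_IP = "10.0.0.42"
CRAWLER_UA = "gsa-crawler"

def is_trusted_crawler(remote_addr, user_agent):
    """Grant access only if BOTH the source IP and the client string match,
    mirroring the two conditions in the original question."""
    return remote_addr == CRAWLER_IP and CRAWLER_UA in (user_agent or "")
```

The AND of both conditions matters: either check alone is easier to spoof than the pair.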
Marc Schnapp wrote:
We're running Plone for internal departmental use. I'm going to lock down most of the content, requiring a login to view sensitive documents. But I also want our Google Mini appliance to crawl all content.
Google Mini can do http basic auth, right? If so, you're fine, just put in the basic auth details and define a user in acl_users. Provided the mini presents the credentials without first being challenged by a 401, you'll be fine...
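For reference, a preemptive basic-auth credential is just a base64-encoded "user:password" pair sent in the Authorization header; a minimal sketch (the "gsa-crawler" account name is illustrative -- it would be whatever user you define in acl_users):

```python
import base64

def basic_auth_header(user, password):
    """Build the Authorization header a client sends to authenticate
    preemptively, without waiting for a 401 challenge."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}

# e.g. basic_auth_header("gsa-crawler", "secret")
```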
1) Is this approach viable? (What are the pitfalls?)
I'd worry about headers being spoofed...
2) What python module is consulted to determine access rights when a page request is made?
The user folder, in your case it'll be the hell known as GRUF. Swap that out for the hell known as PAS ;-)
3) Is this difficult to implement if one has rudimentary Python skills?
Yes.

cheers,
Chris
--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
If anyone here has the consulting expertise to help implement a solution, please email me separately at m + schnapp + service + marc + dot + com. (See my elaborations below.)

Chris Withers wrote:
Marc Schnapp wrote:
We're running Plone for internal departmental use. I'm going to lock down most of the content, requiring a login to view sensitive documents. But I also want our Google Mini appliance to crawl all content.
Google Mini can do http basic auth, right? If so, you're fine, just put in the basic auth details and define a user in acl_users. Provided the mini presents the credentials without first being challenged by a 401, you'll be fine...
Marc responds:
1) The Google Mini does not accept cookies.
2) Plone barfs if you try tricks like adding a query string to URLs.
1) Is this approach viable? (What are the pitfalls?)
I'd worry about headers being spoofed...
Marc responds: I don't have to worry about headers being spoofed. The host lives in our dedicated data center behind a VPN concentrator requiring RSA authentication. No one gets to the box unless we already have cleared them through two-phase authentication.
2) What python module is consulted to determine access rights when a page request is made?
The user folder, in your case it'll be the hell known as GRUF. Swap that out for the hell known as PAS ;-)
3) Is this difficult to implement if one has rudimentary Python skills?
Yes.
cheers,
Chris
Marc Schnapp wrote:
If anyone here has the consulting expertise to help implement a solution, please email me separately at m + schnapp + service + marc + dot + com.
It's much easier than you might think. You don't even need to change Zope for this if you are using Apache as a front-end proxy via the usual mod_rewrite/mod_proxy.

You simply create a user for your crawler, log in as that user, and grab the cookie (assuming you are using some kind of cookie-based auth; basic auth would work similarly), for example using Live HTTP Headers (Mozilla/Firefox), some sniffer, or whatever.

http://httpd.apache.org/docs/2.0/mod/mod_setenvif.html
http://httpd.apache.org/docs/2.0/mod/mod_headers.html

will tell you how to set the Cookie header as if provided by the crawler client. (A cookie, after all, is just another HTTP header.)

So if the conditions match: client IP = your special crawler and user agent = your crawler -> RequestHeader set Cookie ...

Ah, and by the way, maybe you could just use ZCatalog and skip the external crawler :-)

Regards,
Tino
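A minimal sketch of what Tino describes, assuming Apache 2.x with mod_setenvif and mod_headers loaded; the IP address, User-Agent substring, and cookie value are all placeholders to replace with your appliance's real details:

```apache
# Flag requests whose User-Agent looks like the Google Mini crawler
# ("gsa-crawler" is a placeholder -- check what your appliance sends).
SetEnvIf User-Agent "gsa-crawler" from_mini

# Clear the flag again unless the request also comes from the
# appliance's address (placeholder IP; the negative lookahead relies
# on Apache's PCRE regex support).
SetEnvIf Remote_Addr "^(?!10\.0\.0\.42$)" !from_mini

# For flagged requests only, inject the auth cookie captured from a
# real login session as the crawler user.
RequestHeader set Cookie "__ac=PASTE_CAPTURED_VALUE" env=from_mini
```

Because SetEnvIf directives run in order, the second one effectively ANDs the two conditions by unsetting the flag when the source address does not match.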
Thanks for the approach! We will be using Apache as the proxy. Question:
So if the conditions match: client-ip = your special crawler and useragent = your crawler -> RequestHeader set Cookie ...
Am I writing a cookie that Plone would recognize as the "Google Mini" Plone user? Would you know where the documentation for the user cookie is?
Ah, and btw. maybe you just use ZCatalog and skip using external crawler :-)
Of course I could do that. ;) But we are hosting static html pages served up by Apache separately and I want an integrated search facility with topnotch filters and rendering for PDF and MS Office files.
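On the cookie question above: in common setups, Plone's CookieCrumbler login sets a cookie named __ac whose value is the URL-quoted base64 of "name:password". Treat the exact format as an assumption and verify it by capturing a real login cookie, as Tino suggests. A sketch:

```python
import base64
from urllib.parse import quote

def ac_cookie_value(name, password):
    """Approximate a CookieCrumbler-style '__ac' cookie value:
    URL-quoted base64 of 'name:password'. Verify against a cookie
    captured from a real login before relying on this format."""
    token = base64.b64encode(f"{name}:{password}".encode("utf-8")).decode("ascii")
    return quote(token)
```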
Marc Schnapp wrote:
Google Mini can do http basic auth, right? If so, you're fine, just put in the basic auth details and define a user in acl_users. Provided the mini presents the credentials without first being challenged by a 401, you'll be fine...
Marc responds: 1) The Google Mini does not accept cookies.
Did I ask if it accepted cookies? No, I asked if it accepts http basic auth. Care to answer my question? ;-)
2) Plone barfs if you try tricks like adding a query string to URLs.
Plohn barfs a lot, probably best not use it ;-)
I don't have to worry about headers being spoofed. The host lives in our dedicated data center behind a VPN concentrator requiring RSA authentication. No one gets to the box unless we already have cleared them through two-phase authentication.
Yah, sure... I'd still worry about headers being spoofed *grinz*

cheers,
Chris

PS: If you want to pay me to solve this, contact me off list...
--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Marc Schnapp wrote:
Did I ask if it accepted cookies? No, I asked if it accepts http basic auth. Care to answer my question? ;-)

Yes. The Google Mini accepts http basic auth.
Right, so why don't you do what I originally suggested and use that?

cheers,
Chris
--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
participants (4)
- Chris Withers
- Marc Schnapp
- Tino Wildenhain
- Tres Seaver