Fwd: [Zope] zope on google file system
On Wed, Mar 26, 2008 at 11:08 AM, Chris Withers <chris@simplistix.co.uk> wrote:
Tim Nash wrote:
I don't have the skills but I think it would be cool if some student ported Zope to utilize features of the google file system or libferris. Libferris is a virtual file system that mounts just about everything including postgres, xml and OpenOffice docs.
Where are you thinking of plugging Zope into this storage layer?
Something like this:

Apache -> zope (business logic, security) -> libferris along with ZODB

Perhaps making all libferris resources URL-addressable within zope. However, Postgres and other databases would be better off being accessed through ZSQL. Perhaps also make all ZODB objects published as libferris resources.
If zope ran on the gfs (primarily adding business logic, security and publishing) it would give a boost to the value of any zope based company.
I don't know what "running on gfs" would mean, but surely this is only available to Google employees and internal projects?
The google file system is a distributed file system:
http://labs.google.com/papers/gfs.html

If the zope <--> distributed file system interface were clean, zope could run against other distributed file systems. Then, when gfs becomes a service that can be purchased (surely coming one day), zope is ready to be ported to it.

Also, google can only offer so much without the ability to utilize business logic and security. They must be looking for an affordable way to offer business logic that can scale and to serve secure objects. A bright student could help them explore using zope for this.
Also, I think it would be fun to run map/reduce on my stored objects!
Dunno what this means either...
Here is a paper on map/reduce:
http://labs.google.com/papers/mapreduce.html

If zope is storing documents, it would be useful to map/reduce the documents stored in zope and prepare reports, etc.

Here is an overview offering many competing arguments: "Why Should You Care About MapReduce?"
http://www.theserverside.com/news/thread.tss?thread_id=48283

Also, check out hadoop: http://hadoop.apache.org/
Chris
-- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
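[The map/reduce idea Tim is pointing at can be sketched without any of the infrastructure. Below is a minimal word count over a few in-memory "documents" in plain Python; the document names and texts are made up for illustration, and a real system would distribute the map and reduce phases across machines.]

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # map: emit a (word, 1) pair for each word in the document
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    # shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    # reduce: sum the counts for each word
    return word, sum(counts)

documents = {
    "doc1": "zope stores documents",
    "doc2": "map reduce over documents",
}

pairs = [pair for doc_id, text in documents.items()
         for pair in map_phase(doc_id, text)]
result = dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
print(result["documents"])  # appears in both texts -> 2
```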
+-------[ Tim Nash ]----------------------
| On Wed, Mar 26, 2008 at 11:08 AM, Chris Withers <chris@simplistix.co.uk> wrote:
| > Tim Nash wrote:
| > > I don't have the skills but I think it would be cool if some
| > > student ported Zope to utilize features of the google file system or
| > > libferris. Libferris is a virtual file system that mounts just about
| > > everything including postgres, xml and OpenOffice docs.
| >
| > Where are you thinking of plugging Zope into this storage layer?
| >
| Something like this:
| Apache -> zope (business logic, security) -> libferris along with ZODB
|
| Perhaps making all libferris resources url addressable within zope.
| However Postgres and other databases would be better off getting
| accessed through zsql.
|
| Perhaps also make all ZODB objects published as libferris resources.

Perhaps I can point you here: http://sourceforge.net/projects/localfs/

or one of a dozen other FS<->Zope mapping products... or ORM products or ...

There's even code out there that maps .zip files as folders...

--
Andrew Milton
akm@theinternet.com.au
Does localfs work with virtual file systems? Is there a zope mapping product that maps zope to a distributed file system? What is the best way to run map/reduce on xml files that are stored in the zodb?

Thanks,
Tim
+-------[ Tim Nash ]----------------------
| Does localfs work with virtual file systems?

If it can be "mounted" and looks like a file system and smells like a file system, then localfs, or in fact anything else, shouldn't know any different.

| Is there a zope mapping product that maps zope to a distributed file system?

You don't really explain in what way you want it distributed. Zope is an application server, so what you're asking for doesn't make any sense.

You can certainly "distribute" your ZODB across as many file systems as you want right now. You can certainly just plonk your Data.fs ZODB on any filesystem you want, distributed or otherwise.

If you want a "smarter" ZODB or a different STORAGE layer, that's a different kettle of fish, but also NOT what you previously asked for.

| What is the best way to run map/reduce on xml files that are stored in the zodb?

The same way you run map/reduce on xml files that are stored anywhere, although one could contend that having XML files in a ZODB might be at least one too many levels of abstraction.

--
Andrew Milton
akm@theinternet.com.au
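[Andrew's point about mounts can be shown with a small sketch: code that walks a directory tree neither knows nor cares whether the path is local disk or an NFS/SMB mount. The temporary directory below just stands in for a mount point; any mounted path would behave identically.]

```python
import os
import tempfile

def list_xml_files(root):
    # Walk any mounted path and collect .xml files; the code is
    # identical whether root is local disk or a network mount.
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".xml"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)

# Demonstrate against a throwaway directory standing in for a mount point.
with tempfile.TemporaryDirectory() as mount:
    open(os.path.join(mount, "a.xml"), "w").close()
    open(os.path.join(mount, "b.txt"), "w").close()
    xml_files = list_xml_files(mount)
    print(len(xml_files))  # only a.xml matches -> 1
```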
What I am looking for is a way to store my data in xml using zope and run map/reduce (or something very much like it) on live data.

1. Should I try to see if localFS will read/write to xml files on the hadoop filesystem? http://hadoop.apache.org/core/docs/current/hdfs_design.html
2. Or should I look for python equivalents to hadoop?
3. Or should I just use java for this area of my application?

Which approach (or something else) would you take? Anybody?
+-------[ Tim Nash ]----------------------
| What I am looking for is a way to store my data in xml using zope and
| run map/reduce (or something very much like it) on live data.

Write a Zope Product and store your XML anywhere you want: FS, ZODB, an FTP site somewhere... However, if you really have enough data to warrant any of this, you're not going to be running map/reduce from inside Zope. You'll be calling out to something else to do map/reduce and return you the results.

| 1. Should I try to see if localFS will read/write to xml files on the
| hadoop filesystem
| http://hadoop.apache.org/core/docs/current/hdfs_design.html

This is not (yet) a filesystem; it's a database written in Java with 'filesystem-like' commands.

| 2. or should I look for python equivalents to hadoop?
|
| 3. or should I just use java for this area of my application?
|
| Which approach (or something else) would you take?

I would start with a list of requirements...

--
Andrew Milton
akm@theinternet.com.au
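[The "calling out to something else" shape Andrew describes can be sketched in a few lines. A real deployment would submit a hadoop job; here a separate Python process counting words on stdin stands in for the external worker, purely to show the delegate-then-collect pattern.]

```python
import subprocess
import sys

# A stand-in "external" worker: in real life this would be a job
# submitted to a map/reduce cluster; here it is a separate Python
# process that counts whitespace-separated tokens on stdin.
WORKER = [sys.executable, "-c",
          "import sys; print(len(sys.stdin.read().split()))"]

def run_external_job(payload):
    # Hand the data to an external process and return its result,
    # which is all Zope would do: delegate, then render the output.
    proc = subprocess.run(WORKER, input=payload, capture_output=True,
                          text=True, check=True)
    return int(proc.stdout.strip())

word_count = run_external_job("<doc>three words here</doc>")
print(word_count)
```

The application server only assembles the input and displays the result; the heavy lifting happens in the external process.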
> I would start with a list of requirements...

The requirements are to run distributed map/reduce on 'live' xml data that is stored by the zope application server.

> You'll be calling out to something else to do map/reduce and return
> you the results.

Agreed, but what is the storage mechanism for the files used in that process? If it is the hadoop file system then you can't use live data; you would have to copy the files to the hadoop file system, correct?

It still looks to me like a zope to virtual file system mapping would be useful. Unfortunately it also looks like I am the only one who wants it, so I'm not going to post it to the gsoc mailing list.
+-------[ Tim Nash ]----------------------
| > I would start with a list of requirements...
|
| The requirements are to run distributed map/reduce on 'live' xml data
| that is stored by the zope application server.

And I want a Ferrari. That's about as much of a requirement as those.

| > You'll be calling out to something else to do map/reduce and return
| > you the results.
|
| Agreed, but what is the storage mechanism for the files used in that process?
| If it is the hadoop file system then you can't use live data, you
| would have to copy the files to the hadoop file sytem, correct?

Well, if you're ONLY storing them in hadoop via some mechanism, then no. And if your data is large enough to warrant using hadoop, you're never going to store them in Zope.

| It still looks to me like a zope to virtual file system mapping would
| be useful.

Procfs is a virtual filesystem, devfs is a virtual filesystem. smb and nfs mounts are virtual filesystems that shadow actual filesystems; these would work out of the box with LocalFS. Until you can mount Hadoop in some way, it is not a filesystem, it's just an application with an API.

| Unfortunately it also looks like I am the only one who
| wants it so I'm not going to post it to the gsoc mailing list.

If you want a python library to interact with hadoop, write one; it's not hard to turn java into python. Then write a product that consists of:

- a top-level object that acts as a container that talks to hadoop (contains all the logic to create files/directories etc.);
- sub-objects that represent directories inside hadoop (as a Folder inside Zope);
- sub-objects that represent (XML) files inside hadoop.

Add methods and ZPTs that perform your operations and display results. Then you can just navigate through your XML data.
The "hard" part will be talking to hadoop from python, although see here:
http://www.stat.purdue.edu/~sguha/code.html#hadoopy

Although it would probably be a lot easier to use ctypes on the C lib and make a nicer interface using that.

Once you can turn a URL (http://hadoop.example.com/tnash/xml/xml_001) into a hadoop "URI" (hadoop xml/001) you're pretty much done. You can use "popen" to run your map/reduce command from inside your "object" and to fetch the results to display inside Zope (probably fairly inefficient, but it'd work). Or just get the job number and scrape the webserver...

Oh, but you wanted to store the files IN zope... so you can ignore all that.

--
Andrew Milton
akm@theinternet.com.au
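[The URL-to-"URI" step Andrew mentions is mostly string work. A hedged sketch follows: the site prefix, the example hostname, and the exact `hadoop fs -cat` command line are assumptions for illustration, not a real deployment.]

```python
from urllib.parse import urlparse

# Hypothetical prefix under which the hadoop tree is published;
# adjust for a real site layout.
SITE_PREFIX = "/tnash/"

def url_to_hadoop_path(url):
    # Turn http://hadoop.example.com/tnash/xml/xml_001 into xml/xml_001
    path = urlparse(url).path
    if not path.startswith(SITE_PREFIX):
        raise ValueError("URL outside the published tree: %s" % url)
    return path[len(SITE_PREFIX):]

def build_cat_command(url):
    # The command an object could hand to popen/subprocess to fetch
    # the file's contents back for display.
    return ["hadoop", "fs", "-cat", url_to_hadoop_path(url)]

cmd = build_cat_command("http://hadoop.example.com/tnash/xml/xml_001")
print(cmd)
```

Against a live cluster, fetching the bytes would then be something like `subprocess.check_output(cmd)`; the command itself is only assembled here, not run.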
> And if your data is large enough to warrant using hadoop you're never
> going to store them in Zope.

If you cache the GUI using javascript, keep the business layer thin, and off-load the majority of the indexing, why not?

> Procfs is a virtual filesystem, devfs is a virtual filesystem. smb

OK, hold on while I write a distributed map/reduce system that runs on devfs.. :)

> http://www.stat.purdue.edu/~sguha/code.html#hadoopy

Thanks for this link (really). I hope this library develops more. It looks interesting. I was only thinking along these lines:
http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Pyth...

> Although it would probably be a lot easier to use ctypes on the c lib
> and making a nicer interface using that.

Please explain. Would your idea work better with localfs?

> You can use "popen" to run your map/reduce command from inside your
> "object" and to fetch the results to display inside Zope (probably
> fairly inefficient, but, it'd work).

In my case, I think many search requests can be pre-indexed into python, so it would be only a few users that would suffer.

> Oh but you wanted to store the files IN zope... so you can ignore all that.

I'd just as well ignore the sarcasm. At least you are willing to think about this!

-Tim
+-------[ Tim Nash ]----------------------
| > And if your data is large enough to warrant using hadoop you're never
| > going to store them in Zope.
|
| If you cache the GUI using javascript, keep the business layer thin
| and off-load the majority of the indexing, why not?

Because the minimum cluster size for hadoop is 64Mb, which means you really want each object you store to be at least 64Mb in size (or close to it). Files of this size are not something Zope is good at serving up out of the ZODB.

| > Procfs is a virtual filesystem, devfs is a virtual filesystem. smb
|
| OK, hold on while I write a distributed map/reduce system that runs on devfs..
| :)

They ARE working on exposing it via DAV... so there's hope for you yet...

| > http://www.stat.purdue.edu/~sguha/code.html#hadoopy
|
| Thanks for this link (really). I hope this library develops more. It
| looks interesting. I was only thinking along these lines:
| http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Pyth...
|
| > Although it would probably be a lot easier to use ctypes on the c lib
| > and making a nicer interface using that.
|
| Please explain. Would your idea work better with localfs?

No, it just wouldn't be as ugly to assemble as trying to use swig and hand-patching Makefiles, and you can make a pythonic layer around it that you can place logic into.

But hey, you have SOMETHING to get started with.

--
Andrew Milton
akm@theinternet.com.au
participants (2):
- Andrew Milton
- Tim Nash