Under Zope it is definitely possible; under Plone I don't know, as I have no direct experience with it. I did this (and much more ... ;-)) from Zope. Your mileage may vary, but this is the route I followed, with _very_ satisfactory results.

1. Use LocalFS or similar to map your XML document base from the filesystem into the ZODB.
2. Install the standard Python XML plumbing, i.e. PyXML and/or 4Suite.
3. Install the Zope XML plumbing of your choice. I use the ZopeXMLMethods product, which gives you an easy and very reliable way to perform XSLT (and much more).
4. Depending on the structure of your XML files, you may find it useful to write an import routine that splits the XML files into chunks and creates a structure of custom Zope classes. This is not strictly necessary, but I think it is a best practice because it is performance-friendly. You will need at least one container class derived from Folder and one content class derived from SimpleItem, and both need to be catalog-aware. This will definitely help you with indexing and with building a navigation path through the HTML produced by the XSLT. Obviously, having one or more DTDs describing your XML content is very advisable.
5. For indexing the XML content you need some XML-stripping code that extracts the content as unicode and feeds the text indexes you need (I use the TextIndexNG product instead of the standard Zope TextIndex).

All this gives you a tremendous amount of flexibility and a very scalable infrastructure.

Lessons I learned developing all this:

- Use XSLT only when really necessary, i.e. only for rendering HTML (or other output formats).
- Import the XML into the ZODB using custom classes.
- When importing, turn high-level structure present in the XML content into Python properties (for example chapter titles, section headings, etc.).
- Remember that DTML/TAL is a faster templating system than plain XSLT, as XML parsing has a significant performance overhead. This is really the old "separate logic from presentation" mantra: if you need to apply logic to your content, parse it once from XML into native Python structures and use Python methods to do whatever you need. In this respect you may find two remarkable Python modules useful: elementtree and pyRXP. Both give you an easy path from XML to native Python structures. Remember also that it is very easy (and fun) to create XML streams from Python lists/tuples.
- Pay _extreme_ attention to unicode-related issues: this means converting XML strings to unicode types as soon as you read the XML content into Python.
- Use ZCatalog as much as you can (but this should be standard Zope practice).
- Put everything behind Apache and you get a wonderful three-level caching system: level 0 = XSLT caching done by ZopeXMLMethods, level 1 = the standard Zope cache system, level 2 = Apache.
- _Always_ use absolute URLs!!!

All this seems complicated, but in reality it isn't, thanks to the standard services Python/Zope gives you and to the remarkable products developed by the bright folks on this list!!!

Hope this helps,
__peppo
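
A minimal sketch of the "parse once, then work on native Python structures" idea from steps 4 and 5 above: split a document into chunks, pull a title out as a plain property, and strip the markup to get text for an index. This is only an illustration, not the import routine described in the message. It assumes modern Python 3 and the standard-library xml.etree.ElementTree (the descendant of elementtree), and the element names `chapter` and `title`, the sample document, and the helper names `strip_markup` and `split_chapters` are all hypothetical. Wrapping the chunks in Zope classes and cataloguing them is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumptions.
SAMPLE = """<book>
  <chapter><title>Intro</title><para>Hello <b>XML</b> world.</para></chapter>
  <chapter><title>Usage</title><para>Index me.</para></chapter>
</book>"""


def strip_markup(element):
    """Return all text inside element as one whitespace-normalised str.

    Text nodes are joined with a space so that words in adjacent
    elements do not run together; this is fine for indexing, but it
    also splits words that are interrupted by inline markup.
    """
    return " ".join(" ".join(element.itertext()).split())


def split_chapters(xml_source, chunk_tag="chapter", title_tag="title"):
    """Parse once and return one dict per chunk: title, plain text, raw XML."""
    root = ET.fromstring(xml_source)
    chunks = []
    for chunk in root.iter(chunk_tag):
        chunks.append({
            # High-level structure promoted to a plain Python value.
            "title": (chunk.findtext(title_tag) or "").strip(),
            # Markup-free text, suitable for feeding a text index.
            "text": strip_markup(chunk),
            # The chunk itself, kept as XML for later rendering.
            "xml": ET.tostring(chunk, encoding="unicode"),
        })
    return chunks


if __name__ == "__main__":
    for c in split_chapters(SAMPLE):
        print(c["title"], "->", c["text"])
```

In Python 3 every `str` is already unicode, so the "convert as soon as you read" advice is covered by the parser here; under the Python 2 of a Zope 2.6 installation the conversion would have to be done explicitly.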
-----Original Message-----
From: zope-bounces@zope.org [mailto:zope-bounces@zope.org] On Behalf Of FNk
Sent: Thursday, 11 September 2003 10:24
To: Zope
Subject: [Zope] Indexing and Searching through XML files
Hi,
I have lots of XML files (about 100,000) in a directory tree on my file system. I want to publish those files using Zope/Plone.
I need to be able to index them in their native format without having to upload them into the ZODB, let users search their contents through Zope, and have the results displayed.
Is it possible? Did anybody ever do this? Any suggestions?
I'm running zope-2.6.2 and Plone-1.1.
Thanks,
Fab.