Search Features and Zope Directions Road Map
Zope is a great application server, as is its soon-to-be-released Content Management Framework, thanks to its bet on Python; everybody says so. Nevertheless, after reading the Directions Roadmap from DC, I was surprised that a substantial improvement of Zope's search features wasn't mentioned as a major concern.
For a new Zope enthusiast like me, arranging and administering content while climbing the learning curve is a kind of addiction. Almost everybody on this list with a non-programming background has probably experienced this. But when I got to the search features of ZCatalog, I had mixed feelings. (Right now I'm stuck on OR searches across indexes :) )
The fact is that, in my strong belief, everybody uses Google more than Zope's own ZCatalog search engine to look for content on Zope sites. Moreover, everybody uses Google to look for everything, bypassing windows, doors, and portals! Why? Because it's terribly smart (not to mention its 6,000 Linux boxes, by the way), and because there's no need to follow the highly engineered information architecture of a web site if there's a trustworthy shortcut to the relevant content! So, if I had to name one big feature improvement for Zope, I wouldn't hesitate: "search engine".
I just wanted to raise this subject. I know Zope isn't about spidering and retrieving, but the roadmap should have "Greater Search Capabilities" as a heading. :)
Ausum
P.S. Right now I'm quite interested in the technology of searching and finding unstructured content in order to compose structured documents. For example, the guys at Vignette (StoryServer) say that their customers don't need to keyword anything in order to have a "related content" section. After the writer finishes a story (possibly while still writing it), a routine by Autonomy (www.autonomy.com) reads the document, works out what it is about, and triggers a search for related content within the site, with no intervention by the writer. (For the curious, Autonomy has published a personal version of its software, called Kenjin (www.kenjin.com).) On the other hand, Fast, from Norway, already has a nice multimedia search engine built from regular, unstructured, spidered web pages. Can we do that "structuring the unstructured" thing within Zope?
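The "related content without keywording" idea in the P.S. can be roughly approximated with plain term statistics. The sketch below is not Autonomy's or Vignette's method (neither is described in this thread); it swaps in a simple TF-IDF weighting with cosine similarity, and every name in it (`tokenize`, `tfidf_vectors`, `related`, the sample stories) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per document."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n = len(docs)
    return {
        doc_id: {
            term: tf * math.log((1 + n) / (1 + doc_freq[term]))
            for term, tf in term_counts.items()
        }
        for doc_id, term_counts in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(doc_id, docs, limit=3):
    """Return ids of the documents most similar to doc_id, best match first."""
    vectors = tfidf_vectors(docs)
    target = vectors[doc_id]
    scored = [
        (cosine(target, vector), other)
        for other, vector in vectors.items()
        if other != doc_id
    ]
    # Drop documents that share no weighted terms with the target.
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:limit]]


# Hypothetical stories, standing in for content objects on a site.
stories = {
    "story-1": "Zope catalog search indexes text",
    "story-2": "Searching text with the Zope catalog",
    "story-3": "Recipe for apple pie",
}
print(related("story-1", stories))
```

A real system would need stemming, stop-word handling, and an index that does not recompute every vector per query; the point here is only that the writer supplies no keywords.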
Take a look at http://beta.osdigger.com. It is a mailing list search engine I was working on about a year ago; unfortunately, I've not had the time to work on it since. It is designed to scale to full-text ranked searches over millions of email messages in under a couple of seconds on a single machine (currently a PII-300). It has a 'two-step' search feature which brings back terms related to the ones you put in (usually :). E.g., type in 'scsi controller' and chances are it will return adaptec among the list of other terms. The idea is to prompt the user to be more specific with their search. I would love to take more time to work on it again, and would like to be able to access it from within Zope and use it to catalog arbitrary Zope objects like ZCatalog does.
-Matt
--
Matt Hamilton matth@netsight.co.uk Netsight Internet Solutions, Ltd. Business Vision on the Internet http://www.netsight.co.uk +44 (0)117 9090901 Web Hosting | Web Design | Domain Names | Co-location | DB Integration
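One simple way to get a 'two-step' suggestion list like the one Matt describes is to count which terms co-occur with the query terms in the same message. This is only a sketch of that general technique, not how osdigger works (its internals are not described here); `build_cooccurrence`, `suggest_terms`, and the sample messages are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_cooccurrence(messages):
    """Map each term to a Counter of the terms seen in the same message."""
    cooccurrence = {}
    for text in messages:
        terms = set(tokenize(text))
        for term in terms:
            counter = cooccurrence.setdefault(term, Counter())
            counter.update(terms - {term})
    return cooccurrence


def suggest_terms(query, cooccurrence, limit=5):
    """Return up to `limit` terms that most often co-occur with the query terms."""
    query_terms = set(tokenize(query))
    totals = Counter()
    for term in query_terms:
        totals.update(cooccurrence.get(term, Counter()))
    # Never suggest a term the user already typed.
    for term in query_terms:
        totals.pop(term, None)
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:limit]]


# Hypothetical mailing list messages.
messages = [
    "adaptec scsi controller not detected",
    "scsi controller driver for adaptec card",
    "ide disk is slow",
]
table = build_cooccurrence(messages)
print(suggest_terms("scsi controller", table))
```

With a real archive, common words would dominate the raw counts, so some weighting or stop-word filtering would be needed before the suggestions became useful.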
You have posed an important question (and probably some answers) that hopefully I can clarify.
One of the all-important points of the Zope directions document is that our number one goal is to make it wildly easier for _developers_ to create and deploy quality components. Why is this so important? The questions in your email are exactly why.
You are very interested in high-quality search capabilities, and others certainly are as well. Some other folks care more about E-Commerce, or CORBA integration, or communication with Java components.
The problem, of course, is that even if DC devoted every single person here to creating the "best search engine" (which we couldn't do for very long - we'd soon be gone), we would still be hard pressed to even come close to making everybody happy or being competitive with every other search engine vendor out there. And the reality is that it is not our goal in life to be a better Google than Google.
Multiply that by the number of things people want (E-Commerce, CORBA, et al.), and the problem is quite clear - *DC cannot possibly provide the best, most featureful and competitive component for every problem*.
The *solution* to this problem is what is outlined in the Zope directions document - dramatically lowering the bar of development to allow a thriving marketplace of robust components (that are *not* written by DC), so that interested parties can write (or better yet, reuse) "the best x component" for their purposes.
In the future, Zope may come with "some batteries included", in that a Zope distribution may include the latest versions of the most popular and widely used components. But we hope that the idea of "The ZCatalog" (for instance) will fall by the wayside. Zope may still come with a search component such as ZCatalog that is useful for certain tasks and perhaps as a learning tool, but it will not be an infinitely-scalable, infinitely-featureful thing that everyone uses for every problem.
The hope is that when you outgrow ZCatalog you can move on to other search components particularly suited to your problem domain. If you scale beyond what ZC can handle, maybe you move up to some VeritySearch component that makes use of existing software. Even now, with the current pain level of component development, building a VeritySearch component would probably take considerably less time than building and maintaining equivalent features into "the ZCatalog".
This is the future - the way that Zope will succeed is by being the best framework and component integration platform for the Web, not by trying to compete with verticals like search engine vendors on feature points. "Use the right tool for the job" is something we have always believed in, and providing a platform that will allow you to use and integrate the most appropriate tools will be our focus going forward. That is why "substantial improvement of searching features" is not on the futures roadmap - we do not want to provide the best search engine for every task. We want to make it easy for you to build or integrate the "right" search solution for your task.
Brian Lloyd brian@digicool.com Software Engineer 540.371.6909 Digital Creations http://www.digicool.com
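The "outgrow one search component, move to another" idea above can be pictured as a small shared interface that site code depends on, with interchangeable implementations behind it. This is a sketch under stated assumptions, not Zope's component architecture: `SearchComponent`, `SimpleCatalog`, `ExternalEngineAdapter`, and `Site` are hypothetical names, and the external client's `add()` and `query()` methods are invented for illustration rather than taken from any real product.

```python
from abc import ABC, abstractmethod


class SearchComponent(ABC):
    """Hypothetical interface every search component would implement."""

    @abstractmethod
    def index(self, doc_id, text):
        """Make the document findable."""

    @abstractmethod
    def search(self, query):
        """Return a list of matching document ids."""


class SimpleCatalog(SearchComponent):
    """Small in-memory inverted index, fine for modest sites."""

    def __init__(self):
        self._postings = {}

    def index(self, doc_id, text):
        for word in text.lower().split():
            self._postings.setdefault(word, set()).add(doc_id)

    def search(self, query):
        words = query.lower().split()
        if not words:
            return []
        # A document matches only if it contains every query word.
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(hits)


class ExternalEngineAdapter(SearchComponent):
    """Wraps an external engine client (hypothetical add()/query() methods)."""

    def __init__(self, client):
        self._client = client

    def index(self, doc_id, text):
        self._client.add(doc_id, text)

    def search(self, query):
        return list(self._client.query(query))


class Site:
    """Site code talks only to the interface, never to a concrete engine."""

    def __init__(self, search_component):
        self._search = search_component

    def publish(self, doc_id, text):
        self._search.index(doc_id, text)

    def find(self, query):
        return self._search.search(query)


site = Site(SimpleCatalog())
site.publish("doc-a", "zope search")
site.publish("doc-b", "zope catalog")
print(site.find("zope catalog"))
```

Swapping `SimpleCatalog()` for an `ExternalEngineAdapter` around some other engine would leave `Site` untouched, which is the property the message is arguing for.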
Brian,
I understand your point. By making Zope a solid framework for beginners and believers alike, every member of the community gets the chance to write the components he needs most, and in doing so adds to Zope's features. Maybe I was wrong to read the Roadmap as a guide for all of us. It sounded to me like "do you want to help? OK, try this, we need hands right here": a call to focus on certain core essentials of the framework's evolution, rather than rebuilding certain not-so-important components again and again.
I talked about Google and ZCatalog not to compare them one to one, but to point out how important searching itself is as a core Zope function, and how important it is to improve it, precisely so that future developers can build on it and deal with the kind of features I commented on: structuring unstructured documents, finding related content, etc.
I also understand the "suitable component" strategy you described with the example of a Verity Search component working with Zope as a probable future non-DC development, and I would add Ultraseek as a natural pick (provided it's entirely written in Python). I've played around with the latter, and I can say that no matter how hard you look, you won't find the feature "please don't show me the meta description nor the first lines of the document as a summary" :) You won't. And if you ask an Inktomi representative, he will tell you, "no sir, our software won't show you the queried words in the result summaries (like Webinator or Google) because it is too memory-consuming"! :)
That's why I turn my eyes to Zope and have mixed feelings about ZCatalog. It's great that it can be used anywhere a logical expression can go. But it's frustrating when, for example, no matter how much homework you did learning DTML, you just can't alter the iteration (like eliminating repeated records from an OR search) without some sort of patch.
And given your policy of making it "wildly easier" for developers to write new components (like the ones on my wish list), this limitation is an example of where "substantial improvements" are possible. :)
Thanks for your kind attention,
Ausum
P.S. I'm sorry for the original cross post. I didn't notice it was forbidden.
----- Original Message ----- From: "Brian Lloyd" <brian@digicool.com> To: "Ausum" <augusto@artlover.com>; <zope@zope.org>; <zope-dev@zope.org> CC: <brian@digicool.com> Sent: Tuesday, February 27, 2001 10:55 a.m.
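Ausum's complaint about repeated records in an OR search comes down to merging several result lists while keeping each record only once. The sketch below shows that merge in plain Python; it does not use the ZCatalog API, and the record layout (dicts with an `id` field), the `or_search` name, and the sample hits are all assumptions made for illustration.

```python
def or_search(*result_lists, key=lambda record: record["id"]):
    """Merge result lists, keeping only the first occurrence of each record.

    `key` extracts whatever uniquely identifies a record; the default
    assumes dict records with an "id" field (a hypothetical layout).
    """
    seen = set()
    merged = []
    for results in result_lists:
        for record in results:
            identity = key(record)
            if identity not in seen:
                seen.add(identity)
                merged.append(record)
    return merged


# Hypothetical hits from two separate index queries ("title" OR "body").
title_hits = [{"id": 1, "title": "Zope search"}, {"id": 2, "title": "ZCatalog tips"}]
body_hits = [{"id": 2, "title": "ZCatalog tips"}, {"id": 3, "title": "DTML loops"}]

for record in or_search(title_hits, body_hits):
    print(record["id"], record["title"])
```

Doing the merge before the results reach the template means the iteration itself needs no special handling for duplicates.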
Please do NOT cross post.
-Michel
(In reply to Ausum's message of Mon, 26 Feb 2001.)
participants (4): Ausum, Brian Lloyd, Matt Hamilton, Michel Pelletier