Hi all. I have the following problem: I am building a ZCatalog and indexing my DTML Methods. I use the index type ZCTextIndex and the object function PrincipiaSearchSource. That works fine. But when I try to index my Files (type File) with index type ZCTextIndex and the object function SearchableText, it finds no words and the index is empty. Am I using the wrong object function? Thanks, Sune
On 21 Jan 2006, at 13:02, Sune Christiansen wrote:
Hi all.
I have the following problem: I am building a ZCatalog and indexing my DTML Methods. I use the index type ZCTextIndex and the object function PrincipiaSearchSource. That works fine. But when I try to index my Files (type File) with index type ZCTextIndex and the object function SearchableText, it finds no words and the index is empty. Am I using the wrong object function?
Zope File objects do not support indexing their textual content. You will need to implement your own text retrieval or use one of the other indices out there, like Andreas Jung's TextIndexNG, which comes with suitable modules that can pull text out of various file formats.
jens
Newer Zopes have file objects indexable via PrincipiaSearchSource if their content type is text/*.

OFS/Image.py, 423ff:

    def PrincipiaSearchSource(self):
        """ Allow file objects to be searched.
        """
        if self.content_type.startswith('text/'):
            return str(self.data)
        return ''

HTH
tino
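To illustrate why a PDF uploaded as a File still yields an empty index with this method, here is a minimal sketch. FakeFile is a hypothetical stand-in and not the real OFS.Image.File class; only the PrincipiaSearchSource logic is taken from the snippet quoted above, and the sample data is made up.

```python
class FakeFile:
    """Hypothetical stand-in for a Zope File object (not OFS.Image.File)."""

    def __init__(self, content_type, data):
        self.content_type = content_type
        self.data = data

    def PrincipiaSearchSource(self):
        """Allow file objects to be searched (logic as quoted above)."""
        if self.content_type.startswith('text/'):
            return str(self.data)
        return ''


# Made-up sample objects: one plain-text file, one PDF.
notes = FakeFile('text/plain', 'zope catalog notes')
report = FakeFile('application/pdf', '%PDF-1.4 binary payload')

print(repr(notes.PrincipiaSearchSource()))   # text/* content is returned
print(repr(report.PrincipiaSearchSource()))  # anything else gives ''
```

So with this method alone, only text/* files contribute words to the index; PDF and Word files need a separate text extraction step.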
Hi again. I have installed TextIndexNG, indexed my Zope DTML Method objects and Zope File objects, and enabled "Document converters (PDF, Word etc.)". As indexed attributes I use SearchableText, PrincipiaSearchSource, getFile, but the indexes related to the PDF files are still empty. Is it correct to upload my PDF document as a Zope File object? Thanks, Sune
On 24 January 2006 16:58:52 +0100, Sune Christiansen <sune@binf.ku.dk> wrote:
Hi again.
I have installed TextIndexNG, indexed my Zope DTML Method objects and Zope File objects, and enabled "Document converters (PDF, Word etc.)". As indexed attributes I use SearchableText, PrincipiaSearchSource, getFile, but the indexes related to the PDF files are still empty. Is it correct to upload my PDF document as a Zope File object?
Is your external PDF converter installed _properly_? -aj
When you say external PDF converter, do you mean the PDF converter I created the PDF file with? I have also tried to index a Microsoft Word file, but the result is the same: an empty index. - Sune
Sune Christiansen wrote at 2006-1-24 18:56 +0100:
When you say external PDF converter, do you mean the PDF converter I created the PDF file with? I have also tried to index a Microsoft Word file, but the result is the same: an empty index.
You need converters from the media format (e.g. PDF, MS Word, ...) to text (or, maybe better named, text extraction utilities). The standard PDF converter is "xpdf", which contains "pdftotext" (or something similarly named). The standard Word converter is "wvWare".
-- Dieter
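A quick way to check whether such converters are installed is to look them up on the PATH. The following is only a sketch under stated assumptions: the executable names "pdftotext" and "wvWare" are what these packages usually install, but TextIndexNG may be configured to call something else, and the helper names find_converters and pdf_to_text are made up for illustration. It uses modern Python 3 (shutil.which, subprocess.run), not the Python that shipped with Zope in 2006.

```python
import shutil
import subprocess

# Assumed executable names; adjust them to your installation.
CONVERTERS = {"PDF": "pdftotext", "MS Word": "wvWare"}


def find_converters(converters=CONVERTERS):
    """Map each format to the converter's full path, or None if not on PATH."""
    return {fmt: shutil.which(exe) for fmt, exe in converters.items()}


def pdf_to_text(path):
    """Extract text from a PDF with pdftotext; '-' sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", path, "-"], capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for fmt, location in find_converters().items():
        print(fmt, "->", location or "NOT FOUND on PATH")
```

Run it as the same user that runs Zope, since the converters presumably have to be visible to the Zope process itself. If a converter shows up as not found, the index has no way to get text out of that format and stays empty.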
participants (5): Andreas Jung, Dieter Maurer, Jens Vagelpohl, Sune Christiansen, Tino Wildenhain