[Zope] finding the content_type of a subclass (python newbie)

20 Aug 2001

Hi all,

I made an archive of PDF files searchable through the catalog, but I have a small glitch I cannot resolve myself.

The PDFs are stored in the file system using the ExtFile product, and I am using pdftotext and ExtDocument to get them indexed through PrincipiaSearchSource.

The code I use (or rather, borrowed) is the following:

In ExtDocument.py
	[...]
	class ExtDocument(ExtFile):
	[...]

	def PrincipiaSearchSource(self):
		"""Convert data to raw text (don't bother formatting)"""
		filename=self._get_filename(self.filename)		

		if self.content_type == 'application/pdf':
			return popen('pdftotext -raw %s -' % filename).read()
		else:
			return 'abracadabra'			

In ExtFile.py
	[...]
	class ExtFile(CatalogAware, SimpleItem, PropertyManager):
	[...]

	def _get_filename
	[...]

The problem is that every new ExtDocument instance gets indexed in PrincipiaSearchSource as 'abracadabra', meaning that it is not recognized as 'application/pdf'; but when I update the Catalog, the same ExtDocument gets indexed correctly!
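
For illustration only, the symptom can be reproduced in a toy model with no Zope involved. It assumes, without any confirmation from the actual ExtFile code, that a new object is cataloged before its content_type has been set. ToyCatalog and ToyDocument are made-up names that belong to no product, and the extracted text is a placeholder string rather than real pdftotext output.

```python
class ToyCatalog:
    """Hypothetical stand-in for a catalog: it stores whatever text
    the object reports at the moment it is indexed."""

    def __init__(self):
        self.indexed = {}

    def catalog_object(self, obj, uid):
        # The search text is computed once, at indexing time.
        self.indexed[uid] = obj.PrincipiaSearchSource()


class ToyDocument:
    """Hypothetical stand-in for ExtDocument (no pdftotext call here)."""

    def __init__(self):
        # Assumption: the content type is still unknown at creation time.
        self.content_type = ''

    def PrincipiaSearchSource(self):
        if self.content_type == 'application/pdf':
            return 'text extracted from the pdf'  # placeholder text
        return 'abracadabra'


catalog = ToyCatalog()
doc = ToyDocument()

# First indexing pass: happens right when the object is created.
catalog.catalog_object(doc, 'doc1')
first_pass = catalog.indexed['doc1']

# The content type is filled in afterwards (assumed order of events).
doc.content_type = 'application/pdf'

# Second indexing pass: the equivalent of updating the catalog.
catalog.catalog_object(doc, 'doc1')
second_pass = catalog.indexed['doc1']

print(first_pass)
print(second_pass)
```

If the real code behaves like this model, the fallback text is stored on the first pass and the PDF text only appears after a later re-index, which would match what I see.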

Does anyone have a clue?

TIA,

--peppo

Giuseppe Bonelli