ZODB and ZCatalog related questions
Hi,

A few questions about ZODB and ZCatalog:

1. I have a new Zope database (2.5 MB). After adding a single 3.5 MB PDF file, the database is nearly 10 MB! Is space in the ZODB reserved in big blocks, or can the ZODB grow by twice the size of the files stored in it?

2. After adding a few PDF files of nearly 3 MB each, stored as "ExternalFiles" (which means that only metadata is stored in the ZODB), and indexing them in a ZCatalog with TextIndexNG (my ExternalFiles provide a PrincipiaSearchSource method with the help of "pdftotext"), the size of the ZODB is 19 MB. Does this mean that full-text indexing uses as much space as the document itself (given that I started with an empty ZCatalog)?

3. I'd like my product instances to be indexed automatically as soon as they are created or modified. But since indexing seems to take quite long with big files, is it possible to index these documents in the background, so that the user does not have to wait until indexing is finished?

4. Does anyone have experience with ZODB files bigger than 2 GB on Linux? If so, how long does packing take?

Thanks,
Thierry
--
Linux every day, keeps Dr Watson away...
http://gpc.tuxfamily.org -- http://www.ulthar.net
Thierry Florac writes:
... 1. I have a new Zope database (2.5 MB). After adding a single 3.5 MB PDF file, the database is nearly 10 MB! Is space in the ZODB reserved in big blocks, or can the ZODB grow by twice the size of the files stored in it?

Objects are stored as pickles. They are larger (usually slightly larger) than the original object.
I am astonished that your database grows by twice the amount one would expect. Check whether packing reduces the size. Maybe the object is accidentally modified. It may well be that the object is rewritten when you modify properties in a separate request. I hope not, but there is a chance.
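The effect of rewriting and packing can be illustrated with a toy model. This is only a sketch under the assumption that the storage appends a new revision on every write and that packing drops superseded revisions; `ToyAppendOnlyStorage` and its methods are hypothetical names, not the real FileStorage API.

```python
import pickle


class ToyAppendOnlyStorage:
    """Hypothetical append-only store; NOT the real FileStorage API."""

    def __init__(self):
        self.records = []  # list of (oid, pickled bytes), oldest first

    def store(self, oid, obj):
        # Every store appends a new revision; older revisions stay in place.
        self.records.append((oid, pickle.dumps(obj)))

    def size(self):
        # Total bytes held by all revisions of all objects.
        return sum(len(data) for _, data in self.records)

    def pack(self):
        # Keep only the most recent revision of each oid.
        latest = {}
        for oid, data in self.records:
            latest[oid] = data
        self.records = list(latest.items())


storage = ToyAppendOnlyStorage()
pdf = {"title": "", "data": b"x" * 1_000_000}  # stand-in for a File object

storage.store("pdf-1", pdf)
size_after_add = storage.size()

pdf["title"] = "Report"
storage.store("pdf-1", pdf)  # second revision of the same object
size_after_edit = storage.size()

storage.pack()
size_after_pack = storage.size()

print(size_after_add, size_after_edit, size_after_pack)
```

In this model a single property change roughly doubles the stored size until the next pack, which is the kind of growth the packing test above would reveal.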
2. .... Does this mean that full-text indexing uses as much space as the document itself (given that I started with an empty ZCatalog)?

Together with each object, the index maintains the set of all its words. This is done in order to be able to unindex the object without the need to touch every indexed term (but only those contained in the object). As a consequence, each object takes almost as much space in the index as it takes itself (at least for text indexes with many different words).
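The bookkeeping described here can be sketched as a small inverted index that also keeps a per-document word set. This is an illustrative toy, not the TextIndexNG or ZCatalog implementation; the class and attribute names are made up for the example.

```python
from collections import defaultdict


class ToyTextIndex:
    """Hypothetical text index that remembers each document's word set."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of documents containing it
        self.doc_words = {}               # doc id -> set of words in that document

    def index(self, doc_id, text):
        # Re-indexing first removes the old entries for this document.
        if doc_id in self.doc_words:
            self.unindex(doc_id)
        words = set(text.lower().split())
        self.doc_words[doc_id] = words
        for word in words:
            self.postings[word].add(doc_id)

    def unindex(self, doc_id):
        # Only the terms recorded for this document are touched,
        # not every term in the index.
        for word in self.doc_words.pop(doc_id, ()):
            docs = self.postings[word]
            docs.discard(doc_id)
            if not docs:
                del self.postings[word]

    def search(self, word):
        return set(self.postings.get(word.lower(), ()))


idx = ToyTextIndex()
idx.index("a", "Zope stores objects")
idx.index("b", "Zope packs storage")
idx.unindex("a")
print(idx.search("zope"))
```

The `doc_words` mapping is the part that costs space: it holds every distinct word of every document a second time, which is why the index grows roughly in step with the text it covers.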
3. I'd like my product instances to be indexed automatically as soon as they are created or modified. But since indexing seems to take quite long with big files, is it possible to index these documents in the background, so that the user does not have to wait until indexing is finished?

Shane et al. have some ideas in this direction:
Perform indexing in a background task.

Indexing requests are put onto a work queue. When Zope is idle, a background task fetches requests from the work queue and executes them.

You need to be a bit careful. Read the ZODB description (--> Zope.org) to understand the interaction between ZODB connections, transactions and threads.

Dieter
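The work-queue idea can be sketched with Python's standard `queue` and `threading` modules. This is a minimal illustration, not Zope code: `extract_text`, `request_indexing` and the `indexed` dictionary are hypothetical stand-ins, and a real worker would have to follow the connection, transaction and thread rules mentioned above.

```python
import queue
import threading

index_requests = queue.Queue()  # work queue of document ids
indexed = {}                    # hypothetical stand-in for the catalog


def extract_text(doc_id):
    # Placeholder for a slow step such as running pdftotext on a file.
    return f"text of {doc_id}"


def indexing_worker():
    # Fetch requests one by one until a None sentinel arrives.
    while True:
        doc_id = index_requests.get()
        try:
            if doc_id is None:
                break
            indexed[doc_id] = extract_text(doc_id)
        finally:
            index_requests.task_done()


def request_indexing(doc_id):
    # Called from the request that created or modified the document;
    # it returns immediately instead of waiting for the indexing.
    index_requests.put(doc_id)


worker = threading.Thread(target=indexing_worker, daemon=True)
worker.start()

request_indexing("doc-1")
request_indexing("doc-2")

index_requests.join()     # wait until both requests are processed
index_requests.put(None)  # ask the worker to stop
worker.join()
print(indexed)
```

The user-facing call only enqueues an id, so its cost does not depend on the size of the document being indexed.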
3. I'd like my product instances to be indexed automatically as soon as they are created or modified. But since indexing seems to take quite long with big files, is it possible to index these documents in the background, so that the user does not have to wait until indexing is finished?

Shane et al. have some ideas in this direction:

...and if you have the luxury of more than one box, connect another box with ZEO and do the indexing there.

-- Andy McKay
On Fri, 2002-06-28 at 22:43, Dieter Maurer wrote:
Thierry Florac writes:
... 1. I have a new Zope database (2.5 MB). After adding a single 3.5 MB PDF file, the database is nearly 10 MB! Is space in the ZODB reserved in big blocks, or can the ZODB grow by twice the size of the files stored in it?

Objects are stored as pickles. They are larger (usually slightly larger) than the original object.

What are "pickles"? What do you mean when you say "slightly larger"?

I am astonished that your database grows by twice the amount one would expect. Check whether packing reduces the size. Maybe the object is accidentally modified.

I've done a really simple test, adding the PDF file as a "File" object from the ZMI. No property was modified... But do you also mean that when a single small property like "title" is modified, the whole object is replicated in the ZODB?
Thierry
Thierry Florac writes:
On Fri, 2002-06-28 at 22:43, Dieter Maurer wrote:
Objects are stored as pickles. They are larger (usually slightly larger) than the original object.
What are "pickles"?

That's Python's name for serialized objects.

What do you mean when you say "slightly larger"?

Larger, but not by a factor of 2.
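To get a feeling for "slightly larger", one can pickle a block of bytes and compare sizes. This is only a sketch with Python's standard `pickle` module: the random payload stands in for a PDF, and a real ZODB record carries additional class and transaction information that is not modeled here.

```python
import os
import pickle

# Stand-in for the raw bytes of a 3.5 MB PDF (random data, purely illustrative).
payload = os.urandom(3_500_000)

# Serialize with the highest protocol available in this interpreter.
pickled = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

overhead = len(pickled) - len(payload)
print(f"raw: {len(payload)} bytes, pickled: {len(pickled)} bytes, "
      f"overhead: {overhead} bytes")
```

The overhead for a single large byte string is a handful of bytes, nowhere near a factor of 2.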
I've done a really simple test, adding the PDF file as a "File" object from the ZMI. No property was modified... But do you also mean that when a single small property like "title" is modified, the whole object is replicated in the ZODB?

If the component is not itself persistent, yes.

The File content should be persistent (if larger than 16 kB), but I did not verify that hope.

Dieter
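The difference between inline and separately stored content can be made concrete by comparing what has to be rewritten after a title change. This is a sketch with plain dictionaries and `pickle`; the `data_ref` key and the `"oid-2"` value are hypothetical placeholders for a reference to a separate record, not ZODB's actual reference format.

```python
import pickle

data = b"x" * 1_000_000  # stand-in for the file content

# Case 1: the content lives inside the object's own record.
# Changing the title means writing the whole record again.
inline_v2 = pickle.dumps({"title": "Report", "data": data})
inline_rewrite_cost = len(inline_v2)

# Case 2: the content is its own record and the parent only refers to it.
# Changing the title rewrites just the small parent record.
data_record = pickle.dumps(data)  # written once, untouched by the title change
parent_v2 = pickle.dumps({"title": "Report", "data_ref": "oid-2"})
split_rewrite_cost = len(parent_v2)

print(inline_rewrite_cost, split_rewrite_cost)
```

In the second case the large record is written only when the content itself changes, so editing a property costs a few dozen bytes instead of another copy of the file.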
participants (3): Andy McKay, Dieter Maurer, Thierry Florac